![Lookyloo icon](lookyloo/static/lookyloo.jpeg)

*Lookyloo* is a web interface that scrapes a website and then displays a
tree of the domains calling each other.

# What is that name?!
```
1. People who just come to look.
2. People who go out of their way to look at people or something often causing crowds and more disruption.
3. People who enjoy staring at watching other peoples misfortune. Oftentimes car onlookers to car accidents.
Same as Looky Lou; often spelled as Looky-loo (hyphen) or lookylou
In L.A. usually the lookyloo's cause more accidents by not paying full attention to what is ahead of them.
```

Source: [Urban Dictionary](https://www.urbandictionary.com/define.php?term=lookyloo)

# Screenshot
![Screenshot of Lookyloo](doc/example.png)

# Implementation details
This code is heavily inspired by [webplugin](https://github.com/etetoolkit/webplugin) and adapted to use Flask as the backend.

The two core dependencies of this project are the following:

* [ETE Toolkit](http://etetoolkit.org/): A Python framework for the analysis and visualization of trees.
* [Splash](https://splash.readthedocs.io/en/stable/): Lightweight, scriptable browser as a service with an HTTP API.

# Installation
**IMPORTANT**: Use [pipenv](https://pipenv.readthedocs.io/en/latest/).

**NOTE**: Yes, it requires Python 3.6+. No, it will never support anything older.

## Installation of Splash
You need a running Splash instance, preferably on [Docker](https://splash.readthedocs.io/en/stable/install.html):

```bash
sudo apt install docker.io
sudo docker pull scrapinghub/splash
sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-ui --disable-lua --disable-browser-caches
# On a server with a decent amount of RAM, you may want to run it this way:
# sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-ui -s 100 --disable-lua -m 50000 --disable-browser-caches
```
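
Once the container is running, Splash's `_ping` endpoint should offer a quick way to confirm that the service is reachable. The sketch below assumes Splash listens on `localhost:8050`, as in the port mapping above. The `SPLASH_HOST` and `SPLASH_PORT` variables and both helper functions are hypothetical names introduced for illustration; they are not part of Lookyloo or Splash.

```bash
# Hypothetical helper variables; the defaults match the port mapping used above.
SPLASH_HOST="${SPLASH_HOST:-localhost}"
SPLASH_PORT="${SPLASH_PORT:-8050}"

# Print the URL of the Splash health-check endpoint.
splash_ping_url() {
    echo "http://${SPLASH_HOST}:${SPLASH_PORT}/_ping"
}

# Return 0 if Splash answers the ping within 5 seconds, non-zero otherwise.
check_splash() {
    curl --silent --fail --max-time 5 "$(splash_ping_url)" > /dev/null
}
```

For example, `check_splash && echo "Splash is up"` can be run before starting Lookyloo.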
## Install redis
```bash
git clone https://github.com/antirez/redis.git
cd redis
git checkout 5.0
make
cd ..
```
## Installation of Lookyloo
```bash
git clone https://github.com/CIRCL/lookyloo.git
cd lookyloo
pipenv install
echo LOOKYLOO_HOME="'`pwd`'" > .env
```
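
The last command stores the absolute path of the checkout in `.env`, a file that pipenv loads automatically when it runs a command. A minimal sketch for checking what was written, assuming it is run from the repository root; the helper names `write_env` and `read_lookyloo_home` are hypothetical and only serve the illustration.

```bash
# Write LOOKYLOO_HOME='<current directory>' into .env
# (same result as the echo command above, using $(...) instead of backticks).
write_env() {
    echo "LOOKYLOO_HOME='$(pwd)'" > .env
}

# Print the value stored in .env without the surrounding single quotes.
read_lookyloo_home() {
    sed -n "s/^LOOKYLOO_HOME='\(.*\)'\$/\1/p" .env
}

write_env
read_lookyloo_home   # prints the absolute path of the current directory
```

If the printed path is not the Lookyloo checkout, rerun the command from the repository root.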
# Run the app
```bash
pipenv run start.py
```
# Run the app in production
## With a reverse proxy (Nginx)
```bash
pip install uwsgi
```
## Config files
You have to configure the following two files:

* `etc/nginx/sites-available/lookyloo`
* `etc/systemd/system/lookyloo.service`

Then copy them to the appropriate directories and run the following command:

```bash
sudo ln -s /etc/nginx/sites-available/lookyloo /etc/nginx/sites-enabled
```
If needed, remove the default site:
```bash
sudo rm /etc/nginx/sites-enabled/default
```
Make sure everything is working:
```bash
sudo systemctl start lookyloo
sudo systemctl enable lookyloo
sudo nginx -t
# If the configuration test passes:
sudo service nginx restart
```

You can then open `http://<IP-or-domain>/`.

Now, you should configure [TLS (Let's Encrypt and so on)](https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-16-04).

# Run the app with Docker
## Dockerfile
The repository includes a [Dockerfile](Dockerfile) for building a containerized instance of the app.

Lookyloo stores the scraped data in `/lookyloo/scraped`. If you want to persist the scraped data between runs, it is sufficient to define a volume for this directory.

## Running a complete setup with Docker Compose
Additionally, you can start a complete setup, including the necessary Docker instance of Splash, by using
Docker Compose and the included service definition in [docker-compose.yml](docker-compose.yml) by running:

```
docker-compose up
```

After building and startup are complete, Lookyloo should be available at [http://localhost:5000/](http://localhost:5000/).

If you want to persist the data between different runs, uncomment the "volumes" definition in the last two lines of
[docker-compose.yml](docker-compose.yml) and define a data storage directory on your Docker host system there.