chg: Update readme to point to the website

pull/96/head
Raphaël Vinot 2020-10-06 11:49:47 +02:00
parent bc4cf6a3a5
commit 6066bb1a4c
1 changed files with 12 additions and 136 deletions

148
README.md
View File

@ -1,16 +1,15 @@
![Lookyloo icon](website/web/static/lookyloo.jpeg)
[![Lookyloo icon](website/web/static/lookyloo.jpeg)](https://www.lookyloo.eu/docs/main/index.html)
[![Gitter](https://badges.gitter.im/lookyloo-app/community.svg)](https://gitter.im/lookyloo-app/community)
*Lookyloo* is a web interface allowing to scrape a website and then displays a
tree of domains calling each other.
*[Lookyloo](https://www.lookyloo.eu/docs/main/index.html)* is a web interface allowing
to scrape a website and then displays a tree of domains calling each other.
Thank you very much [Tech Blog @ willshouse.com](https://techblog.willshouse.com/2012/01/03/most-common-user-agents/)
for the up-to-date list of UserAgents.
# What is that name?!
```
1. People who just come to look.
2. People who go out of their way to look at people or something often causing crowds and more disruption.
@ -21,147 +20,24 @@ In L.A. usually the lookyloo's cause more accidents by not paying full attention
Source: [Urban Dictionary](https://www.urbandictionary.com/define.php?term=lookyloo)
# Screenshot
# No, really, what is Lookyloo?
![Screenshot of Lookyloo](doc/example.png)
Find all you need to know about Lookyloo [on the website](https://www.lookyloo.eu/docs/main/index.html).
# Implementation details
This code is very heavily inspired by [webplugin](https://github.com/etetoolkit/webplugin) and adapted to use flask as backend.
The two core dependencies of this project are the following:
* [ETE Toolkit](http://etetoolkit.org/): A Python framework for the analysis and visualization of trees.
* [Splash](https://splash.readthedocs.io/en/stable/): Lightweight, scriptable browser as a service with an HTTP API
# Cookies
If you want to scrape a website as if you were loggged in, you need to pass your sessions cookies.
You can do it the the folloing way:
1. Install [Cookie Quick Manager](https://addons.mozilla.org/en-US/firefox/addon/cookie-quick-manager/)
2. Click onthe icon in the top right of your browser > Manage all cookies
3. Search for a domain, tick the Sub-domain box if needed
4. Right clock on the domain you want to export > save to file > $LOOKYLOO_HOME/cookies.json
Then, you need to restart the webserver and from now on, every cookies you have in that file will be available for the browser used by Splash
# Python client
You can use `pylookyloo` as a standalone script, or as a library, [more details here](https://github.com/Lookyloo/lookyloo/tree/master/client)
![Screenshot of Lookyloo capturing Github](https://www.lookyloo.eu/docs/main/_images/sample_github.png)
# Installation
**IMPORTANT**: Use [poetry](https://github.com/python-poetry/poetry/#installation)
Please refer to the [install guide](https://www.lookyloo.eu/docs/main/install-lookyloo.html).
**NOTE**: Yes, it requires python3.7+. No, it will never support anything older.
# Python client
**NOTE**: If you want to run a public instance, you should set `only_global_lookups=True`
in `website/web/__init__.py` and `bin/async_scrape.py` to disallow scraping of private IPs.
`pylookyloo` is the recommended client to interact with a Lookyloo instance.
## Installation of Splash
You need a running splash instance, preferably on [docker](https://splash.readthedocs.io/en/stable/install.html)
It is avaliable on PyPi, so you can install it this way:
```bash
sudo apt install docker.io
sudo docker pull scrapinghub/splash
sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-browser-caches
# On a server with a decent abount of RAM, you may want to run it this way:
# sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash -s 100 -m 50000 --disable-browser-caches
pip install pylookyloo
```
## Install redis
```bash
git clone https://github.com/antirez/redis.git
cd redis
git checkout 5.0
make
cd ..
```
## Installation of Lookyloo
```bash
git clone https://github.com/Lookyloo/lookyloo.git
cd lookyloo
poetry install
echo LOOKYLOO_HOME="'`pwd`'" > .env
poetry run update
```
# Run the app
```bash
poetry run start
```
# Run the app in production
## With a reverse proxy (Nginx)
```bash
pip install uwsgi
```
## Config files
You have to configure the two following files:
* `etc/nginx/sites-available/lookyloo`
* `etc/systemd/system/lookyloo.service`
Copy them to the appropriate directories, and run the following command:
```bash
sudo ln -s /etc/nginx/sites-available/lookyloo /etc/nginx/sites-enabled
```
If needed, remove the default site
```bash
sudo rm /etc/nginx/sites-enabled/default
```
Make sure everything is working:
```bash
sudo systemctl start lookyloo
sudo systemctl enable lookyloo
sudo nginx -t
# If it is cool:
sudo service nginx restart
```
And you can open ```http://<IP-or-domain>/```
Now, you should configure [TLS (let's encrypt and so on)](https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-16-04)
# Use aquarium for a reliable multi-users app
Aquarium is a haproxy + splash bundle that will allow lookyloo to be used by more than one user at once.
The initial version of the project was created by [TeamHG-Memex](https://github.com/TeamHG-Memex/aquarium) but
we have a [dedicated repository](https://github.com/Lookyloo/aquarium) that fits our needs better.
Follow [the documentation](https://github.com/Lookyloo/aquarium/blob/master/README.rst) if you want to use it.
# Run the app with a simple docker setup
## Dockerfile
The repository includes a [Dockerfile](Dockerfile) for building a containerized instance of the app.
Lookyloo stores the scraped data in /lookyloo/scraped. If you want to persist the scraped data between runs it is sufficient to define a volume for this directory.
## Running a complete setup with Docker Compose
Additionally you can start a complete setup, including the necessary Docker instance of splashy, by using
Docker Compose and the included service definition in [docker-compose.yml](docker-compose.yml) by running
```
docker-compose up
```
After building and startup is complete lookyloo should be available at [http://localhost:5000/](http://localhost:5000/)
If you want to persist the data between different runs uncomment the "volumes" definition in the last two lines of
[docker-compose.yml](docker-compose.yml) and define a data storage directory in your Docker host system there.
Find more details [here](https://www.lookyloo.eu/docs/main/pylookyloo-overview.html).