[![Lookyloo icon](website/web/static/lookyloo.jpeg)](https://www.lookyloo.eu/docs/main/index.html)

*[Lookyloo](https://lookyloo.circl.lu/)* is a web interface that captures a webpage and then displays a tree of the domains that call each other.

[![Gitter](https://badges.gitter.im/Lookyloo/community.svg)](https://gitter.im/Lookyloo/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

* [What is Lookyloo?](#whats-in-a-name)
* [REST API](#rest-api)
* [Install Lookyloo](#installation)
* [Lookyloo Client](#python-client)
* [Contributing to Lookyloo](#contributing-to-lookyloo)
* [Code of Conduct](#code-of-conduct)
* [Support](#support)
* [Security](#security)
* [Credits](#credits)
* [License](#license)

## What's in a name?!

```
Lookyloo ...

Same as Looky Lou; often spelled as Looky-loo (hyphen) or lookylou

1. A person who just comes to look.
2. A person who goes out of the way to look at people or something, often causing crowds and disruption.
3. A person who enjoys watching other people's misfortune. Oftentimes car onlookers that stare at a car accidents.

In L.A., usually the lookyloos cause more accidents by not paying full attention to what is ahead of them.
```

Source: [Urban Dictionary](https://www.urbandictionary.com/define.php?term=lookyloo)

## No, really, what is Lookyloo?

Lookyloo is a web interface that allows you to capture and map the journey of a web page.

Find all you need to know about Lookyloo on our [documentation website](https://www.lookyloo.eu/docs/main/index.html).

Here's an example of a Lookyloo capture of the site **github.com**:

![Screenshot of Lookyloo capturing GitHub](https://www.lookyloo.eu/docs/main/_images/sample_github.png)

# REST API

The API is self-documented with Swagger. You can try it out [on the demo instance](https://lookyloo.circl.lu/doc/).

# Installation

Please refer to the [install guide](https://www.lookyloo.eu/docs/main/install-lookyloo.html).

# Python client

`pylookyloo` is the recommended client to interact with a Lookyloo instance.

It is available on PyPI, so you can install it using the following command:

```bash
pip install pylookyloo
```

For more details on `pylookyloo`, read the overview [docs](https://www.lookyloo.eu/docs/main/pylookyloo-overview.html), the [documentation](https://pylookyloo.readthedocs.io/en/latest/) of the module itself, or the code in this [GitHub repository](https://github.com/Lookyloo/PyLookyloo).
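
As a rough illustration of what using the client can look like, here is a minimal sketch. It assumes that `pylookyloo` exposes a `Lookyloo` class with an `is_up` property and a `submit(url=...)` method, as described in the module documentation linked above; check that documentation for the exact signatures and return values. The helper name `capture_url` is hypothetical, and the instance URL is the demo instance mentioned in this README.

```python
def capture_url(client, url):
    """Submit `url` for capture and return whatever the client returns.

    `client` is expected to behave like `pylookyloo.Lookyloo`: an `is_up`
    property and a `submit(url=...)` method (assumed from the pylookyloo docs).
    Returns None when the instance is not reachable.
    """
    if not client.is_up:
        return None
    return client.submit(url=url)


def main():
    # Imported here so the helper above can be used without pylookyloo installed.
    from pylookyloo import Lookyloo

    # Demo instance from this README; point this at your own instance in practice.
    client = Lookyloo("https://lookyloo.circl.lu/")
    print(capture_url(client, "https://github.com"))
```

Calling `main()` submits a capture of github.com to the demo instance and prints the value returned by the client.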
# Notes regarding using S3FS for storage
## Directory listing

TL;DR: it is slow.

If you have many captures (say more than 1000/day) and store them in a bucket mounted with s3fs-fuse,
doing a directory listing in bash (`ls`) will most probably lock the I/O for every process
trying to access any file in the whole bucket. The same is true if you access the
filesystem using Python methods (`iterdir`, `scandir`, ...).

A workaround is to use the Python s3fs module, as it does not go through the mounted filesystem to list directories.
You can configure the s3fs credentials in `config/generic.json`, under the key `s3fs`.

**Warning**: this will not save you if you run `ls` on a directory that contains *a lot* of captures.
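
The workaround can be sketched as follows. This is a minimal example, not the code Lookyloo itself runs: the helper names `make_s3fs_client` and `list_entries` are hypothetical, and the settings dictionary is assumed to be the `config` sub-dictionary of the `s3fs` key (with `key`, `secret` and `endpoint_url` entries). The listing goes through the S3 API using `S3FileSystem.ls`, so it never touches the s3fs-fuse mount.

```python
def make_s3fs_client(s3fs_settings):
    """Build an s3fs client from the `config` sub-dictionary of the `s3fs` key."""
    import s3fs  # third-party module: pip install s3fs

    return s3fs.S3FileSystem(key=s3fs_settings["key"],
                             secret=s3fs_settings["secret"],
                             endpoint_url=s3fs_settings["endpoint_url"])


def list_entries(client, bucket, directory):
    """Return the sorted entry names found under <bucket>/<directory>.

    `directory` is a path relative to the bucket, for example a
    Year/Month/Day directory of archived captures.
    """
    prefix = f"{bucket}/{directory.strip('/')}"
    # detail=False makes ls() return plain paths instead of metadata dictionaries.
    paths = client.ls(prefix, detail=False)
    return sorted(path.rsplit("/", 1)[-1] for path in paths)
```

With a client built by `make_s3fs_client`, `list_entries(client, bucket_name, "Year/Month/Day")` returns the names stored in that directory without issuing an `ls` on the mounted filesystem.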
## Versioning

By default, a MinIO bucket (backend for s3fs) will have versioning enabled, which means it
keeps a copy of every version of every file you're storing. This becomes a problem if you have a lot of captures,
as the index files are updated on every change and the maximum number of versions is 10,000.
So by the time you have more than 10,000 captures in a directory, you'll get I/O errors when you try
to update the index file. And you absolutely do not care about that versioning in Lookyloo.

To check if versioning is enabled (it can be either enabled or suspended):

```bash
mc version info <alias_in_config>/<bucket>
```
The command below will suspend versioning:
```bash
mc version suspend <alias_in_config>/<bucket>
```
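
If you prefer to run these two commands from a deployment script, they can be wrapped in a few lines of Python. This is only a sketch: the function names are hypothetical, it assumes the `mc` binary is on your `PATH` with the alias already configured, and it simply returns whatever `mc` prints without parsing it.

```python
import subprocess


def mc_version_command(action, alias, bucket):
    """Build the `mc version <action>` command line for <alias>/<bucket>."""
    # Only the two sub-commands shown above are accepted.
    if action not in ("info", "suspend"):
        raise ValueError(f"unsupported action: {action}")
    return ["mc", "version", action, f"{alias}/{bucket}"]


def run_mc_version(action, alias, bucket):
    """Run the command and return its standard output.

    Raises subprocess.CalledProcessError if mc exits with an error.
    """
    result = subprocess.run(mc_version_command(action, alias, bucket),
                            capture_output=True, text=True, check=True)
    return result.stdout
```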

### I'm stuck, my file is raising I/O errors

This happens when your index was updated 10,000 times while versioning was enabled.

This is how to check whether you're in this situation:

* Error message from bash (unhelpful):

```bash
$ (git::main) rm /path/to/lookyloo/archived_captures/Year/Month/Day/index
rm: cannot remove '/path/to/lookyloo/archived_captures/Year/Month/Day/index': Input/output error
```

* Check with Python:

```python
import s3fs

from lookyloo.default import get_config

# Load the s3fs settings from config/generic.json (key: s3fs)
s3fs_config = get_config('generic', 's3fs')
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])
s3fs_bucket = s3fs_config['config']['bucket_name']
# Deleting the index through the S3 API surfaces the underlying error
s3fs_client.rm_file(s3fs_bucket + '/Year/Month/Day/index')
```

* Error from Python (somewhat more helpful):

```
OSError: [Errno 5] An error occurred (MaxVersionsExceeded) when calling the DeleteObject operation: You've exceeded the limit on the number of versions you can create on this object
```

* **Solution**: run this command to remove all older versions of the file:

```bash
mc rm --non-current --versions --recursive --force <alias_in_config>/<bucket>/Year/Month/Day/index
```
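
The Python check above can be turned into a small helper that tells this situation apart from other I/O errors. This is a sketch under a few assumptions: the function name is hypothetical, `client` is an `s3fs.S3FileSystem` built as in the check above, and the detection relies on the `MaxVersionsExceeded` string appearing in the error message shown earlier. Like the original check, it really deletes the file when the deletion is allowed.

```python
def index_hit_version_limit(client, index_path):
    """Try to delete `index_path` and report whether the version limit was hit.

    Returns True if the deletion fails with MaxVersionsExceeded, False if the
    deletion succeeds. Any other OSError is re-raised unchanged.
    """
    try:
        client.rm_file(index_path)
    except OSError as error:
        # s3fs reports the S3 error code inside the OSError message.
        if "MaxVersionsExceeded" in str(error):
            return True
        raise
    return False
```

When it returns `True`, the `mc rm` command above is the fix.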
# Contributing to Lookyloo
To learn more about contributing to Lookyloo, see our [contributor guide ](https://www.lookyloo.eu/docs/main/contributing.html ).

### Code of Conduct

At Lookyloo, we pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. You can access our Code of Conduct [here](https://github.com/Lookyloo/lookyloo/blob/main/code_of_conduct.md) or on the [Lookyloo docs site](https://www.lookyloo.eu/docs/main/code-conduct.html).

# Support

* To engage with the Lookyloo community, contact us on [Gitter](https://gitter.im/lookyloo-app/community).
* Let us know how we can improve Lookyloo by opening an [issue](https://github.com/Lookyloo/lookyloo/issues/new/choose).
* Follow us on [Twitter](https://twitter.com/lookyloo_app).

### Security
To report vulnerabilities, see our [Security Policy ](lookyloo/SECURITY.md ).

### Credits

Thank you very much [Tech Blog @ willshouse.com](https://techblog.willshouse.com/2012/01/03/most-common-user-agents/) for the up-to-date list of UserAgents.

### License

See our [LICENSE](LICENSE).