Lookyloo is a web interface that scrapes a website and then displays a tree of the domains calling each other. https://lookyloo.circl.lu/

Raphaël Vinot 4975506e2e new: Optionally deduplicate notification (UUID or redirects)		2025-01-06 17:16:56 +01:00
.github	chg: Drop python 3.8 support	2024-11-06 14:53:25 +01:00
bin	new: Allow user accessible MISP servers	2024-12-23 18:47:41 +01:00
cache	fix: Check valkey version, stop if < 8	2024-11-04 17:00:17 +01:00
config	new: Optionally deduplicate notification (UUID or redirects)	2025-01-06 17:16:56 +01:00
contributing	Initial set up contrib guide	2020-08-24 13:33:14 +01:00
doc	new: Add installation notes	2020-08-14 15:38:01 +02:00
etc	fix: Allow upload of bigger files	2022-09-21 15:26:40 +02:00
full_index	chg: Bump kvrocks config file to 2.10	2024-10-17 14:00:26 +02:00
indexing	fix: Check valkey version, stop if < 8	2024-11-04 17:00:17 +01:00
known_content	new: A few more single px gifs \o/	2024-02-28 15:03:10 +01:00
known_content_user	chg: Cleanups, allow to add context from ressources page	2020-09-03 16:32:53 +02:00
logs	new: Logging config in file	2022-11-23 15:54:22 +01:00
lookyloo	new: Optionally deduplicate notification (UUID or redirects)	2025-01-06 17:16:56 +01:00
tools	chg: Bump datatables	2024-11-01 16:24:18 +01:00
user_agents	chg: Improve somewhat the useragents available for capturing	2022-06-09 18:58:17 +02:00
website	new: Optionally deduplicate notification (UUID or redirects)	2025-01-06 17:16:56 +01:00
.dockerignore	chg: Do not put the content of scraped in the package.	2020-07-07 13:56:58 +02:00
.gitignore	new: find related captures by hostname and URL	2024-05-14 18:54:04 +02:00
.pre-commit-config.yaml	chg: Drop python 3.8 support	2024-11-06 14:53:25 +01:00
Dockerfile	fix: Install system deps in dockerfile	2024-08-13 14:17:09 +02:00
LICENSE	Update LICENSE	2021-02-09 14:51:53 +01:00
README.md	Update README.md	2023-11-20 12:07:48 +01:00
SECURITY.md	chg: Add basic pre-hook config	2022-03-31 11:30:53 +02:00
code_of_conduct.md	chg: Add basic pre-hook config	2022-03-31 11:30:53 +02:00
docker-compose.yml	fix: Add few more volumes for docker	2023-03-06 14:49:03 +01:00
mypy.ini	chg: Use new annotations	2024-01-12 17:15:41 +01:00
poetry.lock	chg: Bump deps	2024-12-31 18:26:40 +01:00
pyproject.toml	chg: Bump deps	2024-12-31 18:26:40 +01:00

README.md

Lookyloo is a web interface that captures a webpage and then displays a tree of the domains that call each other.

What is Lookyloo?
REST API
Install Lookyloo
Lookyloo Client
Contributing to Lookyloo
- Code of Conduct
Support
- Security
- Credits
- License

What's in a name?!

Lookyloo ...

Same as Looky Lou; often spelled as Looky-loo (hyphen) or lookylou

1. A person who just comes to look.
2. A person who goes out of the way to look at people or something, often causing crowds and disruption.
3. A person who enjoys watching other people's misfortune. Oftentimes car onlookers that stare at car accidents.

In L.A., usually the lookyloos cause more accidents by not paying full attention to what is ahead of them.

Source: Urban Dictionary

No, really, what is Lookyloo?

Lookyloo is a web interface that allows you to capture and map the journey of a website page.

Find all you need to know about Lookyloo on our documentation website.

Here's an example of a Lookyloo capture of the site github.com

REST API

The API is self-documented with Swagger. You can play with it on the demo instance.

Installation

Please refer to the install guide.

Python client

pylookyloo is the recommended client to interact with a Lookyloo instance.

It is available on PyPI, so you can install it using the following command:

pip install pylookyloo

For more details on pylookyloo, read the overview docs, the documentation of the module itself, or the code in this GitHub repository.
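
As a rough illustration of what talking to an instance can look like, here is a minimal sketch. It assumes the client class is `pylookyloo.Lookyloo` and that it offers an `is_up` property and a `submit(url=...)` method; treat those names as assumptions and check the module documentation for the exact signatures. `submit_capture` and `submit_to_instance` are hypothetical helpers, not part of pylookyloo.

```python
from typing import Any


def submit_capture(client: Any, url: str) -> Any:
    """Submit `url` for capture and return whatever the client returns.

    `client` is expected to behave like pylookyloo.Lookyloo: an `is_up`
    property and a `submit(url=...)` method (both are assumptions here).
    """
    if not client.is_up:
        raise ConnectionError('The Lookyloo instance is not reachable')
    return client.submit(url=url)


def submit_to_instance(instance_url: str, url: str) -> Any:
    """Build a real client (requires `pip install pylookyloo`) and submit."""
    from pylookyloo import Lookyloo  # imported lazily, only needed here
    return submit_capture(Lookyloo(instance_url), url)
```

With pylookyloo installed, `submit_to_instance('https://lookyloo.circl.lu/', 'https://github.com')` would send a capture request to the demo instance.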

Notes regarding using S3FS for storage

Directory listing

TL;DR: it is slow.

If you have many captures (say more than 1000/day) and store them in a bucket mounted with s3fs-fuse, doing a directory listing in bash (ls) will most probably lock the I/O for every process trying to access any file in the whole bucket. The same is true if you access the filesystem using Python methods (iterdir, scandir...).

A workaround is to use the Python s3fs module, as it does not go through the mounted filesystem to list directories. You can configure the s3fs credentials under the s3fs key of config/generic.json.

Warning: this will not save you if you run ls on a directory that contains a lot of captures.
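
The workaround can be sketched as follows. This is only an illustration: `make_s3fs_client` and `list_directory` are hypothetical helpers, and the `key`, `secret` and `endpoint_url` entries are assumed to match what is stored under the s3fs key of config/generic.json.

```python
from typing import Any


def make_s3fs_client(config: dict) -> Any:
    """Build an s3fs client from the credentials dict.

    Requires the third-party s3fs module, imported lazily.
    """
    import s3fs
    return s3fs.S3FileSystem(key=config['key'],
                             secret=config['secret'],
                             endpoint_url=config['endpoint_url'])


def list_directory(client: Any, bucket: str, relative_path: str) -> list[str]:
    """List the entries under bucket/relative_path through the S3 API,
    without touching the s3fs-fuse mount point."""
    prefix = f"{bucket}/{relative_path.strip('/')}"
    # detail=False asks for plain names instead of metadata dicts
    return sorted(client.ls(prefix, detail=False))
```

Because the listing goes straight to the object store, other processes reading files through the mount are not blocked while it runs.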

Versioning

By default, a MinIO bucket (backend for s3fs) will have versioning enabled, which means it keeps a copy of every version of every file you're storing. This becomes a problem if you have a lot of captures, as the index files are updated on every change and the maximum number of versions is 10,000. So by the time you have more than 10,000 captures in a directory, you'll get I/O errors when you try to update the index file. And you absolutely do not care about that versioning in Lookyloo.

To check if versioning is enabled (can be either enabled or suspended):

mc version info <alias_in_config>/<bucket>

The command below will suspend versioning:

mc version suspend <alias_in_config>/<bucket>

I'm stuck, my file is raising I/O errors

This happens once your index has been updated 10,000 times while versioning was enabled.

This is how to check you're in this situation:

Error message from bash (unhelpful):

$ (git::main) rm /path/to/lookyloo/archived_captures/Year/Month/Day/index
rm: cannot remove '/path/to/lookyloo/archived_captures/Year/Month/Day/index': Input/output error

Check with python

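from lookyloo.default import get_config
import s3fs

# Load the s3fs section of config/generic.json
s3fs_config = get_config('generic', 's3fs')
# Talk to the bucket directly instead of going through the s3fs-fuse mount
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])

s3fs_bucket = s3fs_config['config']['bucket_name']
# Try to delete the index; this raises an OSError when the version limit is reached
s3fs_client.rm_file(s3fs_bucket + '/Year/Month/Day/index')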

Error from python (somewhat more helpful):

OSError: [Errno 5] An error occurred (MaxVersionsExceeded) when calling the DeleteObject operation: You've exceeded the limit on the number of versions you can create on this object
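
If you want to tell this case apart from other I/O errors in a script, a small check like the one below can help. It is a heuristic sketch: `is_max_versions_error` is a hypothetical helper, and matching on the `MaxVersionsExceeded` text assumes the message keeps the form shown above.

```python
import errno


def is_max_versions_error(exc: BaseException) -> bool:
    """True if `exc` looks like the MaxVersionsExceeded I/O error above."""
    return (isinstance(exc, OSError)
            and exc.errno == errno.EIO  # EIO is errno 5, "Input/output error"
            and 'MaxVersionsExceeded' in str(exc))
```

Wrapping the `rm_file` call in `try`/`except OSError as e` and passing `e` to this function distinguishes the versioning problem from an unrelated I/O failure.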

Solution: run this command to remove all older versions of the file:

mc rm --non-current --versions --recursive --force <alias_in_config>/<bucket>/Year/Month/Day/index
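
When several daily indexes are affected, the same command can be generated for each of them. The sketch below only builds the argument list from the command above; `purge_old_versions_command` and `as_shell` are hypothetical helpers, and the alias, bucket and path values are placeholders you have to supply.

```python
import shlex


def purge_old_versions_command(alias: str, bucket: str, day_path: str) -> list[str]:
    """Build the mc command that removes non-current versions of one index."""
    target = f"{alias}/{bucket}/{day_path.strip('/')}/index"
    return ['mc', 'rm', '--non-current', '--versions', '--recursive', '--force', target]


def as_shell(command: list[str]) -> str:
    """Render the argument list as a copy-pastable shell line."""
    return shlex.join(command)
```

The returned list can be printed with `as_shell` for review, or passed to `subprocess.run(command, check=True)` once you are sure the target is correct.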

Contributing to Lookyloo

To learn more about contributing to Lookyloo, see our contributor guide.

Code of Conduct

At Lookyloo, we pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. You can access our Code of Conduct here or on the Lookyloo docs site.

Support

To engage with the Lookyloo community contact us on Gitter.
Let us know how we can improve Lookyloo by opening an issue.
Follow us on Twitter.

Security

To report vulnerabilities, see our Security Policy.

Credits

Thank you very much Tech Blog @ willshouse.com for the up-to-date list of UserAgents.

License

See our LICENSE.