diff --git a/HOWTO.md b/HOWTO.md
index 3978408f..855f3d54 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -89,76 +89,34 @@ Also, you can quickly stop or start modules by clicking on the ```` or ```
 Finally, you can quit this program by pressing either ```` or ````.

-Terms frequency usage
----------------------
-
-In AIL, you can track terms, set of terms and even regexes without creating a dedicated module. To do so, go to the tab `Terms Frequency` in the web interface.
-- You can track a term by simply putting it in the box.
-- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
-- You can track regexes as easily as tracking a term. You just have to put your regex in the box surrounded by the '/' character. For example, if you want to track the regex matching all email address having the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
-
 Crawler
 ---------------------

 In AIL, you can crawl Tor hidden services. Don't forget to review the proxy configuration of your Tor client and especially if you enabled the SOCKS5 proxy and binding on the appropriate IP address reachable via the dockers where Splash runs.

-There are two types of installation. You can install a *local* or a *remote* Splash server.
-``(Splash host) = the server running the splash service``
-``(AIL host) = the server running AIL``
-
-### Installation/Configuration
-
-1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost splash server is used or use the ``-y`` option)
-
-2. *(Splash host)* To install and setup your tor proxy:
-    - Install the tor proxy: ``sudo apt-get install tor -y``
-      (Not required if ``Splash host == AIL host`` - The tor proxy is installed by default in AIL)
-
-      (Warning: Some v3 onion address are not resolved with the tor proxy provided via apt get. Use the tor proxy provided by [The torproject](https://2019.www.torproject.org/docs/debian) to solve this issue)
-    - Allow Tor to bind to any interface or to the docker interface (by default binds to 127.0.0.1 only) in ``/etc/tor/torrc``
-        ``SOCKSPort 0.0.0.0:9050`` or
-        ``SOCKSPort 172.17.0.1:9050``
-    - Add the following line ``SOCKSPolicy accept 172.17.0.0/16`` in ``/etc/tor/torrc``
-        (for a linux docker, the localhost IP is *172.17.0.1*; Should be adapted for other platform)
-    - Restart the tor proxy: ``sudo service tor restart``
-
-3. *(AIL host)* Edit the ``/configs/core.cfg`` file:
-    - In the crawler section, set ``activate_crawler`` to ``True``
-    - Change the IP address of Splash servers if needed (remote only)
-    - Set ``splash_onion_port`` according to your Splash servers port numbers that will be used.
-      those ports numbers should be described as a single port (ex: 8050) or a port range (ex: 8050-8052 for 8050,8051,8052 ports).
+### Installation

-### Starting the scripts
+[Install AIL-Splash-Manager](https://github.com/ail-project/ail-splash-manager)

-- *(Splash host)* Launch all Splash servers with:
-```sudo ./bin/torcrawler/launch_splash_crawler.sh -f -p -n ```
-With ```` and ```` matching those specified at ``splash_onion_port`` in the configuration file of point 3 (``/configs/core.cfg``)
+### Configuration

-All Splash dockers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to connect to the screen session and check all Splash servers status.
-
-- (AIL host) launch all AIL crawler scripts using:
-```./bin/LAUNCH.sh -c```
+1. Find the Splash Manager API key. This key is generated the first time you launch the manager
+(it is stored in your Splash Manager directory: ``ail-splash-manager/token_admin.txt``).

-### TL;DR - Local setup
+2. Set the Splash Manager URL and API key:
+In the web interface, go to ``Crawlers>Settings`` and click the Edit button.
+![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_1.png?raw=true "AIL framework Splash Manager Config")

-#### Installation
-- ```crawler_hidden_services_install.sh -y```
-- Add the following line in ``SOCKSPolicy accept 172.17.0.0/16`` in ``/etc/tor/torrc``
-- ```sudo service tor restart```
-- set activate_crawler to True in ``/configs/core.cfg``
-#### Start
-- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```
+![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_2.png?raw=true "AIL framework Splash Manager Config")

-If AIL framework is not started, it's required to start it before the crawler service:
+3. Launch the AIL crawlers:
+Choose the number of crawlers you want to launch.
+![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_1.png?raw=true "AIL framework Nb Crawlers Config")
+![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_2.png?raw=true "AIL framework Nb Crawlers Config")

-
-- ```./bin/LAUNCH.sh -l```
-
-Then starting the crawler service (if you follow the procedure above)
-
-- ```./bin/LAUNCH.sh -c```

 #### Old updates
diff --git a/doc/screenshots/splash_manager_config_edit_1.png b/doc/screenshots/splash_manager_config_edit_1.png
new file mode 100644
index 00000000..5de9a2b0
Binary files /dev/null and b/doc/screenshots/splash_manager_config_edit_1.png differ
diff --git a/doc/screenshots/splash_manager_config_edit_2.png b/doc/screenshots/splash_manager_config_edit_2.png
new file mode 100644
index 00000000..eeea02fa
Binary files /dev/null and b/doc/screenshots/splash_manager_config_edit_2.png differ
diff --git a/doc/screenshots/splash_manager_nb_crawlers_1.png b/doc/screenshots/splash_manager_nb_crawlers_1.png
new file mode 100644
index 00000000..885b5d3f
Binary files /dev/null and b/doc/screenshots/splash_manager_nb_crawlers_1.png differ
diff --git a/doc/screenshots/splash_manager_nb_crawlers_2.png b/doc/screenshots/splash_manager_nb_crawlers_2.png
new file mode 100644
index 00000000..e0bad14f
Binary files /dev/null and b/doc/screenshots/splash_manager_nb_crawlers_2.png differ
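Step 1 of the new Configuration section (finding the Splash Manager API key before pasting it into ``Crawlers>Settings``) can also be done from a script. The following is a minimal Python sketch, not part of AIL or the Splash Manager. It assumes the manager was cloned as ``ail-splash-manager/`` in the current working directory and that ``token_admin.txt`` contains only the key, possibly followed by a newline. The function name `read_splash_manager_key` and the constant `DEFAULT_TOKEN_FILE` are hypothetical.

```python
from pathlib import Path

# Assumption (not defined by AIL): the manager was cloned as
# "ail-splash-manager" in the current working directory.
DEFAULT_TOKEN_FILE = Path("ail-splash-manager") / "token_admin.txt"


def read_splash_manager_key(token_file: Path = DEFAULT_TOKEN_FILE) -> str:
    """Return the admin API key stored in token_file (hypothetical helper).

    Assumes the file holds only the key, optionally followed by whitespace.
    """
    if not token_file.is_file():
        # The key only exists after the manager has been launched once.
        raise FileNotFoundError(
            f"{token_file} not found; launch the Splash Manager once to generate it"
        )
    # Strip the trailing newline so the key can be pasted as-is.
    key = token_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{token_file} is empty")
    return key
```

For example, `print(read_splash_manager_key())` prints the key so it can be copied into the Edit form. The helper fails loudly instead of returning an empty string because an empty key would only surface later as a failed connection from AIL to the manager.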