mirror of https://github.com/CIRCL/AIL-framework
@ -0,0 +1,98 @@
Feeding, adding new features and contributing
How to feed the AIL framework
For the moment, there are three different ways to feed AIL with data:
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
2. You can set up [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
3. You can feed your own data using the [./bin/import_dir.py](./bin/import_dir.py) script.
### Feeding AIL with pystemon
AIL is an analysis tool, not a collector!
However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries!
Feed data to AIL:
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
2. Install its python dependencies inside your virtual environment
3. Launch pystemon ``` ./pystemon ```
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
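Before launching the feeder, it can help to confirm that the configured path actually exists. The following is a minimal Python 3 sketch, not part of AIL: the helper name `check_pystemon_path` and the section name `Directories` are assumptions, so adjust them to match your own `config.cfg`.

```python
import configparser
import os


def check_pystemon_path(cfg_text, section="Directories"):
    """Return the configured pystemonpath and whether it is an existing directory.

    The section name is an assumption; use the one found in your config.cfg.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    path = parser.get(section, "pystemonpath")
    return path, os.path.isdir(path)


# Hypothetical excerpt of bin/packages/config.cfg
example_cfg = """
[Directories]
pystemonpath = /nonexistent/pystemon/
"""

path, exists = check_pystemon_path(example_cfg)
print(path, "exists" if exists else "is missing")
```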
How to create a new module
If you want to add a new processing or analysis module in AIL, follow these simple steps:
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one module (usually Redis_Global).
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
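As an illustration of step 1, the sketch below parses a hypothetical `modules.cfg` entry that uses the same `subscribe` / `publish` keys as the existing entries. The names `MyModule` and `Redis_MyModuleOut` and the helper `queues_for` are invented for this example. It is written for Python 3 and only shows the shape of an entry, not how AIL itself loads the file.

```python
import configparser

# Hypothetical modules.cfg entry; "MyModule" and "Redis_MyModuleOut" are illustrative names.
MODULES_CFG = """
[MyModule]
subscribe = Redis_Global
publish = Redis_MyModuleOut
"""


def _split_queues(value):
    """Split a comma-separated queue list into clean names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def queues_for(module_name, cfg_text):
    """Return (subscribed, published) queue names declared for a module."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser[module_name]
    subscribed = _split_queues(section.get("subscribe", fallback=""))
    published = _split_queues(section.get("publish", fallback=""))
    return subscribed, published


subscribed, published = queues_for("MyModule", MODULES_CFG)
print(subscribed, published)
```

The section name must match the file name created in `bin/` in step 2.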
How to create a new webpage
If you want to add a new webpage for a module in AIL, follow these simple steps:
1. Launch [./var/www/create_new_web_module.py](./var/www/create_new_web_module.py) and enter the name to use for your webpage (usually the name of your newly created Python module).
2. A template and a Flask skeleton are created for your new webpage in [./var/www/modules/](./var/www/modules/)
3. Edit the created html files under the template folder as well as the Flask_* python script so that they fit your needs.
4. You can change the order of your module in the top navigation header in the file [./var/www/templates/header_base.html](./var/www/templates/header_base.html)
5. You can hide modules from the top navigation header by adding their names to the file [./var/www/templates/ignored_modules.txt](./var/www/templates/ignored_modules.txt)
How to contribute a module
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
To contribute your module, feel free to open a pull request.
Additional information
Manage modules: ModulesInformationV2.py
You can do a lot of things easily with the [./bin/ModulesInformationV2](./bin/ModulesInformationV2) script:
- Monitor the health of other modules
- Monitor the resource consumption of other modules
- Start one or more modules
- Kill running modules
- Restart automatically stuck modules
- Show the paste currently processed by a module
### Navigation
You can navigate the interface by using the arrow keys. To perform an action on a selected module, press either <ENTER> or <SPACE> to show the dialog box.
To switch between lists, press the <TAB> key.
Also, you can quickly stop or start modules by clicking on the <K> or <S> symbol respectively. These are located in the _Action_ column.
Finally, you can quit this program by pressing either <q> or <C-c>
Terms frequency usage
In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module. To do so, go to the `Terms Frequency` tab in the web interface.
- You can track a term by simply putting it in the box.
- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
- You can track regexes as easily as a term: put your regex in the box, surrounded by the '/' character. For example, to track all email addresses with the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
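The three rule shapes described above can be told apart with a few lines of Python. This is a hypothetical sketch, not AIL's own parser: `classify_rule` and `_SET_RULE` are names made up for the example, and the meaning of the threshold value is left uninterpreted. Note that Python's `re` module rejects a pattern starting with `*` ("nothing to repeat"), so the sketch uses `.*@domain\.net` as a stand-in for the email rule.

```python
import re

# Matches "[term1, term2, [100]]" with an optional trailing "[threshold]".
_SET_RULE = re.compile(r"^\[(.*?)(?:,\s*\[(\d+)\])?\]$")


def classify_rule(rule):
    """Classify a tracking rule as ('regex', ...), ('set', ...) or ('term', ...)."""
    rule = rule.strip()
    if len(rule) >= 2 and rule.startswith("/") and rule.endswith("/"):
        return ("regex", rule[1:-1])
    if len(rule) >= 2 and rule.startswith("\\") and rule.endswith("\\"):
        match = _SET_RULE.match(rule[1:-1].strip())
        if match:
            terms = [t.strip() for t in match.group(1).split(",") if t.strip()]
            threshold = int(match.group(2)) if match.group(2) else None
            return ("set", terms, threshold)
    return ("term", rule)


print(classify_rule("password"))
print(classify_rule("\\[term1, term2, [100]]\\"))
print(classify_rule(r"/.*@domain\.net/"))
```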
@ -0,0 +1,22 @@
Redis and LevelDB overview
* Redis on TCP port 6379
- DB 0 - Cache hostname/dns
- DB 1 - Paste meta-data
* Redis on TCP port 6380 - Redis Log only
* Redis on TCP port 6381
- DB 0 - PubSub + Queue and Paste content LRU cache
- DB 1 - _Mixer_ Cache
* LevelDB on TCP port 6382
- DB 1 - Curve
- DB 2 - Trending
- DB 3 - Terms
- DB 4 - Sentiments
* LevelDB on TCP port <year>
- DB 0 - Lines duplicate
- DB 1 - Hashes
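For quick lookups, the layout above can be written down as a plain Python mapping. This is only a transcription of the overview under stated assumptions: `DATASTORES` and `find_store` are names invented for the example, and the year-based LevelDB instance is left out because its port is not a fixed number.

```python
# Layout transcribed from the overview; the LevelDB instance on port <year> is omitted
# because its port changes with the year.
DATASTORES = {
    6379: {"engine": "Redis", "dbs": {0: "Cache hostname/dns", 1: "Paste meta-data"}},
    6380: {"engine": "Redis", "dbs": {}},  # log only, no numbered DB listed
    6381: {"engine": "Redis", "dbs": {0: "PubSub + Queue and Paste content LRU cache",
                                      1: "Mixer cache"}},
    6382: {"engine": "LevelDB", "dbs": {1: "Curve", 2: "Trending", 3: "Terms",
                                        4: "Sentiments"}},
}


def find_store(purpose):
    """Return every (port, db) pair whose description mentions `purpose`."""
    needle = purpose.lower()
    return [
        (port, db)
        for port, store in sorted(DATASTORES.items())
        for db, description in sorted(store["dbs"].items())
        if needle in description.lower()
    ]


print(find_store("terms"))
```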
@ -11,50 +11,24 @@ AIL is a modular framework to analyse potential information leaks from unstructu

Trending charts



Sentiment analysis

Terms manager and occurrence

## Top terms


[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
* Modular architecture to handle streams of unstructured or structured information
* Default support for external ZMQ feeds, such as provided by CIRCL or other providers
* Multiple feed support
* Each module can process and reprocess the information already processed by AIL
* Detecting and extracting URLs including their geographical location (e.g. IP address location)
* Extracting and validating potential leak of credit cards numbers
* Extracting and validating potential leak of credit cards numbers, credentials, ...
* Extracting and validating email addresses leaked including DNS MX validation
* Module for extracting Tor .onion addresses (to be further processed for analysis)
* Keep tracks of duplicates
* Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
* A full-text indexer module to index unstructured information
* Modules and web statistics
* Statistics on modules and web
* Realtime modules manager in terminal
* Global sentiment analysis for each providers based on nltk vader module
* Terms tracking and occurrence
* Terms, Set of terms and Regex tracking and occurrence
* Many more modules for extracting phone numbers, credentials and others
@ -101,69 +75,51 @@ Eventually you can browse the status of the AIL framework website at the followi
How to
How to feed the AIL framework
For the moment, there are two different ways to feed AIL with data:
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
2. You can set up [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
###Feeding AIL with pystemon
AIL is an analysis tool, not a collector!
However, if you want to collect some pastes and feed them to AIL, the procedure is described below.
Nevertheless, moderate your queries!
Here are the steps to setup pystemon and feed data to AIL:
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
2. Install its python dependencies inside your virtual environment
3. Launch pystemon ``` ./pystemon ```
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
HOWTO are available in [HOWTO.md](HOWTO.md)
How to create a new module
If you want to add a new processing or analysis module in AIL, follow these simple steps:
Trending charts
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to the Redis_Global at minimum.


2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
How to contribute a module

Feel free to fork the code, play with it, make some patches or add additional analysis modules.
Sentiment analysis
To contribute your module, feel free to pull your contribution.

Overview and License
Terms manager and occurrence

### Top terms


Redis and LevelDB overview
[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
* Redis on TCP port 6379 - DB 1 - Paste meta-data
* DB 0 - Cache hostname/dns
* Redis on TCP port 6380 - Redis Pub-Sub only
* Redis on TCP port 6381 - DB 0 - Queue and Paste content LRU cache
* Redis on TCP port 6382 - DB 1-4 - Trending, terms and sentiments
* LevelDB on TCP port <year> - Lines duplicate
Command line module manager

Copyright (C) 2014 Jules Debra
@ -5,25 +5,7 @@
The ZMQ_Sub_Attribute Module
This module is consuming the Redis-list created by the ZMQ_PubSub_Line_Q Module
It sorts the lines by length and publishes/forwards them to
different channels:
*Channel 1 if max length(line) < max
*Channel 2 if max length(line) > max
The collected information about the processed pastes
(number of lines and maximum length line) are stored in Redis.
..note:: Module ZMQ_Something_Q and ZMQ_Something are closely bound, always put
the same Subscriber name in both of them.
*Need running Redis instances. (LevelDB & Redis)
*Need the ZMQ_PubSub_Line_Q Module running to be able to work properly.
This module saves the attributes of the paste into Redis
import time
@ -1,5 +1,16 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
The Credential Module
This module is consuming the Redis-list created by the Categ module.
It applies credential regexes on paste content and warns if above a threshold.
import time
import sys
from packages import Paste
@ -1,5 +1,17 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
The CreditCards Module
This module is consuming the Redis-list created by the Categ module.
It applies credit card regexes on paste content and warns if above a threshold.
import pprint
import time
from packages import Paste
@ -7,7 +19,6 @@ from packages import lib_refine
from pubsublogger import publisher
import re
from Helper import Process
if __name__ == "__main__":
@ -5,14 +5,6 @@
This module manages top sets for terms frequency.
Every 'refresh_rate', it updates the weekly and monthly sets
*Need running Redis instances. (Redis)
*Categories files of words in /files/ need to be created
*Need the ZMQ_PubSub_Tokenize_Q Module running to be able to work properly.
import redis
@ -1,7 +1,13 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
Template for new modules
The CVE Module
This module is consuming the Redis-list created by the Categ module.
It applies CVE regexes on paste content and warns if a reference to a CVE is spotted.
import time
@ -5,8 +5,8 @@
The DomClassifier Module
The DomClassifier module fetches the list of files to be
processed and indexes each file with a full-text indexer (Whoosh until now).
The DomClassifier module extracts and classifies Internet domains/hostnames/IP addresses from
the output of the Global module.
import time
@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
Template for new modules
The Keys Module
This module is consuming the Redis-list created by the Global module.
It is looking for PGP encrypted messages
import time
@ -1,6 +1,16 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
The CreditCards Module
This module is consuming the Redis-list created by the Categ module.
It applies mail regexes on paste content and warns if above a threshold.
import redis
import pprint
import time
@ -1,8 +1,8 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
The ZMQ_Feed_Q Module
The Mixer Module
This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.
@ -22,13 +22,7 @@ Depending on the configuration, this module will process the feed as follow:
Note that the hash of the content is defined as the sha1(gzip64encoded).
Every data coming from a named feed can be sent to a pre-processing module before going to the global module.
The mapping can be done via the variable feed_queue_mapping
*Need running Redis instances.
*Need the ZMQ_Feed_Q Module running to be able to work properly.
The mapping can be done via the variable FEED_QUEUE_MAPPING
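The docstring defines the content hash as sha1(gzip64encoded), that is, a SHA-1 computed over the payload as received (gzip-compressed, then base64-encoded) rather than over the decoded text. A minimal Python 3 sketch of that definition follows. The helper name `content_hash` and the sample payload are assumptions made for illustration.

```python
import base64
import gzip
import hashlib


def content_hash(gzip64encoded):
    """Return the hex SHA-1 digest of an already gzip-compressed, base64-encoded payload."""
    return hashlib.sha1(gzip64encoded).hexdigest()


# Illustrative payload, built the way a feeder might encode a paste.
payload = base64.standard_b64encode(gzip.compress(b"example paste content"))
print(content_hash(payload))
```

Because the digest covers the encoded bytes, two feeds that compress the same text differently would produce different hashes.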
import base64
@ -44,7 +38,7 @@ from Helper import Process
refresh_time = 30
feed_queue_mapping = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
FEED_QUEUE_MAPPING = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
if __name__ == '__main__':
publisher.port = 6380
@ -117,8 +111,8 @@ if __name__ == '__main__':
else: # New content
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
p.populate_set_out(relay_message, 'Mixer')
@ -139,8 +133,8 @@ if __name__ == '__main__':
server.expire('HASH_'+paste_name, ttl_key)
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
p.populate_set_out(relay_message, 'Mixer')
@ -153,8 +147,8 @@ if __name__ == '__main__':
server.expire(paste_name, ttl_key)
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
p.populate_set_out(relay_message, 'Mixer')
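The routing decision repeated in the three hunks above can be summarised as a dictionary lookup with a default. This sketch assumes that the `'Mixer'` call sits in an `else:` branch that is not visible in this flattened diff. The `route` helper is hypothetical and only returns the queue name instead of calling `p.populate_set_out`.

```python
# Same example mapping as in the Mixer module: feeder name -> pre-processing queue.
FEED_QUEUE_MAPPING = {"feeder2": "preProcess1"}


def route(feeder_name, mapping=FEED_QUEUE_MAPPING, default_queue="Mixer"):
    """Return the mapped pre-processing queue for a feeder, else the default queue."""
    return mapping.get(feeder_name, default_queue)


print(route("feeder2"))
print(route("feeder1"))
```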
@ -145,6 +145,7 @@ if __name__ == "__main__":
for url in fetch(p, r_cache, urls, domains_list, path):
publisher.warning('{}Checked {};{}'.format(to_print, url, PST.p_path))
p.populate_set_out('onion;{}'.format(PST.p_path), 'BrowseWarningPaste')
publisher.info('{}Onion related;{}'.format(to_print, PST.p_path))
@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
module for finding phone numbers
The Phone Module
This module is consuming the Redis-list created by the Categ module.
It applies phone number regexes on paste content and warns if above a threshold.
import time
@ -17,6 +24,7 @@ def search_phone(message):
content = paste.get_p_content()
# regex to find phone numbers, may raise many false positives (shalt thou seek optimization, upgrading is required)
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\d{2,3}){3,4})')
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})')
# list of the regex results in the Paste, may be null
results = reg_phone.findall(content)
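To see what the regex change buys, the two patterns from this hunk can be compared side by side. The sample numbers below are invented for illustration. The new pattern additionally accepts four-digit groups and groups wrapped in parentheses.

```python
import re

# Patterns copied from the hunk above (old first, then new).
OLD_PHONE = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\d{2,3}){3,4})')
NEW_PHONE = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})')

# Made-up sample numbers.
samples = ["+352 247 88 444", "+33 1234 5678 9012", "+1 (555) 123 4567"]


def compare(text):
    """Return (old_matches, new_matches) for a piece of text."""
    return OLD_PHONE.search(text) is not None, NEW_PHONE.search(text) is not None


for sample in samples:
    print(sample, compare(sample))
```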
@ -2,6 +2,8 @@
# -*-coding:UTF-8 -*
This Module is used for term frequency.
It processes every paste coming from the global module and tests the regexes
supplied in the term webpage.
import redis
@ -6,6 +6,11 @@ from pubsublogger import publisher
from Helper import Process
import re
This module takes its input from the global module.
It applies some regexes and publishes the matched content
if __name__ == "__main__":
publisher.port = 6380
publisher.channel = "Script"
@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
Sql Injection module
The SQLInjectionDetection Module
This module is consuming the Redis-list created by the Web module.
It tests different possible ways of performing an SQL injection.
import time
@ -4,8 +4,8 @@
Sentiment analyser module.
It takes its inputs from 'global'.
The content analysed comes from the pastes with length of the line
above a defined threshold removed (get_p_content_with_removed_lines).
The content is analysed if the length of the line is
above a defined threshold (get_p_content_with_removed_lines).
This is done because the NLTK sentence tokenizer (sent_tokenize) seems to crash
for long lines (function _slices_from_text line#1276).
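The filtering step can be pictured as follows. This is a sketch under assumptions: `remove_long_lines` is a hypothetical stand-in for `get_p_content_with_removed_lines`, whose real implementation is not shown here, and it assumes that lines longer than the threshold are dropped before the text reaches the tokenizer.

```python
def remove_long_lines(content, threshold):
    """Drop every line longer than `threshold` characters and return the remaining text."""
    kept = [line for line in content.splitlines() if len(line) <= threshold]
    return "\n".join(kept)


text = "short\n" + "x" * 50 + "\nalso short"
print(remove_long_lines(text, 10))
```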
@ -2,6 +2,8 @@
# -*-coding:UTF-8 -*
This Module is used for term frequency.
It processes every paste coming from the global module and tests the sets
supplied in the term webpage.
import redis
@ -1,8 +1,8 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
The ZMQ_PubSub_Lines Module
The Tokenize Module
This module is consuming the Redis-list created by the ZMQ_PubSub_Tokenize_Q
@ -1,5 +1,14 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
The Web Module
This module tries to parse URLs and warns if some defined country codes are present.
import redis
import pprint
import time
@ -1,7 +1,13 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
Template for new modules
The WebStats Module
This module makes stats on URLs collected from the web module.
It considers the TLD, domain and protocol.
import time
@ -57,8 +57,8 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_BrowseWarningPaste
subscribe = Redis_Onion
publish = Redis_ValidOnion,ZMQ_FetchedOnion
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
subscribe = Redis_ValidOnion
@ -1,6 +1,15 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
The preProcess Module
This module is just an example of how we can pre-process a feed coming from the Mixer
module before sending it to the Global module.
import time
from pubsublogger import publisher
Binary file not shown.
Before Width: | Height: | Size: 66 KiB After Width: | Height: | Size: 51 KiB |
@ -75,6 +75,7 @@
<li name='nav-pan'><a data-toggle="tab" href="#keys-tab" data-attribute-name="keys" data-panel="keys-panel">Keys</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#mail-tab" data-attribute-name="mail" data-panel="mail-panel">Mails</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#phone-tab" data-attribute-name="phone" data-panel="phone-panel">Phones</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#onion-tab" data-attribute-name="onion" data-panel="onion-panel">Onions</a></li>
@ -101,6 +102,9 @@
<div class="col-lg-12 tab-pane fade" id="phone-tab">
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
<div class="col-lg-12 tab-pane fade" id="onion-tab">
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
</div> <!-- tab-content -->
<!-- /.row -->