mirror of https://github.com/CIRCL/AIL-framework
commit
7c88e7d013
|
@ -0,0 +1,98 @@
|
|||
Feeding, adding new features and contributing
|
||||
=============================================
|
||||
|
||||
How to feed the AIL framework
|
||||
-----------------------------
|
||||
|
||||
For the moment, there are three different ways to feed AIL with data:
|
||||
|
||||
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP your are using for AIL.
|
||||
|
||||
2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
|
||||
|
||||
3. You can feed your own data using the [./bin/import_dir.py](./bin/import_dir.py) script.
|
||||
|
||||
### Feeding AIL with pystemon
|
||||
|
||||
AIL is an analysis tool, not a collector!
|
||||
However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries!
|
||||
|
||||
Feed data to AIL:
|
||||
|
||||
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
|
||||
|
||||
2. Install its python dependencies inside your virtual environment
|
||||
|
||||
3. Launch pystemon ``` ./pystemon ```
|
||||
|
||||
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
|
||||
|
||||
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
|
||||
|
||||
|
||||
How to create a new module
|
||||
--------------------------
|
||||
|
||||
If you want to add a new processing or analysis module in AIL, follow these simple steps:
|
||||
|
||||
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one module at minimum (Usually, Redis_Global).
|
||||
|
||||
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
|
||||
|
||||
|
||||
How to create a new webpage
|
||||
---------------------------
|
||||
|
||||
If you want to add a new webpage for a module in AIL, follow these simple steps:
|
||||
|
||||
1. Launch [./var/www/create_new_web_module.py](./var/www/create_new_web_module.py) and enter the name to use for your webpage (Usually, your newly created python module).
|
||||
|
||||
2. A template and flask skeleton has been created for your new webpage in [./var/www/modules/](./var/www/modules/)
|
||||
|
||||
3. Edit the created html files under the template folder as well as the Flask_* python script so that they fit your needs.
|
||||
|
||||
4. You can change the order of your module in the top navigation header in the file [./var/www/templates/header_base.html](./var/www/templates/header_base.html)
|
||||
|
||||
5. You can ignore module, and so, not display them in the top navigation header by adding the module name in the file [./var/www/templates/ignored_modules.txt](./var/www/templates/ignored_modules.txt)
|
||||
|
||||
How to contribute a module
|
||||
--------------------------
|
||||
|
||||
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
|
||||
|
||||
To contribute your module, feel free to pull your contribution.
|
||||
|
||||
|
||||
Additional information
|
||||
======================
|
||||
|
||||
Manage modules: ModulesInformationV2.py
|
||||
---------------------------------------
|
||||
|
||||
You can do a lots of things easily with the [./bin/ModulesInformationV2](./bin/ModulesInformationV2) script:
|
||||
|
||||
- Monitor the health of other modules
|
||||
- Monitor the ressources comsumption of other modules
|
||||
- Start one or more modules
|
||||
- Kill running modules
|
||||
- Restart automatically stuck modules
|
||||
- Show the paste currently processed by a module
|
||||
|
||||
### Navigation
|
||||
|
||||
You can navigate into the interface by using arrow keys. In order to perform an action on a selected module, you can either press <ENTER> or <SPACE> to show the dialog box.
|
||||
|
||||
To change list, you can press the <TAB> key.
|
||||
|
||||
Also, you can quickly stop or start modules by clicking on the <K> or <S> symbol respectively. These are located in the _Action_ column.
|
||||
|
||||
Finally, you can quit this program by pressing either <q> or <C-c>
|
||||
|
||||
|
||||
Terms frequency usage
|
||||
---------------------
|
||||
|
||||
In AIL, you can track terms, set of terms and even regexes without creating a dedicated module. To do so, go to the tab `Terms Frequency` in the web interface.
|
||||
- You can track a term by simply putting it in the box.
|
||||
- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
|
||||
- You can track regexes as easily as tracking a term. You just have to put your regex in the box surrounded by the '/' character. For example, if you want to track the regex matching all email address having the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
|
|
@ -0,0 +1,22 @@
|
|||
Overview
|
||||
========
|
||||
|
||||
Redis and LevelDB overview
|
||||
--------------------------
|
||||
|
||||
* Redis on TCP port 6379
|
||||
- DB 0 - Cache hostname/dns
|
||||
- DB 1 - Paste meta-data
|
||||
* Redis on TCP port 6380 - Redis Log only
|
||||
* Redis on TCP port 6381
|
||||
- DB 0 - PubSub + Queue and Paste content LRU cache
|
||||
- DB 1 - _Mixer_ Cache
|
||||
* LevelDB on TCP port 6382
|
||||
- DB 1 - Curve
|
||||
- DB 2 - Trending
|
||||
- DB 3 - Terms
|
||||
- DB 4 - Sentiments
|
||||
* LevelDB on TCP port <year>
|
||||
- DB 0 - Lines duplicate
|
||||
- DB 1 - Hashs
|
||||
|
118
README.md
118
README.md
|
@ -11,50 +11,24 @@ AIL is a modular framework to analyse potential information leaks from unstructu
|
|||
|
||||
![Dashboard](./doc/screenshots/dashboard.png?raw=true "AIL framework dashboard")
|
||||
|
||||
Trending charts
|
||||
---------------
|
||||
|
||||
![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending")
|
||||
![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
|
||||
|
||||
Browsing
|
||||
--------
|
||||
|
||||
![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
|
||||
|
||||
Sentiment analysis
|
||||
------------------
|
||||
|
||||
![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
|
||||
|
||||
Terms manager and occurence
|
||||
---------------------------
|
||||
|
||||
![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager")
|
||||
|
||||
## Top terms
|
||||
|
||||
![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop")
|
||||
![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot")
|
||||
|
||||
|
||||
[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
|
||||
|
||||
Features
|
||||
--------
|
||||
|
||||
* Modular architecture to handle streams of unstructured or structured information
|
||||
* Default support for external ZMQ feeds, such as provided by CIRCL or other providers
|
||||
* Multiple feed support
|
||||
* Each module can process and reprocess the information already processed by AIL
|
||||
* Detecting and extracting URLs including their geographical location (e.g. IP address location)
|
||||
* Extracting and validating potential leak of credit cards numbers
|
||||
* Extracting and validating potential leak of credit cards numbers, credentials, ...
|
||||
* Extracting and validating email addresses leaked including DNS MX validation
|
||||
* Module for extracting Tor .onion addresses (to be further processed for analysis)
|
||||
* Keep tracks of duplicates
|
||||
* Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
|
||||
* A full-text indexer module to index unstructured information
|
||||
* Modules and web statistics
|
||||
* Statistics on modules and web
|
||||
* Realtime modules manager in terminal
|
||||
* Global sentiment analysis for each providers based on nltk vader module
|
||||
* Terms tracking and occurrence
|
||||
* Terms, Set of terms and Regex tracking and occurrence
|
||||
* Many more modules for extracting phone numbers, credentials and others
|
||||
|
||||
Installation
|
||||
|
@ -101,69 +75,51 @@ Eventually you can browse the status of the AIL framework website at the followi
|
|||
|
||||
``http://localhost:7000/``
|
||||
|
||||
How to
|
||||
======
|
||||
HOWTO
|
||||
-----
|
||||
|
||||
How to feed the AIL framework
|
||||
-----------------------------
|
||||
|
||||
For the moment, there are two different ways to feed AIL with data:
|
||||
|
||||
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP your are using for AIL.
|
||||
|
||||
2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
|
||||
|
||||
###Feeding AIL with pystemon
|
||||
AIL is an analysis tool, not a collector!
|
||||
However, if you want to collect some pastes and feed them to AIL, the procedure is described below.
|
||||
|
||||
Nevertheless, moderate your queries!
|
||||
|
||||
Here are the steps to setup pystemon and feed data to AIL:
|
||||
|
||||
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
|
||||
|
||||
2. Install its python dependencies inside your virtual environment
|
||||
|
||||
3. Launch pystemon ``` ./pystemon ```
|
||||
|
||||
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
|
||||
|
||||
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
|
||||
HOWTO are available in [HOWTO.md](HOWTO.md)
|
||||
|
||||
|
||||
How to create a new module
|
||||
--------------------------
|
||||
Screenshots
|
||||
===========
|
||||
|
||||
If you want to add a new processing or analysis module in AIL, follow these simple steps:
|
||||
Trending charts
|
||||
---------------
|
||||
|
||||
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to the Redis_Global at minimum.
|
||||
![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending")
|
||||
![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
|
||||
|
||||
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
|
||||
Browsing
|
||||
--------
|
||||
|
||||
How to contribute a module
|
||||
--------------------------
|
||||
![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
|
||||
|
||||
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
|
||||
Sentiment analysis
|
||||
------------------
|
||||
|
||||
To contribute your module, feel free to pull your contribution.
|
||||
![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
|
||||
|
||||
Overview and License
|
||||
====================
|
||||
Terms manager and occurence
|
||||
---------------------------
|
||||
|
||||
![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager")
|
||||
|
||||
### Top terms
|
||||
|
||||
![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop")
|
||||
![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot")
|
||||
|
||||
|
||||
Redis and LevelDB overview
|
||||
--------------------------
|
||||
[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
|
||||
|
||||
* Redis on TCP port 6379 - DB 1 - Paste meta-data
|
||||
* DB 0 - Cache hostname/dns
|
||||
* Redis on TCP port 6380 - Redis Pub-Sub only
|
||||
* Redis on TCP port 6381 - DB 0 - Queue and Paste content LRU cache
|
||||
* Redis on TCP port 6382 - DB 1-4 - Trending, terms and sentiments
|
||||
* LevelDB on TCP port <year> - Lines duplicate
|
||||
Command line module manager
|
||||
---------------------------
|
||||
|
||||
LICENSE
|
||||
-------
|
||||
![Module-Manager](./doc/screenshots/module-manager.png?raw=true "AIL framework ModuleInformationV2.py")
|
||||
|
||||
License
|
||||
=======
|
||||
|
||||
```
|
||||
Copyright (C) 2014 Jules Debra
|
||||
|
|
|
@ -5,25 +5,7 @@
|
|||
The ZMQ_Sub_Attribute Module
|
||||
============================
|
||||
|
||||
This module is consuming the Redis-list created by the ZMQ_PubSub_Line_Q Module
|
||||
|
||||
It perform a sorting on the line's length and publish/forward them to
|
||||
differents channels:
|
||||
|
||||
*Channel 1 if max length(line) < max
|
||||
*Channel 2 if max length(line) > max
|
||||
|
||||
The collected informations about the processed pastes
|
||||
(number of lines and maximum length line) are stored in Redis.
|
||||
|
||||
..note:: Module ZMQ_Something_Q and ZMQ_Something are closely bound, always put
|
||||
the same Subscriber name in both of them.
|
||||
|
||||
Requirements
|
||||
------------
|
||||
|
||||
*Need running Redis instances. (LevelDB & Redis)
|
||||
*Need the ZMQ_PubSub_Line_Q Module running to be able to work properly.
|
||||
This module is saving Attribute of the paste into redis
|
||||
|
||||
"""
|
||||
import time
|
||||
|
|
|
@ -1,5 +1,16 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
The Credential Module
|
||||
=====================
|
||||
|
||||
This module is consuming the Redis-list created by the Categ module.
|
||||
|
||||
It apply credential regexes on paste content and warn if above a threshold.
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
import sys
|
||||
from packages import Paste
|
||||
|
|
|
@ -1,5 +1,17 @@
|
|||
#!/usr/bin/env python
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
The CreditCards Module
|
||||
======================
|
||||
|
||||
This module is consuming the Redis-list created by the Categ module.
|
||||
|
||||
It apply credit card regexes on paste content and warn if above a threshold.
|
||||
|
||||
"""
|
||||
|
||||
|
||||
import pprint
|
||||
import time
|
||||
from packages import Paste
|
||||
|
@ -7,7 +19,6 @@ from packages import lib_refine
|
|||
from pubsublogger import publisher
|
||||
import re
|
||||
|
||||
|
||||
from Helper import Process
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
|
@ -5,14 +5,6 @@
|
|||
This module manage top sets for terms frequency.
|
||||
Every 'refresh_rate' update the weekly and monthly set
|
||||
|
||||
|
||||
Requirements
|
||||
------------
|
||||
|
||||
*Need running Redis instances. (Redis)
|
||||
*Categories files of words in /files/ need to be created
|
||||
*Need the ZMQ_PubSub_Tokenize_Q Module running to be able to work properly.
|
||||
|
||||
"""
|
||||
|
||||
import redis
|
||||
|
|
|
@ -1,7 +1,13 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
Template for new modules
|
||||
The CVE Module
|
||||
======================
|
||||
|
||||
This module is consuming the Redis-list created by the Categ module.
|
||||
|
||||
It apply CVE regexes on paste content and warn if a reference to a CVE is spotted.
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
|
|
@ -5,8 +5,8 @@
|
|||
The DomClassifier Module
|
||||
============================
|
||||
|
||||
The DomClassifier modules is fetching the list of files to be
|
||||
processed and index each file with a full-text indexer (Whoosh until now).
|
||||
The DomClassifier modules extract and classify Internet domains/hostnames/IP addresses from
|
||||
the out output of the Global module.
|
||||
|
||||
"""
|
||||
import time
|
||||
|
|
|
@ -1,7 +1,14 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
Template for new modules
|
||||
The Keys Module
|
||||
======================
|
||||
|
||||
This module is consuming the Redis-list created by the Global module.
|
||||
|
||||
It is looking for PGP encrypted messages
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
|
10
bin/Mail.py
10
bin/Mail.py
|
@ -1,6 +1,16 @@
|
|||
#!/usr/bin/env python
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
The CreditCards Module
|
||||
======================
|
||||
|
||||
This module is consuming the Redis-list created by the Categ module.
|
||||
|
||||
It apply mail regexes on paste content and warn if above a threshold.
|
||||
|
||||
"""
|
||||
|
||||
import redis
|
||||
import pprint
|
||||
import time
|
||||
|
|
26
bin/Mixer.py
26
bin/Mixer.py
|
@ -1,8 +1,8 @@
|
|||
#!/usr/bin/env python
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The ZMQ_Feed_Q Module
|
||||
=====================
|
||||
The Mixer Module
|
||||
================
|
||||
|
||||
This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.
|
||||
|
||||
|
@ -22,13 +22,7 @@ Depending on the configuration, this module will process the feed as follow:
|
|||
Note that the hash of the content is defined as the sha1(gzip64encoded).
|
||||
|
||||
Every data coming from a named feed can be sent to a pre-processing module before going to the global module.
|
||||
The mapping can be done via the variable feed_queue_mapping
|
||||
|
||||
Requirements
|
||||
------------
|
||||
|
||||
*Need running Redis instances.
|
||||
*Need the ZMQ_Feed_Q Module running to be able to work properly.
|
||||
The mapping can be done via the variable FEED_QUEUE_MAPPING
|
||||
|
||||
"""
|
||||
import base64
|
||||
|
@ -44,7 +38,7 @@ from Helper import Process
|
|||
|
||||
# CONFIG #
|
||||
refresh_time = 30
|
||||
feed_queue_mapping = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
|
||||
FEED_QUEUE_MAPPING = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
|
||||
|
||||
if __name__ == '__main__':
|
||||
publisher.port = 6380
|
||||
|
@ -117,8 +111,8 @@ if __name__ == '__main__':
|
|||
else: # New content
|
||||
|
||||
# populate Global OR populate another set based on the feeder_name
|
||||
if feeder_name in feed_queue_mapping:
|
||||
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
|
||||
if feeder_name in FEED_QUEUE_MAPPING:
|
||||
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
|
||||
else:
|
||||
p.populate_set_out(relay_message, 'Mixer')
|
||||
|
||||
|
@ -139,8 +133,8 @@ if __name__ == '__main__':
|
|||
server.expire('HASH_'+paste_name, ttl_key)
|
||||
|
||||
# populate Global OR populate another set based on the feeder_name
|
||||
if feeder_name in feed_queue_mapping:
|
||||
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
|
||||
if feeder_name in FEED_QUEUE_MAPPING:
|
||||
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
|
||||
else:
|
||||
p.populate_set_out(relay_message, 'Mixer')
|
||||
|
||||
|
@ -153,8 +147,8 @@ if __name__ == '__main__':
|
|||
server.expire(paste_name, ttl_key)
|
||||
|
||||
# populate Global OR populate another set based on the feeder_name
|
||||
if feeder_name in feed_queue_mapping:
|
||||
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
|
||||
if feeder_name in FEED_QUEUE_MAPPING:
|
||||
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
|
||||
else:
|
||||
p.populate_set_out(relay_message, 'Mixer')
|
||||
|
||||
|
|
|
@ -145,6 +145,7 @@ if __name__ == "__main__":
|
|||
PST.p_name)
|
||||
for url in fetch(p, r_cache, urls, domains_list, path):
|
||||
publisher.warning('{}Checked {};{}'.format(to_print, url, PST.p_path))
|
||||
p.populate_set_out('onion;{}'.format(PST.p_path), 'BrowseWarningPaste')
|
||||
else:
|
||||
publisher.info('{}Onion related;{}'.format(to_print, PST.p_path))
|
||||
|
||||
|
|
10
bin/Phone.py
10
bin/Phone.py
|
@ -1,7 +1,14 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
module for finding phone numbers
|
||||
The Phone Module
|
||||
================
|
||||
|
||||
This module is consuming the Redis-list created by the Categ module.
|
||||
|
||||
It apply phone number regexes on paste content and warn if above a threshold.
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
@ -17,6 +24,7 @@ def search_phone(message):
|
|||
content = paste.get_p_content()
|
||||
# regex to find phone numbers, may raise many false positives (shalt thou seek optimization, upgrading is required)
|
||||
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\d{2,3}){3,4})')
|
||||
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})')
|
||||
# list of the regex results in the Paste, may be null
|
||||
results = reg_phone.findall(content)
|
||||
|
||||
|
|
|
@ -2,6 +2,8 @@
|
|||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
This Module is used for term frequency.
|
||||
It processes every paste coming from the global module and test the regexs
|
||||
supplied in the term webpage.
|
||||
|
||||
"""
|
||||
import redis
|
||||
|
|
|
@ -6,6 +6,11 @@ from pubsublogger import publisher
|
|||
from Helper import Process
|
||||
import re
|
||||
|
||||
'''
|
||||
This module takes its input from the global module.
|
||||
It applies some regex and publish matched content
|
||||
'''
|
||||
|
||||
if __name__ == "__main__":
|
||||
publisher.port = 6380
|
||||
publisher.channel = "Script"
|
||||
|
|
|
@ -1,7 +1,14 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
Sql Injection module
|
||||
The SQLInjectionDetection Module
|
||||
================================
|
||||
|
||||
This module is consuming the Redis-list created by the Web module.
|
||||
|
||||
It test different possibility to makes some sqlInjection.
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
|
|
@ -4,8 +4,8 @@
|
|||
Sentiment analyser module.
|
||||
It takes its inputs from 'global'.
|
||||
|
||||
The content analysed comes from the pastes with length of the line
|
||||
above a defined threshold removed (get_p_content_with_removed_lines).
|
||||
The content is analysed if the length of the line is
|
||||
above a defined threshold (get_p_content_with_removed_lines).
|
||||
This is done because NLTK sentences tokemnizer (sent_tokenize) seems to crash
|
||||
for long lines (function _slices_from_text line#1276).
|
||||
|
||||
|
|
|
@ -2,6 +2,8 @@
|
|||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
This Module is used for term frequency.
|
||||
It processes every paste coming from the global module and test the sets
|
||||
supplied in the term webpage.
|
||||
|
||||
"""
|
||||
import redis
|
||||
|
|
|
@ -1,8 +1,8 @@
|
|||
#!/usr/bin/env python
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The ZMQ_PubSub_Lines Module
|
||||
============================
|
||||
The Tokenize Module
|
||||
===================
|
||||
|
||||
This module is consuming the Redis-list created by the ZMQ_PubSub_Tokenize_Q
|
||||
Module.
|
||||
|
|
|
@ -1,5 +1,14 @@
|
|||
#!/usr/bin/env python
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
The Web Module
|
||||
============================
|
||||
|
||||
This module tries to parse URLs and warns if some defined contry code are present.
|
||||
|
||||
"""
|
||||
|
||||
import redis
|
||||
import pprint
|
||||
import time
|
||||
|
|
|
@ -1,7 +1,13 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
"""
|
||||
Template for new modules
|
||||
The WebStats Module
|
||||
======================
|
||||
|
||||
This module makes stats on URL recolted from the web module.
|
||||
It consider the TLD, Domain and protocol.
|
||||
|
||||
"""
|
||||
|
||||
import time
|
||||
|
|
|
@ -57,8 +57,8 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_BrowseWarningPaste
|
|||
|
||||
[Onion]
|
||||
subscribe = Redis_Onion
|
||||
publish = Redis_ValidOnion,ZMQ_FetchedOnion
|
||||
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion
|
||||
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
|
||||
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
|
||||
|
||||
[DumpValidOnion]
|
||||
subscribe = Redis_ValidOnion
|
||||
|
|
|
@ -1,6 +1,15 @@
|
|||
#!/usr/bin/env python2
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
'''
|
||||
The preProcess Module
|
||||
=====================
|
||||
|
||||
This module is just an example of how we can pre-process a feed coming from the Mixer
|
||||
module before seding it to the Global module.
|
||||
|
||||
'''
|
||||
|
||||
import time
|
||||
from pubsublogger import publisher
|
||||
|
||||
|
|
Binary file not shown.
Before Width: | Height: | Size: 66 KiB After Width: | Height: | Size: 51 KiB |
|
@ -75,6 +75,7 @@
|
|||
<li name='nav-pan'><a data-toggle="tab" href="#keys-tab" data-attribute-name="keys" data-panel="keys-panel">Keys</a></li>
|
||||
<li name='nav-pan'><a data-toggle="tab" href="#mail-tab" data-attribute-name="mail" data-panel="mail-panel">Mails</a></li>
|
||||
<li name='nav-pan'><a data-toggle="tab" href="#phone-tab" data-attribute-name="phone" data-panel="phone-panel">Phones</a></li>
|
||||
<li name='nav-pan'><a data-toggle="tab" href="#onion-tab" data-attribute-name="onion" data-panel="onion-panel">Onions</a></li>
|
||||
</ul>
|
||||
</br>
|
||||
|
||||
|
@ -101,6 +102,9 @@
|
|||
<div class="col-lg-12 tab-pane fade" id="phone-tab">
|
||||
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
|
||||
</div>
|
||||
<div class="col-lg-12 tab-pane fade" id="onion-tab">
|
||||
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
|
||||
</div>
|
||||
</div> <!-- tab-content -->
|
||||
<!-- /.row -->
|
||||
</div>
|
||||
|
|
Loading…
Reference in New Issue