Commit Graph

1070 Commits (50ec4d822f0a46cfff435fbc4cdb53c3e70d9992)

Author SHA1 Message Date
Alexandre Dulaunoy 25757b0fff A simple feeder script feeding data from pystemon to AIL.
The configuration matches the default Redis parameters used
in the pystemon configuration.

https://github.com/cvandeplas/pystemon/blob/master/pystemon.yaml#L16
2014-09-19 14:03:05 +02:00
Raphaël Vinot 65b9a01644 Add config file for DomainClassifier, proper reporting 2014-09-17 17:22:56 +02:00
Alexandre Dulaunoy 27b134ec03 Add proper publisher for classified domains/hostnames 2014-09-10 09:27:47 +02:00
Raphaël Vinot f017680365 fix onions, cc and domain classifier modules 2014-09-08 16:51:43 +02:00
Alexandre Dulaunoy de6e21d5a7 DomainClassifier sample configuration added 2014-09-08 16:44:05 +02:00
Alexandre Dulaunoy 246621f663 First version of the DomainClassifier 2014-09-08 16:43:21 +02:00
Alexandre Dulaunoy 1397db9691 Global queue for DomainClassifier 2014-09-08 11:07:45 +02:00
Raphaël Vinot e983c839ad Categ now listens to the Global queue 2014-09-05 17:05:45 +02:00
Raphaël Vinot 46f27ada4e More cleanup 2014-09-05 10:42:01 +02:00
Raphaël Vinot fca00beed9 Add Domain Classifier module.
Cleanup in the config files.
2014-09-05 10:41:00 +02:00
Raphaël Vinot b7c9e489c9 Fix the exceptions 2014-09-04 11:46:07 +02:00
Raphaël Vinot 9e8611a42d stop killing the disk when creating the word curve 2014-09-02 18:20:28 +02:00
Raphaël Vinot 7542eaf739 Update starting script. 2014-09-02 15:21:36 +02:00
Raphaël Vinot 0c6b09f379 Fix the onion module, log the valid onions. 2014-09-01 16:18:06 +02:00
Raphaël Vinot f4b89669fc The onion module now fetches the URLs it finds. 2014-08-31 22:42:12 +02:00
Raphaël Vinot abfe13436b Big refactoring, make the queues more flexible 2014-08-29 19:37:56 +02:00
Raphaël Vinot 623e876f3b Cleanup.
* Remove useless subscriber
* Fix typo in the config file
* Update Helper accordingly
2014-08-26 17:36:57 +02:00
Alexandre Dulaunoy 3b499a2ec8 ZMQ Publisher removed
ZMQ Publisher removed to allow concurrent use of the scripts.
In the short term, we will replace all publishing parts within AIL
with Redis pub-sub to avoid the ZMQ limitation.
2014-08-26 14:38:49 +02:00
Alexandre Dulaunoy f070ac2005 cymruwhois uses dotted decimal format 2014-08-25 10:05:36 +02:00
Raphaël Vinot 3886d1b834 Small fixes to make the refactoring production ready
* the port for the logging is 6380
* use os.environ properly
* fix typos
2014-08-22 17:35:40 +02:00
Raphaël Vinot 78125db4ea Use env variables everywhere 2014-08-22 14:52:02 +02:00
Raphaël Vinot 277d138a5d cleanup, add FIXME 2014-08-21 14:39:17 +02:00
Raphaël Vinot 63b29176c1 move Redis_Data_Merging to Paste 2014-08-21 12:22:07 +02:00
Raphaël Vinot 50cfac857e Update config
Make all paths in the config file relative to the home directory.
2014-08-20 16:00:56 +02:00
Raphaël Vinot a68f5b6a0e fix subscriber names, update default config 2014-08-20 15:54:21 +02:00
Raphaël Vinot 2485ba5df2 Merge remote-tracking branch 'origin/master' into testing
Conflicts:
	bin/ZMQ_Sub_Urls.py
2014-08-20 15:24:10 +02:00
Raphaël Vinot 99c8cc7941 completely remove ZMQ_PubSub.py 2014-08-20 15:14:57 +02:00
Alexandre Dulaunoy 1d64dc44c8 MIME type guessing - removed one duplicate call to libmagic 2014-08-20 10:22:33 +02:00
Raphaël Vinot 8d9ffbaa53 Do not create a ZMQ sub if it is not required. 2014-08-19 19:53:33 +02:00
Raphaël Vinot 45b0bf3983 Improve the cleanup. Still some to do. 2014-08-19 19:07:07 +02:00
Raphaël Vinot f1753d67c6 Cleanup the queues. 2014-08-19 16:05:37 +02:00
Alexandre Dulaunoy e8fcea6cd6 Remove undeclared variable 2014-08-18 16:17:36 +02:00
Alexandre Dulaunoy 7d8ee102a3 Assignment before use (if Enumerate fails) 2014-08-18 15:58:06 +02:00
Alexandre Dulaunoy 4304c6858e Configuration path fixed 2014-08-18 09:02:08 +02:00
Raphaël Vinot 078c8ea836 Big cleanup, pep8 2014-08-14 18:07:18 +02:00
Jules ab6765315e Merge pull request #13 from adulau/master
Log where URLs are hosted - cc_critical option added
2014-08-14 14:28:01 +02:00
Alexandre Dulaunoy 762def3a23 Log where URLs are hosted - cc_critical option added
It logs where the hostname of the URL is hosted (ASN and geographic location).
A simple option cc_critical added to set the country code to log as critical.
2014-08-14 14:22:11 +02:00
Raphaël Vinot 4a1f300a1a Cleanup (remove unused imports, more pep8 compatible) 2014-08-14 14:11:07 +02:00
Starow 04a8f1bdf2 maxi cleanup old code :'( 2014-08-14 11:48:46 +02:00
Starow 29b24b6466 printing set of domain for debugging 2014-08-13 16:35:27 +02:00
Raphaël Vinot ece3bc173e Cleanup of main Paste module 2014-08-13 11:56:22 +02:00
Raphaël Vinot 5b17d416c8 remove script installed by pubsublogger 2014-08-13 11:55:59 +02:00
Raphaël Vinot 935e51c961 Remove 3rd party code (pubsublogger), add it in the deps. 2014-08-13 10:19:43 +02:00
Starow 37033ca3a6 Minor logs modifications 2014-08-13 10:08:44 +02:00
Starow 6aa4d7cb7d Harmonising logs messages + Changing some dygraph options 2014-08-12 15:42:16 +02:00
Alexandre Dulaunoy 0b4a80b7ea -s option added to find similar documents
By default, the index does not store the vector of the document (Whoosh
document schema). It won't work if you don't change the schema of the
index for the content. It depends on your storage strategy.
2014-08-12 13:42:26 +02:00
Alexandre Dulaunoy fd6e1a8436 -f option added: dump full document for each match 2014-08-12 13:26:56 +02:00
Alexandre Dulaunoy 0a6664ffba Indexer: Some index statistics added
usage: indexer_lookup.py [-h] [-q Q] [-n] [-t] [-l]

Fulltext search for AIL

optional arguments:
  -h, --help  show this help message and exit
  -q Q        query to lookup (one or more)
  -n          return number of indexed documents
  -t          dump top 500 terms
  -l          dump all terms encountered in indexed documents
2014-08-11 15:07:12 +02:00
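The usage text in the commit above can be reproduced with a short argparse sketch. This is an illustration only, not the actual contents of indexer_lookup.py: the option names and help strings are taken from the commit message, while the function name build_parser and the choice of action="append" for -q are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the usage text in the commit message.

    Hypothetical helper: the real script may be organised differently.
    """
    parser = argparse.ArgumentParser(
        prog="indexer_lookup.py",
        description="Fulltext search for AIL",
    )
    # Assumption: action="append" is what lets -q be given more than once,
    # as in "-q Visa -q Mastercard".
    parser.add_argument("-q", action="append",
                        help="query to lookup (one or more)")
    parser.add_argument("-n", action="store_true",
                        help="return number of indexed documents")
    parser.add_argument("-t", action="store_true",
                        help="dump top 500 terms")
    parser.add_argument("-l", action="store_true",
                        help="dump all terms encountered in indexed documents")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# reproducible wherever it is run.
args = build_parser().parse_args(["-q", "Visa", "-q", "Mastercard"])
print(args.q)
```

With action="append", each repeated -q adds one entry to the list stored in args.q, and the attribute stays None when no query is given.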
Alexandre Dulaunoy f65a94d47b -l added -> dumping all terms indexed 2014-08-11 14:56:15 +02:00
Alexandre Dulaunoy f3d1ca052e Return the number of indexed documents 2014-08-11 14:50:35 +02:00
Alexandre Dulaunoy 611d2a466f Configuration that should not be there... 2014-08-11 14:24:27 +02:00
Alexandre Dulaunoy 2b8f2689bf Indexer queue and script added to "BBS-like" LAUNCH script 2014-08-11 14:06:52 +02:00
Alexandre Dulaunoy 9657c6bf80 Merge branch 'master' of https://github.com/CIRCL/AIL-framework 2014-08-11 13:46:37 +02:00
Alexandre Dulaunoy b1053af3cd Indexer module: script to query the index
Test script to query the index generated from the Indexer module.

python indexer_lookup.py -q Visa -q Mastercard
2014-08-11 12:03:27 +02:00
Starow 079db6f80c Hardcoded path from ZMQ_Curve are now referring correctly in config.cfg.sample fix #6 2014-08-11 11:33:18 +02:00
Alexandre Dulaunoy 7bdd4a41a5 Indexer module added - initial version with Whoosh full-text indexer
The indexer module indexes all the pastes using Whoosh. The module
can be extended to support additional full-text indexers in the future.
2014-08-11 11:04:09 +02:00
Starow d1d4b2ebe0 Importing dns.exception fix #4 fix #7 2014-08-11 09:27:50 +02:00
Starow 192074e569 Merge branch 'master' of https://github.com/CIRCL/AIL-framework 2014-08-11 09:21:09 +02:00
Starow a5c1d59d29 Catching the exception dns.exception.Timeout fix #7 2014-08-11 09:18:55 +02:00
Starow 54091a2174 Catching the exception dns.exception.Timeout fix #4 2014-08-11 09:08:28 +02:00
Starow eb603e8762 Fixing a bug about caching paste inside Redis :) 2014-08-08 17:23:51 +02:00
Starow 7a1db94f9e Adding a letter (s) 2014-08-08 17:19:42 +02:00
Starow 043800287a adding a . 2014-08-08 17:18:03 +02:00
Starow bf682c4b44 Fixing last commit ... 2014-08-08 17:13:18 +02:00
Starow 503c23ca3b Fixing last commit 2014-08-08 17:08:41 +02:00
Starow c9e1eaf182 Improving cache code 2014-08-08 17:04:25 +02:00
Starow 44addf1afe Redis cache added fix #5
The paste is added to Redis for 5 min and also saved on disk.
Now if a module wants to get the paste for further processing, it will first try to get it from the cache
instead of reading it directly from the disk and wasting I/O.
2014-08-08 16:48:02 +02:00
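The caching behaviour described in the commit above (write the paste to both the cache and the disk, then read from the cache first) can be sketched as follows. This is a minimal illustration under stated assumptions, not AIL's code: an in-memory TTLCache class stands in for Redis so the example runs without a server, a plain dict stands in for the disk, and the names TTLCache, store_paste and get_paste are hypothetical. Only the 5-minute lifetime comes from the commit message.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 5 * 60  # "5min" from the commit message


class TTLCache:
    """In-memory stand-in for Redis: each value expires after a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._items: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str, ttl: float) -> None:
        # Store the value together with the moment it stops being valid.
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave as if it was never cached
            return None
        return value


def store_paste(path: str, content: str, cache: TTLCache,
                disk: Dict[str, str]) -> None:
    """Save the paste on 'disk' and keep a short-lived copy in the cache."""
    disk[path] = content  # assumption: a dict stands in for the filesystem
    cache.set(path, content, CACHE_TTL_SECONDS)


def get_paste(path: str, cache: TTLCache,
              read_from_disk: Callable[[str], str]) -> str:
    """Try the cache first and fall back to the disk only on a miss."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    return read_from_disk(path)
```

Within the first five minutes after store_paste, get_paste never calls read_from_disk, which is the I/O saving the commit message refers to. Against a real Redis server, the set-with-expiry step would typically map to a SETEX-style call.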
Starow 97f3a3df9e update pubsublogger with the last version 2014-08-07 14:49:34 +02:00
Starow c10003a630 Changing ZMQ Curve Module comment 2014-08-07 14:46:43 +02:00
Starow 1379ef705a Initial import of AIL framework - Analysis Information Leak framework
AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin
or similar services. AIL framework is flexible and can be extended to support other functionalities to mine
sensitive information.
2014-08-06 11:43:40 +02:00