Raphaël Vinot
078c8ea836
Big cleanup, pep8
2014-08-14 18:07:18 +02:00
Jules
ab6765315e
Merge pull request #13 from adulau/master
Log where URLs are hosted - cc_critical option added
2014-08-14 14:28:01 +02:00
Alexandre Dulaunoy
762def3a23
Log where URLs are hosted - cc_critical option added
It logs where the hostname of the URL is hosted (ASN and geographic location).
A simple option cc_critical added to set the country code to log as critical.
2014-08-14 14:22:11 +02:00
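The cc_critical behaviour described in commit 762def3a23 could be sketched roughly as follows. This is only an illustration: the function name log_url_hosting, the logger name, and the choice of the standard logging module are assumptions (the project itself depends on pubsublogger), and the ASN and country code are taken as already-resolved inputs rather than looked up here.

```python
import logging

# Hypothetical logger name; the real project logs through pubsublogger.
logger = logging.getLogger("ail.url")


def log_url_hosting(hostname, asn, cc, cc_critical=None):
    """Log where a hostname is hosted.

    The entry is logged as critical when the country code of the host
    matches the configured cc_critical value, and as info otherwise.
    Returns the logging level that was used.
    """
    matches = (
        cc_critical is not None
        and cc is not None
        and cc.upper() == cc_critical.upper()
    )
    level = logging.CRITICAL if matches else logging.INFO
    logger.log(level, "%s is hosted in AS%s (%s)", hostname, asn, cc)
    return level
```

Called as log_url_hosting("example.com", "64496", "XX", cc_critical="XX"), the sketch logs at the critical level; with any other country code it logs at the info level.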
Raphaël Vinot
4a1f300a1a
Cleanup (remove unused imports, more pep8 compatible)
2014-08-14 14:11:07 +02:00
Starow
04a8f1bdf2
Major cleanup of old code :'(
2014-08-14 11:48:46 +02:00
Starow
29b24b6466
Printing the set of domains for debugging
2014-08-13 16:35:27 +02:00
Raphaël Vinot
ece3bc173e
Cleanup of main Paste module
2014-08-13 11:56:22 +02:00
Raphaël Vinot
5b17d416c8
remove script installed by pubsublogger
2014-08-13 11:55:59 +02:00
Raphaël Vinot
935e51c961
Remove 3rd party code (pubsublogger), add it in the deps.
2014-08-13 10:19:43 +02:00
Starow
37033ca3a6
Minor log modifications
2014-08-13 10:08:44 +02:00
Starow
6aa4d7cb7d
Harmonising log messages + Changing some dygraph options
2014-08-12 15:42:16 +02:00
Alexandre Dulaunoy
0b4a80b7ea
-s option added to find similar documents
By default, the index does not store the vector of the document (Whoosh
document schema). The -s option won't work unless you change the schema
of the index for the content. It depends on your storage strategy.
2014-08-12 13:42:26 +02:00
Alexandre Dulaunoy
fd6e1a8436
-f option added: dump full document for each match
2014-08-12 13:26:56 +02:00
Alexandre Dulaunoy
0a6664ffba
Indexer: Some index statistics added
usage: indexer_lookup.py [-h] [-q Q] [-n] [-t] [-l]

Fulltext search for AIL

optional arguments:
  -h, --help  show this help message and exit
  -q Q        query to lookup (one or more)
  -n          return number of indexed documents
  -t          dump top 500 terms
  -l          dump all terms encountered in indexed documents
2014-08-11 15:07:12 +02:00
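The usage text in commit 0a6664ffba is consistent with a parser along these lines. This is a sketch reconstructed from the help output only, not the actual script: the function name build_parser is hypothetical, and treating -q as a repeatable (append) option is an assumption based on "one or more" in its help string.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the usage text of indexer_lookup.py."""
    parser = argparse.ArgumentParser(
        prog="indexer_lookup.py",
        description="Fulltext search for AIL",
    )
    # Assumption: -q may be repeated, each occurrence adding one query.
    parser.add_argument("-q", action="append",
                        help="query to lookup (one or more)")
    parser.add_argument("-n", action="store_true",
                        help="return number of indexed documents")
    parser.add_argument("-t", action="store_true",
                        help="dump top 500 terms")
    parser.add_argument("-l", action="store_true",
                        help="dump all terms encountered in indexed documents")
    return parser
```

With this sketch, build_parser().parse_args(["-q", "Visa", "-q", "Mastercard"]) collects both queries in a list on the q attribute, which matches the way the lookup script is invoked elsewhere in this log.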
Alexandre Dulaunoy
f65a94d47b
-l added -> dumping all terms indexed
2014-08-11 14:56:15 +02:00
Alexandre Dulaunoy
f3d1ca052e
Return the number of indexed documents
2014-08-11 14:50:35 +02:00
Alexandre Dulaunoy
611d2a466f
Configuration that should not be there...
2014-08-11 14:24:27 +02:00
Alexandre Dulaunoy
2b8f2689bf
Indexer queue and script added to "BBS-like" LAUNCH script
2014-08-11 14:06:52 +02:00
Alexandre Dulaunoy
9657c6bf80
Merge branch 'master' of https://github.com/CIRCL/AIL-framework
2014-08-11 13:46:37 +02:00
Alexandre Dulaunoy
b1053af3cd
Indexer module: script to query the index
Test script to query the index generated from the Indexer module.
python indexer_lookup.py -q Visa -q Mastercard
2014-08-11 12:03:27 +02:00
Starow
079db6f80c
Hardcoded paths from ZMQ_Curve are now correctly referenced in config.cfg.sample, fix #6
2014-08-11 11:33:18 +02:00
Alexandre Dulaunoy
7bdd4a41a5
Indexer module added - initial version with Whoosh full-text indexer
The indexer module indexes all the pastes using Whoosh. The module
can be extended to support additional full-text indexers in the future.
2014-08-11 11:04:09 +02:00
Starow
d1d4b2ebe0
Importing dns.exception fix #4 fix #7
2014-08-11 09:27:50 +02:00
Starow
192074e569
Merge branch 'master' of https://github.com/CIRCL/AIL-framework
2014-08-11 09:21:09 +02:00
Starow
a5c1d59d29
Catching the exception dns.exception.Timeout fix #7
2014-08-11 09:18:55 +02:00
Starow
54091a2174
Catching the exception dns.exception.Timeout fix #4
2014-08-11 09:08:28 +02:00
Starow
eb603e8762
Fixing a bug about caching paste inside Redis :)
2014-08-08 17:23:51 +02:00
Starow
7a1db94f9e
Adding a letter (s)
2014-08-08 17:19:42 +02:00
Starow
043800287a
adding a .
2014-08-08 17:18:03 +02:00
Starow
bf682c4b44
Fixing last commit ...
2014-08-08 17:13:18 +02:00
Starow
503c23ca3b
Fixing last commit
2014-08-08 17:08:41 +02:00
Starow
c9e1eaf182
Improving cache code
2014-08-08 17:04:25 +02:00
Starow
44addf1afe
Redis cache added fix #5
The paste is added to Redis for 5 minutes and also saved on disk.
Now, if a module wants to get the paste for further processing, it first tries to get it from the cache
instead of reading it directly from disk and wasting I/O.
2014-08-08 16:48:02 +02:00
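The cache-first lookup described in commit 44addf1afe might look something like the sketch below. All names here (save_paste, get_paste, InMemoryCache, CACHE_TTL_SECONDS) are hypothetical. InMemoryCache is a stand-in so the example runs without a Redis server; it only imitates the get(key) and set(key, value, ex=seconds) calls, so a redis-py client could presumably be passed in its place.

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # five minutes, as stated in the commit message


class InMemoryCache:
    """Stand-in for a Redis client: supports get() and set(..., ex=seconds)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        # Store the value with an optional absolute expiry time.
        expires_at = None if ex is None else time.monotonic() + ex
        self._data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            self._data.pop(key, None)
            return None
        return value


def save_paste(cache, path, content):
    """Write the paste to disk and keep a copy in the cache for five minutes."""
    with open(path, "wb") as handle:
        handle.write(content)
    cache.set(path, content, ex=CACHE_TTL_SECONDS)


def get_paste(cache, path):
    """Return the paste from the cache if present, otherwise read it from disk."""
    content = cache.get(path)
    if content is None:
        with open(path, "rb") as handle:
            content = handle.read()
    return content
```

The design choice is simply to pay for disk I/O only on a cache miss: modules that process a paste shortly after it arrives are served from memory, and anything requested after the entry expires falls back to the copy on disk.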
Starow
97f3a3df9e
Update pubsublogger to the latest version
2014-08-07 14:49:34 +02:00
Starow
c10003a630
Changing ZMQ Curve Module comment
2014-08-07 14:46:43 +02:00
Starow
1379ef705a
Initial import of AIL framework - Analysis Information Leak framework
AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin or similar services. The AIL framework is flexible and can be extended to support other functionalities to mine sensitive information.
2014-08-06 11:43:40 +02:00