Overview
========

Redis and ARDB overview
--------------------------

* Redis on TCP port 6379
    - DB 0 - Cache hostname/dns
    - DB 1 - Paste meta-data
* Redis on TCP port 6380 - Redis Log only
* Redis on TCP port 6381
    - DB 0 - PubSub + Queue and Paste content LRU cache
    - DB 1 - _Mixer_ Cache
* ARDB on TCP port 6382
    - DB 1 - Curve
    - DB 2 - TermFreq
    - DB 3 - Trending
    - DB 4 - Sentiments
    - DB 5 - TermCred
    - DB 6 - Tags
    - DB 7 - Metadata
    - DB 8 - Statistics
    - DB 9 - Crawler
* ARDB on TCP port <year>
    - DB 0 - Lines duplicate
    - DB 1 - Hashes

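
The layout above can also be kept as a small lookup table, which is handy when a script needs to label a connection. This is a minimal sketch: `INSTANCES` and `describe` are hypothetical names introduced here for illustration and are not part of the AIL code base. The `<year>` instance is left out because its port is not a fixed number.

```python
# Hypothetical lookup table transcribed from the list above.
# Illustration only; these names do not come from AIL itself.
INSTANCES = {
    6379: ("Redis", {0: "Cache hostname/dns", 1: "Paste meta-data"}),
    6380: ("Redis", {}),  # log only, no per-DB roles listed
    6381: ("Redis", {0: "PubSub + Queue and Paste content LRU cache",
                     1: "Mixer Cache"}),
    6382: ("ARDB", {1: "Curve", 2: "TermFreq", 3: "Trending",
                    4: "Sentiments", 5: "TermCred", 6: "Tags",
                    7: "Metadata", 8: "Statistics", 9: "Crawler"}),
}


def describe(port: int, db: int) -> str:
    """Return a human-readable label for a (port, db) pair."""
    engine, dbs = INSTANCES[port]
    role = dbs.get(db, "unlisted")
    return f"{engine} on port {port}, DB {db}: {role}"
```
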
# Database Map:

## DB0 - Core:

##### Update keys:
| Key | Value |
| ------ | ------ |
| | |
| ail:version | **current version** |
| | |
| ail:update_**update_version** | **background update name** |
| | **background update name** |
| | **...** |
| | |
| ail:update_date_v1.5 | **update date** |
| | |
| ail:update_error | **update error message** |
| | |
| ail:update_in_progress | **update version in progress** |
| ail:current_background_update | **current update version** |
| | |
| ail:current_background_script | **name of the background script currently executed** |
| ail:current_background_script_stat | **progress in % of the background script** |

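
To show how the update keys fit together, here is a minimal sketch that turns a snapshot of their values into a one-line status. It works on a plain dict so it runs without a database. `describe_update` is a hypothetical helper, and the reading that an unset `ail:update_in_progress` means "no update running" is an assumption, not something this document states.

```python
def describe_update(state: dict) -> str:
    """Summarise update progress from a dict of DB0 key/value pairs.

    Assumption: an empty or missing 'ail:update_in_progress' means idle.
    """
    error = state.get("ail:update_error")
    if error:
        return f"update error: {error}"
    target = state.get("ail:update_in_progress")
    if not target:
        return f"idle at version {state.get('ail:version', 'unknown')}"
    script = state.get("ail:current_background_script", "unknown script")
    stat = state.get("ail:current_background_script_stat", "0")
    return f"updating to {target}: {script} ({stat}%)"
```
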
## DB2 - TermFreq:

##### Set:
| Key | Value |
| ------ | ------ |
| TrackedSetTermSet | **tracked_term** |
| TrackedSetSet | **tracked_set** |
| TrackedRegexSet | **tracked_regex** |
| | |
| tracked_**tracked_term** | **item_path** |
| set_**tracked_set** | **item_path** |
| regex_**tracked_regex** | **item_path** |
| | |
| TrackedNotifications | **tracked_term / set / regex** |
| | |
| TrackedNotificationTags_**tracked_term / set / regex** | **tag** |
| | |
| TrackedNotificationEmails_**tracked_term / set / regex** | **email** |

##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| per_paste_TopTermFreq_set_month | **term** | **nb_seen** |
| per_paste_TopTermFreq_set_week | **term** | **nb_seen** |
| per_paste_TopTermFreq_set_day_**epoch** | **term** | **nb_seen** |
| | | |
| TopTermFreq_set_month | **term** | **nb_seen** |
| TopTermFreq_set_week | **term** | **nb_seen** |
| TopTermFreq_set_day_**epoch** | **term** | **nb_seen** |

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| TrackedTermDate | **tracked_term** | **epoch** |
| TrackedSetDate | **tracked_set** | **epoch** |
| TrackedRegexDate | **tracked_regex** | **epoch** |
| | | |
| BlackListTermDate | **blacklisted_term** | **epoch** |
| | | |
| **epoch** | **term** | **nb_seen** |

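
The per-day Zset keys embed an **epoch** in their name. The sketch below builds such a key under the assumption that the epoch is the Unix timestamp, in seconds, of midnight UTC for the day in question; the document does not say which convention is used, so check it against your data before relying on it. `day_epoch` and `top_term_day_key` are hypothetical helper names.

```python
import calendar
import datetime


def day_epoch(day: datetime.date) -> int:
    """Unix timestamp of midnight UTC for `day` (assumed convention)."""
    return calendar.timegm(day.timetuple())


def top_term_day_key(day: datetime.date, per_paste: bool = False) -> str:
    """Build 'TopTermFreq_set_day_<epoch>' or its 'per_paste_' variant."""
    prefix = "per_paste_" if per_paste else ""
    return f"{prefix}TopTermFreq_set_day_{day_epoch(day)}"
```
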
## DB6 - Tags:

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| per_paste_**epoch** | **term** | **nb_seen** |
| | | |
| tag_metadata:**tag** | first_seen | **date** |
| tag_metadata:**tag** | last_seen | **date** |

##### Set:
| Key | Value |
| ------ | ------ |
| list_tags | **tag** |
| active_taxonomies | **taxonomy** |
| active_galaxies | **galaxy** |
| active_tag_**taxonomy or galaxy** | **tag** |
| synonym_tag_misp-galaxy:**galaxy** | **tag synonym** |
| list_export_tags | **user_tag** |
| **tag**:**date** | **paste** |

##### old:
| Key | Value |
| ------ | ------ |
| *tag* | *paste* |

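
As an illustration of the `tag_metadata:**tag**` hash, the sketch below keeps `first_seen` and `last_seen` up to date for one sighting. It operates on a plain dict standing in for the hash, and it assumes dates are `YYYYMMDD` strings so that string comparison matches chronological order; the date format is not stated in this section. All function names are hypothetical.

```python
def tag_metadata_key(tag: str) -> str:
    """Key of the per-tag metadata hash."""
    return f"tag_metadata:{tag}"


def tag_day_key(tag: str, date: str) -> str:
    """Key of the set of pastes carrying `tag` on `date`."""
    return f"{tag}:{date}"


def update_seen(meta: dict, date: str) -> dict:
    """Widen the first_seen/last_seen window to include `date`.

    Assumes YYYYMMDD strings, which sort chronologically.
    """
    first = meta.get("first_seen")
    last = meta.get("last_seen")
    if first is None or date < first:
        meta["first_seen"] = date
    if last is None or date > last:
        meta["last_seen"] = date
    return meta
```
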
## DB7 - Metadata:

#### Crawled Items:

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| paste_metadata:**item path** | super_father | **first url crawled** |
| | father | **item father** |
| | domain | **crawled domain**:**domain port** |
| | screenshot | **screenshot hash** |

##### Set:
| Key | Value |
| ------ | ------ |
| tag:**item path** | **tag** |
| | |
| paste_children:**item path** | **item path** |
| | |
| hash_paste:**item path** | **hash** |
| base64_paste:**item path** | **hash** |
| hexadecimal_paste:**item path** | **hash** |
| binary_paste:**item path** | **hash** |

##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| nb_seen_hash:**hash** | **item** | **nb_seen** |
| base64_hash:**hash** | **item** | **nb_seen** |
| binary_hash:**hash** | **item** | **nb_seen** |
| hexadecimal_hash:**hash** | **item** | **nb_seen** |

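
The `domain` field of `paste_metadata:**item path**` packs the crawled domain and its port into one string. A minimal sketch of splitting it back apart follows; `split_domain_field` is a hypothetical helper, and the sample value in the usage note is made up.

```python
def split_domain_field(value: str) -> tuple:
    """Split '<crawled domain>:<domain port>' into (domain, port).

    The split is done on the last colon so that only the trailing
    port is separated from the rest of the value.
    """
    domain, sep, port = value.rpartition(":")
    if not sep or not domain:
        raise ValueError(f"expected '<domain>:<port>', got {value!r}")
    return domain, int(port)
```

For example, `split_domain_field("example.onion:80")` yields the domain `example.onion` and the integer port `80`.
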
#### PgpDump

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump_metadata_key:*key id* | first_seen | **date** |
| | last_seen | **date** |
| | | |
| pgpdump_metadata_name:*name* | first_seen | **date** |
| | last_seen | **date** |
| | | |
| pgpdump_metadata_mail:*mail* | first_seen | **date** |
| | last_seen | **date** |

##### Set:
| Key | Value |
| ------ | ------ |
| set_pgpdump_key:*key id* | *item_path* |
| | |
| set_pgpdump_name:*name* | *item_path* |
| | |
| set_pgpdump_mail:*mail* | *item_path* |

##### Hset date:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump:key:*date* | *key* | *nb seen* |
| | | |
| pgpdump:name:*date* | *name* | *nb seen* |
| | | |
| pgpdump:mail:*date* | *mail* | *nb seen* |

##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump_all:key | *key* | *nb seen* |
| | | |
| pgpdump_all:name | *name* | *nb seen* |
| | | |
| pgpdump_all:mail | *mail* | *nb seen* |

##### Set:
| Key | Value |
| ------ | ------ |
| item_pgpdump_key:*item_path* | *key* |
| | |
| item_pgpdump_name:*item_path* | *name* |
| | |
| item_pgpdump_mail:*item_path* | *mail* |

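
The PgpDump tables repeat the same five key families for `key`, `name` and `mail`. The sketch below collects the key names that relate to one value seen in one item on one date. It only builds strings and is not taken from the AIL source; `pgpdump_keys` and the labels of the returned dict are hypothetical.

```python
PGPDUMP_KINDS = ("key", "name", "mail")


def pgpdump_keys(kind: str, value: str, item_path: str, date: str) -> dict:
    """Return the five DB7 key names tied to one PgpDump sighting."""
    if kind not in PGPDUMP_KINDS:
        raise ValueError(f"unknown PgpDump kind: {kind!r}")
    return {
        "metadata_hset": f"pgpdump_metadata_{kind}:{value}",  # first_seen / last_seen
        "items_set": f"set_pgpdump_{kind}:{value}",           # item paths
        "daily_hset": f"pgpdump:{kind}:{date}",               # value -> nb seen
        "all_zset": f"pgpdump_all:{kind}",                    # value -> nb seen
        "item_set": f"item_pgpdump_{kind}:{item_path}",       # values in the item
    }
```
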
#### Cryptocurrency

Supported cryptocurrencies:
- bitcoin

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency_metadata_**cryptocurrency name**:**cryptocurrency address** | first_seen | **date** |
| | last_seen | **date** |

##### Set:
| Key | Value |
| ------ | ------ |
| set_cryptocurrency_**cryptocurrency name**:**cryptocurrency address** | **item_path** |

##### Hset date:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency:**cryptocurrency name**:**date** | **cryptocurrency address** | **nb seen** |

##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency_all:**cryptocurrency name** | **cryptocurrency address** | **nb seen** |

##### Set:
| Key | Value |
| ------ | ------ |
| item_cryptocurrency_**cryptocurrency name**:**item_path** | **cryptocurrency address** |

## DB9 - Crawler:

##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| **service type**_metadata:**domain** | first_seen | **date** |
| | last_check | **date** |
| | ports | **port**;**port**;**port** ... |
| | paste_parent | **parent of the last crawl (can be auto or manual)** |

##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| crawler\_history\_**service type**:**domain**:**port** | **item root (first crawled item)** | **epoch (seconds)** |

##### Set:
| Key | Value |
| ------ | ------ |
| screenshot:**sha256** | **item path** |

##### crawler config:
| Key | Value |
| ------ | ------ |
| crawler\_config:**crawler mode**:**service type**:**domain** | **json config** |

##### automatic crawler config:
| Key | Value |
| ------ | ------ |
| crawler\_config:**crawler mode**:**service type**:**domain**:**url** | **json config** |

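
Two details of the crawler tables are easy to get wrong by hand: the `ports` field is a single `;`-separated string, and the history key joins service type, domain and port. The sketch below handles both. The function names are hypothetical, and `onion` in the usage note is only an example service type.

```python
def parse_ports(field: str) -> list:
    """Turn a 'port;port;port' string into a list of ints.

    Empty segments (for example from a trailing ';') are skipped.
    """
    return [int(part) for part in field.split(";") if part.strip()]


def crawler_history_key(service_type: str, domain: str, port: int) -> str:
    """Build 'crawler_history_<service type>:<domain>:<port>'."""
    return f"crawler_history_{service_type}:{domain}:{port}"
```

For example, `parse_ports("80;443;8080")` gives `[80, 443, 8080]`, and each port can then be passed to `crawler_history_key("onion", "example.onion", port)`.
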
###### example json config:
```json
{
  "closespider_pagecount": 1,
  "time": 3600,
  "depth_limit": 0,
  "har": 0,
  "png": 0
}
```

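
A stored **json config** value can be decoded with the standard `json` module. The sketch below overlays a stored config on the values from the example above, so that a partial config still yields every field. Treating the example values as defaults is an assumption made for illustration, the crawler mode value `auto` in the usage note is assumed, and both function names are hypothetical.

```python
import json

# Values copied from the example config above; using them as
# defaults is an assumption, not documented AIL behaviour.
EXAMPLE_CONFIG = {
    "closespider_pagecount": 1,
    "time": 3600,
    "depth_limit": 0,
    "har": 0,
    "png": 0,
}


def load_crawler_config(raw: str) -> dict:
    """Decode a JSON config string and fill in missing fields."""
    config = dict(EXAMPLE_CONFIG)
    config.update(json.loads(raw))
    return config


def crawler_config_key(mode: str, service_type: str, domain: str,
                       url: str = None) -> str:
    """Build the config key; the automatic variant appends the url."""
    key = f"crawler_config:{mode}:{service_type}:{domain}"
    return key if url is None else f"{key}:{url}"
```

For example, `load_crawler_config('{"depth_limit": 2}')` keeps `time` at `3600` while setting `depth_limit` to `2`, and `crawler_config_key("auto", "onion", "example.onion")` builds the matching key.
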
ARDB overview

----------------------------------------- SENTIMENT ------------------------------------

    SET - 'Provider_set'            Provider

    KEY - 'UniqID'                  INT

    SET - provider_timestamp        UniqID

    SET - UniqID                    avg_score

* DB 7 - Metadata:

----------------------------------------------------------------------------------------
----------------------------------------- BASE64 ----------------------------------------

    HSET - 'metadata_hash:'+hash    'saved_path'              saved_path
                                    'size'                    size
                                    'first_seen'              first_seen
                                    'last_seen'               last_seen
                                    'estimated_type'          estimated_type
                                    'vt_link'                 vt_link
                                    'vt_report'               vt_report
                                    'nb_seen_in_all_pastes'   nb_seen_in_all_pastes
                                    'base64_decoder'          nb_encoded
                                    'binary_decoder'          nb_encoded

    SET  - 'all_decoder'            decoder*

    SET  - 'hash_all_type'          hash_type *
    SET  - 'hash_base64_all_type'   hash_type *
    SET  - 'hash_binary_all_type'   hash_type *

    ZADD - 'hash_date:'+20180622    hash *                    nb_seen_this_day
    ZADD - 'base64_date:'+20180622  hash *                    nb_seen_this_day
    ZADD - 'binary_date:'+20180622  hash *                    nb_seen_this_day

    ZADD - 'base64_type:'+type      date                      nb_seen
    ZADD - 'binary_type:'+type      date                      nb_seen

    GET  - 'base64_decoded:'+date   nd_decoded
    GET  - 'binary_decoded:'+date   nd_decoded

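
The `'hash_date:'+20180622` notation above concatenates a prefix with a `YYYYMMDD` date. A minimal sketch of building those daily key names from a `datetime.date` follows; `daily_hash_key` is a hypothetical helper, and the `family` argument simply mirrors the three prefixes listed above.

```python
import datetime

# Prefixes taken from the three ZADD lines above.
DAILY_FAMILIES = ("hash", "base64", "binary")


def daily_hash_key(day: datetime.date, family: str = "hash") -> str:
    """Build '<family>_date:<YYYYMMDD>' for one of the listed families."""
    if family not in DAILY_FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    return f"{family}_date:{day.strftime('%Y%m%d')}"
```
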