Overview
========
Redis and ARDB overview
--------------------------
* Redis on TCP port 6379
  - DB 0 - Cache hostname/dns
  - DB 1 - Paste meta-data
* Redis on TCP port 6380 - Redis Log only
* Redis on TCP port 6381
  - DB 0 - PubSub + Queue and Paste content LRU cache
  - DB 1 - _Mixer_ Cache
* ARDB on TCP port 6382
  - DB 1 - Curve
  - DB 2 - TermFreq
  - DB 3 - Trending
  - DB 4 - Sentiments
  - DB 5 - TermCred
  - DB 6 - Tags
  - DB 7 - Metadata
  - DB 8 - Statistics
  - DB 9 - Crawler
* ARDB on TCP port `<year>`
  - DB 0 - Lines duplicate
  - DB 1 - Hashes

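
As a quick reference, the port and DB layout above can be captured in a small lookup table. This is a minimal sketch: the logical names, the `localhost` host and DB 0 for the log instance (whose DB number is not listed) are assumptions, and `<year>` is read as the literal port number (for example 2018). The helpers only return keyword arguments; since ARDB speaks the Redis protocol, they could be passed to a client such as redis-py's `redis.Redis(**kwargs)`.

```python
# Hypothetical lookup table built from the overview above. The logical
# names and the "localhost" host are illustrative assumptions, not
# identifiers taken from AIL's configuration.
DB_MAP = {
    "cache_hostname_dns": ("localhost", 6379, 0),
    "paste_metadata":     ("localhost", 6379, 1),
    "log":                ("localhost", 6380, 0),  # DB number assumed
    "queue_and_cache":    ("localhost", 6381, 0),
    "mixer_cache":        ("localhost", 6381, 1),
    "curve":              ("localhost", 6382, 1),
    "termfreq":           ("localhost", 6382, 2),
    "trending":           ("localhost", 6382, 3),
    "sentiments":         ("localhost", 6382, 4),
    "termcred":           ("localhost", 6382, 5),
    "tags":               ("localhost", 6382, 6),
    "metadata":           ("localhost", 6382, 7),
    "statistics":         ("localhost", 6382, 8),
    "crawler":            ("localhost", 6382, 9),
}

# DB numbers inside the per-year ARDB instance.
YEARLY_DBS = {"lines_duplicate": 0, "hashes": 1}


def connection_kwargs(name: str) -> dict:
    """Return host/port/db keyword arguments for a named store."""
    host, port, db = DB_MAP[name]
    return {"host": host, "port": port, "db": db}


def yearly_connection_kwargs(year: int, name: str) -> dict:
    """Per-year ARDB instance: `<year>` is read as the literal port number."""
    return {"host": "localhost", "port": year, "db": YEARLY_DBS[name]}
```

Keeping the mapping in one place means a port or DB change only has to be edited once.
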
# Database Map:
## Tags:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| daily_tags:**date** | **tag** | **nb tagged this day** |
| | | |
| tag_metadata:**tag** | first_seen | **date** |
| tag_metadata:**tag** | last_seen | **date** |
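
The two hashes above work together when an item is tagged: the per-day counter is incremented and the tag's first/last seen range is widened. The sketch below models this with a plain `dict` in place of a live ARDB connection; the function name `record_daily_tag` is hypothetical and the YYYYMMDD date format is an assumption. Comments name the Redis command each step would correspond to.

```python
def record_daily_tag(store: dict, tag: str, date: str) -> None:
    """Count one tagging of `tag` on `date` and widen its seen range.

    `store` is a plain dict standing in for the Tags DB: hash keys map to
    dicts. `date` is assumed to be a YYYYMMDD string.
    """
    # HINCRBY daily_tags:<date> <tag> 1
    daily = store.setdefault(f"daily_tags:{date}", {})
    daily[tag] = daily.get(tag, 0) + 1

    # HSET tag_metadata:<tag> first_seen / last_seen <date>
    # YYYYMMDD strings compare in chronological order, so min/max on the
    # strings yields the earliest/latest date.
    meta = store.setdefault(f"tag_metadata:{tag}", {})
    meta["first_seen"] = min(meta.get("first_seen", date), date)
    meta["last_seen"] = max(meta.get("last_seen", date), date)
```
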
##### Set:
| Key | Value |
| ------ | ------ |
| list_tags | **tag** |
| active_taxonomies | **taxonomy** |
| active_galaxies | **galaxy** |
| active_tag_**taxonomy or galaxy** | **tag** |
| synonym_tag_misp-galaxy:**galaxy** | **tag synonym** |
| list_export_tags | **user_tag** |
| **tag**:**date** | **paste** |
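
The set keys are built by plain string concatenation, so helpers that generate them are easy to sketch. The example below (hypothetical helper names, YYYYMMDD dates assumed, a `dict` standing in for the DB) also shows how the per-day **tag**:**date** sets could be unioned to list every paste carrying a tag over a date range.

```python
from datetime import date, timedelta


def tag_date_key(tag: str, day: date) -> str:
    """Key of the set holding the pastes tagged with `tag` on `day`."""
    return f"{tag}:{day.strftime('%Y%m%d')}"


def active_tag_key(taxonomy_or_galaxy: str) -> str:
    return f"active_tag_{taxonomy_or_galaxy}"


def synonym_key(galaxy: str) -> str:
    return f"synonym_tag_misp-galaxy:{galaxy}"


def pastes_tagged_between(store: dict, tag: str, start: date, end: date) -> set:
    """Union of the per-day sets from `start` to `end` inclusive.

    With a live server the same result would come from one SUNION
    over the generated keys.
    """
    result = set()
    day = start
    while day <= end:
        result |= store.get(tag_date_key(tag, day), set())
        day += timedelta(days=1)
    return result
```
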
##### Old format:
| Key | Value |
| ------ | ------ |
| *tag* | *paste* |
## DB7 - Metadata:
#### Crawled Items:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| paste_metadata:**item path** | super_father | **first url crawled** |
| | father | **item father** |
| | domain | **crawled domain**:**domain port** |
##### Set:
| Key | Value |
| ------ | ------ |
| tag:**item path** | **tag** |
| | |
| paste_children:**item path** | **item path** |
| | |
| hash_paste:**item path** | **hash** |
| base64_paste:**item path** | **hash** |
| hexadecimal_paste:**item path** | **hash** |
| binary_paste:**item path** | **hash** |
##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| nb_seen_hash:**hash** | **item** | **nb_seen** |
| base64_hash:**hash** | **item** | **nb_seen** |
| binary_hash:**hash** | **item** | **nb_seen** |
| hexadecimal_hash:**hash** | **item** | **nb_seen** |
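
A minimal sketch of how the DB 7 sets and sorted sets above could be kept in sync when a decoder finds a hash in an item. The dict-based store, the function name and the choice of counting one sighting per call are assumptions for illustration, not AIL's actual decoder code.

```python
DECODERS = ("base64", "hexadecimal", "binary")


def link_hash_to_item(store: dict, item: str, hash_: str, decoder: str) -> None:
    """Record one sighting of a decoded hash inside an item.

    `store` stands in for DB 7: set keys map to Python sets and zset keys
    map to {member: score} dicts. One sighting per call is an assumption.
    """
    if decoder not in DECODERS:
        raise ValueError(f"unknown decoder: {decoder}")
    # SADD hash_paste:<item> <hash> and SADD <decoder>_paste:<item> <hash>
    for key in (f"hash_paste:{item}", f"{decoder}_paste:{item}"):
        store.setdefault(key, set()).add(hash_)
    # ZINCRBY nb_seen_hash:<hash> 1 <item> and ZINCRBY <decoder>_hash:<hash> 1 <item>
    for key in (f"nb_seen_hash:{hash_}", f"{decoder}_hash:{hash_}"):
        zset = store.setdefault(key, {})
        zset[item] = zset.get(item, 0) + 1
```
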
## DB9 - Crawler:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| **service type**_metadata:**domain** | first_seen | **date** |
| | last_check | **date** |
| | ports | **port**;**port**;**port** ... |
| | paste_parent | **parent of the last crawl (can be auto or manual)** |
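
The `ports` field stores every port seen for a domain as one semicolon-separated string. A hypothetical pair of helpers for reading and extending it is sketched below; the service type `onion` used in the test is an assumption, since the exact service type names are not listed here.

```python
from typing import List


def metadata_key(service_type: str, domain: str) -> str:
    """Hash key `<service type>_metadata:<domain>`."""
    return f"{service_type}_metadata:{domain}"


def parse_ports(value: str) -> List[int]:
    """Split the semicolon-separated `ports` field into integers."""
    # Empty pieces are skipped so "" or a trailing ";" do not raise.
    return [int(p) for p in value.split(";") if p]


def add_port(meta: dict, port: int) -> None:
    """Append `port` to meta['ports'] unless it is already listed."""
    ports = parse_ports(meta.get("ports", ""))
    if port not in ports:
        ports.append(port)
    meta["ports"] = ";".join(str(p) for p in ports)
```
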
##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| crawler\_history\_**service type**:**domain** | **item root (first crawled item)** | **epoch (seconds)** |
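
Because the crawl history is a sorted set scored by epoch seconds, the most recent crawls come out of a reverse range query. The sketch below reproduces that ordering on a plain `dict` (hypothetical helper names); with a live server the equivalent would be `ZREVRANGE key 0 n-1 WITHSCORES`.

```python
from typing import Dict, List, Tuple


def history_key(service_type: str, domain: str) -> str:
    return f"crawler_history_{service_type}:{domain}"


def latest_crawls(history: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n most recent (root item, epoch) pairs, newest first.

    `history` stands in for the zset: member -> score (epoch seconds).
    """
    return sorted(history.items(), key=lambda kv: kv[1], reverse=True)[:n]
```
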
##### Key:
| Key | Value |
| ------ | ------ |
| crawler\_config:**crawler mode**:**service type**:**domain** | **json config** |
###### Example JSON config:
```json
{
"closespider_pagecount": 1,
"time": 3600,
"depth_limit": 0,
"har": 0,
"png": 0
}
```
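
A sketch of building the per-domain crawler config key and its JSON value. The values are copied from the example above and are not claimed to be AIL's defaults; the crawler mode `manual` used in the test is an assumption based on the auto/manual distinction mentioned for `paste_parent`.

```python
import json

# Example values copied from the JSON above; not claimed to be AIL defaults.
EXAMPLE_CONFIG = {
    "closespider_pagecount": 1,
    "time": 3600,
    "depth_limit": 0,
    "har": 0,
    "png": 0,
}


def crawler_config_key(mode: str, service_type: str, domain: str) -> str:
    return f"crawler_config:{mode}:{service_type}:{domain}"


def build_crawler_config(**overrides) -> str:
    """Merge overrides into the example config and serialise it to JSON.

    Unknown option names are rejected so typos are not stored silently.
    """
    unknown = set(overrides) - set(EXAMPLE_CONFIG)
    if unknown:
        raise ValueError(f"unknown crawler options: {sorted(unknown)}")
    return json.dumps({**EXAMPLE_CONFIG, **overrides})
```

The resulting string is what would be stored under the key with a plain `SET`.
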
ARDB overview
---------------------------
ARDB_DB
* DB 1 - Curve
* DB 2 - TermFreq

```
----------------------------------------- TERM ----------------------------------------
SET  - 'TrackedRegexSet'                       term
HSET - 'TrackedRegexDate'                      tracked_regex today_timestamp
SET  - 'TrackedSetSet'                         set_to_add
HSET - 'TrackedSetDate'                        set_to_add today_timestamp
SET  - 'TrackedSetTermSet'                     term
HSET - 'TrackedTermDate'                       tracked_regex today_timestamp
SET  - 'TrackedNotificationEmails_'+term/set   email
SET  - 'TrackedNotifications'                  term/set
```
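
The notification keys in the TERM listing can be read as two sets: the terms that have notifications enabled and, per term, the addresses to alert. A minimal dict-based sketch follows; the function names are hypothetical, and gating recipients on `TrackedNotifications` membership is an assumption about how the two keys are used together.

```python
def enable_notification(store: dict, term: str, email: str) -> None:
    """Register `email` for alerts on a tracked term or set."""
    # SADD 'TrackedNotificationEmails_'+term email
    store.setdefault("TrackedNotificationEmails_" + term, set()).add(email)
    # SADD 'TrackedNotifications' term
    store.setdefault("TrackedNotifications", set()).add(term)


def notification_recipients(store: dict, term: str) -> set:
    """Emails to alert for `term`, or an empty set if notifications are off."""
    if term not in store.get("TrackedNotifications", set()):
        return set()
    return set(store.get("TrackedNotificationEmails_" + term, set()))
```
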
* DB 3 - Trending
* DB 4 - Sentiment
* DB 5 - TermCred
* DB 6 - Tags
* DB 7 - Metadata
* DB 8 - Statistics
* DB 7 - Metadata:
```
----------------------------------------- BASE64 ----------------------------------------
HSET - 'metadata_hash:'+hash      'saved_path'             saved_path
                                  'size'                   size
                                  'first_seen'             first_seen
                                  'last_seen'              last_seen
                                  'estimated_type'         estimated_type
                                  'vt_link'                vt_link
                                  'vt_report'              vt_report
                                  'nb_seen_in_all_pastes'  nb_seen_in_all_pastes
                                  'base64_decoder'         nb_encoded
                                  'binary_decoder'         nb_encoded

SET  - 'all_decoder'              decoder*

SET  - 'hash_all_type'            hash_type *
SET  - 'hash_base64_all_type'     hash_type *
SET  - 'hash_binary_all_type'     hash_type *

SET  - 'hash_paste:'+paste        hash *
SET  - 'base64_paste:'+paste      hash *
SET  - 'binary_paste:'+paste      hash *

ZADD - 'hash_date:'+20180622      hash * nb_seen_this_day
ZADD - 'base64_date:'+20180622    hash * nb_seen_this_day
ZADD - 'binary_date:'+20180622    hash * nb_seen_this_day

ZADD - 'nb_seen_hash:'+hash       paste * nb_seen_in_paste
ZADD - 'base64_hash:'+hash        paste * nb_seen_in_paste
ZADD - 'binary_hash:'+hash        paste * nb_seen_in_paste

ZADD - 'base64_type:'+type        date nb_seen
ZADD - 'binary_type:'+type        date nb_seen

GET  - 'base64_decoded:'+date     nb_decoded
GET  - 'binary_decoded:'+date     nb_decoded
```
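
For the older BASE64 layout, recording one sighting of a decoded hash touches the `metadata_hash` hash, the `all_decoder` set and the per-day sorted sets. The sketch below uses a `dict` in place of DB 7; the function name, and the assumption that `nb_encoded` and `nb_seen_this_day` grow by one per sighting, are illustrative rather than taken from AIL's code.

```python
OLD_DECODERS = ("base64", "binary")


def record_hash_sighting(store: dict, hash_: str, decoder: str, day: str) -> None:
    """Update per-hash metadata and per-day counters for one sighting.

    `store` stands in for the old DB 7 layout: hashes and zsets are dicts,
    sets are Python sets. `day` is a YYYYMMDD string like 20180622.
    """
    if decoder not in OLD_DECODERS:
        raise ValueError(f"unknown decoder: {decoder}")
    # HSET metadata_hash:<hash> first_seen / last_seen
    meta = store.setdefault(f"metadata_hash:{hash_}", {})
    meta["first_seen"] = min(meta.get("first_seen", day), day)
    meta["last_seen"] = max(meta.get("last_seen", day), day)
    # '<decoder>_decoder' field holds nb_encoded (assumed +1 per sighting)
    field = f"{decoder}_decoder"
    meta[field] = meta.get(field, 0) + 1
    # SADD 'all_decoder' decoder
    store.setdefault("all_decoder", set()).add(decoder)
    # ZINCRBY 'hash_date:'+day and '<decoder>_date:'+day, member = hash
    for key in (f"hash_date:{day}", f"{decoder}_date:{day}"):
        zset = store.setdefault(key, {})
        zset[hash_] = zset.get(hash_, 0) + 1
```
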