Overview
========
Redis and ARDB overview
--------------------------

* Redis on TCP port 6379
    - DB 0 - Cache hostname/dns
    - DB 1 - Paste meta-data
* Redis on TCP port 6380 - Redis Log only
* Redis on TCP port 6381
    - DB 0 - PubSub + Queue and Paste content LRU cache
    - DB 1 - _Mixer_ Cache
* ARDB on TCP port 6382
    - DB 1 - Curve
    - DB 2 - TermFreq
    - DB 3 - Trending
    - DB 4 - Sentiments
    - DB 5 - TermCred
    - DB 6 - Tags
    - DB 7 - Metadata
    - DB 8 - Statistics
    - DB 9 - Crawler
* ARDB on TCP port <year>
    - DB 0 - Lines duplicate
    - DB 1 - Hashes
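
As a quick reference, the port/DB layout listed in this overview can be written down as a lookup table. This is only an illustrative sketch: the names `DATABASES`, `redis_url` and `find_db`, and the `localhost` host, are assumptions and not part of the AIL code base. The second ARDB instance is left out because its port is not given here.

```python
# Port/DB layout transcribed from the overview list.
# Keys are (server kind, TCP port); values map DB number -> purpose.
DATABASES = {
    ("redis", 6379): {0: "Cache hostname/dns", 1: "Paste meta-data"},
    ("redis", 6380): {},  # Redis log only, no per-DB breakdown listed
    ("redis", 6381): {
        0: "PubSub + Queue and Paste content LRU cache",
        1: "Mixer Cache",
    },
    ("ardb", 6382): {
        1: "Curve", 2: "TermFreq", 3: "Trending", 4: "Sentiments",
        5: "TermCred", 6: "Tags", 7: "Metadata", 8: "Statistics",
        9: "Crawler",
    },
}


def redis_url(port, db, host="localhost"):
    """Build a redis:// URL for a port and DB number (host is an assumption)."""
    return f"redis://{host}:{port}/{db}"


def find_db(purpose):
    """Return (kind, port, db) for the first entry whose purpose matches."""
    for (kind, port), dbs in DATABASES.items():
        for db, label in dbs.items():
            if label.lower() == purpose.lower():
                return kind, port, db
    raise KeyError(purpose)
```

For example, `find_db("Tags")` returns `("ardb", 6382, 6)`, and `redis_url(6382, 6)` gives a URL that a Redis-protocol client should be able to use, since ARDB speaks the Redis protocol.
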
# Database Map:
### Redis cache
##### Brute force protection:
| Set Key | Value | Expiry |
| ------ | ------ | ------ |
| failed_login_ip:**ip** | **nb login failed** | TTL |
| failed_login_user_id:**user_id** | **nb login failed** | TTL |
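
The two counters above suggest an increment-then-expire pattern. A minimal sketch of that pattern follows; it is not AIL's actual login code. The function names, the 300-second TTL and the threshold of 5 failures are assumptions, and `client` stands for any object exposing redis-py style `incr()`, `expire()` and `get()`.

```python
FAILED_LOGIN_TTL = 300  # seconds; assumed value, not taken from this document


def register_failed_login(client, ip, user_id, ttl=FAILED_LOGIN_TTL):
    """Increment both failure counters and (re)arm their expiry.

    Returns the new (per-IP, per-user) failure counts.
    """
    counts = []
    for key in (f"failed_login_ip:{ip}", f"failed_login_user_id:{user_id}"):
        counts.append(client.incr(key))  # INCR creates the key at 1 if missing
        client.expire(key, ttl)          # counter disappears after ttl seconds
    return tuple(counts)


def is_blocked(client, ip, max_failures=5):
    """True once the per-IP counter reaches max_failures (threshold assumed)."""
    value = client.get(f"failed_login_ip:{ip}")
    return value is not None and int(value) >= max_failures
```

Because the TTL is refreshed on every failure, the counter in this sketch only resets after a quiet period of `ttl` seconds.
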
##### Item Import:
| Key | Value | Expiry |
| ------ | ------ | ------ |
| **uuid**:nb_total | **nb total** | TTL *(if imported)* |
| **uuid**:nb_end | **nb** | TTL *(if imported)* |
| **uuid**:nb_sucess | **nb success** | TTL *(if imported)* |
| **uuid**:end | **0 (in progress) or 1 (item imported)** | TTL *(if imported)* |
| **uuid**:processing | **process status: 0 or 1** | TTL *(if imported)* |
| **uuid**:error | **error message** | TTL *(if imported)* |

| Set Key | Value | Expiry |
| ------ | ------ | ------ |
| **uuid**:paste_submit_link | **item_path** | TTL *(if imported)* |
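
Reading the import status of one submission could look like the sketch below. The helper names `import_status` and `import_progress` are hypothetical, and `client` is assumed to expose a redis-py style `get()`. Note that `nb_sucess` is spelled as in the key table.

```python
def import_status(client, uuid):
    """Collect the status keys of one submission into a dict.

    Missing keys (for example after the TTL expired) come back as None.
    """
    fields = ("nb_total", "nb_end", "nb_sucess", "end", "processing", "error")
    return {field: client.get(f"{uuid}:{field}") for field in fields}


def import_progress(status):
    """Percentage of processed items, or 0 when the total is unknown."""
    total = int(status["nb_total"] or 0)
    done = int(status["nb_end"] or 0)
    return 100 * done // total if total else 0
```
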
## DB0 - Core:
##### Update keys:
| Key | Value |
| ------ | ------ |
| | |
| ail:version | **current version** |
| | |
| ail:update_**update_version** | **background update name** |
| | **background update name** |
| | **...** |
| | |
| ail:update_error | **update message error** |
| | |
| ail:update_in_progress | **update version in progress** |
| ail:current_background_update | **current update version** |
| | |
| ail:current_background_script | **name of the background script currently executed** |
| ail:current_background_script_stat | **progress in % of the background script** |

| Hset Key | Field | Value |
| ------ | ------ | ------ |
| ail:update_date | **update tag** | **update date** |
##### User Management:
| Hset Key | Field | Value |
| ------ | ------ | ------ |
| user:all | **user id** | **password hash** |
| | | |
| user:tokens | **token** | **user id** |
| | | |
| user_metadata:**user id** | token | **token** |
| | change_passwd | **boolean** |
| | role | **role** |

| Set Key | Value |
| ------ | ------ |
| user_role:**role** | **user id** |

| Zrank Key | Field | Value |
| ------ | ------ | ------ |
| ail:all_role | **role** | **int, role priority (1=admin)** |
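
The user-management keys above can be combined to resolve a token to a role and to compare role priorities. The sketch below is an assumption about how the keys fit together, not AIL's implementation: the function names are hypothetical, `client` is any object with redis-py style `hget()` and `zscore()` returning decoded strings, and "a lower priority number means more privileges" is inferred only from `1=admin`.

```python
def role_for_token(client, token):
    """Resolve an API token to (user_id, role), or None if the token is unknown."""
    user_id = client.hget("user:tokens", token)
    if user_id is None:
        return None
    role = client.hget(f"user_metadata:{user_id}", "role")
    return user_id, role


def has_at_least(client, role, required_role):
    """Compare role priorities stored in ail:all_role (assumed: lower wins)."""
    have = client.zscore("ail:all_role", role)
    need = client.zscore("ail:all_role", required_role)
    if have is None or need is None:
        return False  # unknown roles never pass the check
    return have <= need
```
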
##### Item Import:
| Key | Value |
| ------ | ------ |
| **uuid**:isfile | **boolean** |
| **uuid**:paste_content | **item_content** |

| Set Key | Value |
| ------ | ------ |
| submitted:uuid | **uuid** |
| **uuid**:ltags | **tag** |
| **uuid**:ltagsgalaxies | **tag** |
## DB3 - Leak Hunter:
##### Tracker metadata:
| Hset - Key | Field | Value |
| ------ | ------ | ------ |
| tracker:**uuid** | tracker | **tracked word/set/regex** |
| | type | **word/set/regex** |
| | date | **date added** |
| | user_id | **created by user_id** |
| | dashboard | **0/1 Display alert on dashboard** |
| | description | **Tracker description** |
| | level | **0/1 Tracker visibility** |
##### Tracker by user_id (visibility level: user only):
| Set - Key | Value |
| ------ | ------ |
| user:tracker:**user_id** | **uuid - tracker uuid** |
| user:tracker:**user_id**:**word/set/regex - tracker type** | **uuid - tracker uuid** |
##### Global Tracker (visibility level: all users):
| Set - Key | Value |
| ------ | ------ |
| gobal:tracker | **uuid - tracker uuid** |
| gobal:tracker:**word/set/regex - tracker type** | **uuid - tracker uuid** |
##### All Tracker by type:
| Set - Key | Value |
| ------ | ------ |
| all:tracker:**word/set/regex - tracker type** | **tracked item** |

| Set - Key | Value |
| ------ | ------ |
| all:tracker_uuid:**tracker type**:**tracked item** | **uuid - tracker uuid** |
##### All Tracked items:
| Set - Key | Value |
| ------ | ------ |
| tracker:item:**uuid**:**date** | **item_id** |
##### All Tracked tags:
| Set - Key | Value |
| ------ | ------ |
| tracker:tags:**uuid** | **tag** |
##### All Tracked mail:
| Set - Key | Value |
| ------ | ------ |
| tracker:mail:**uuid** | **mail** |
##### Refresh Tracker:
| Key | Value |
| ------ | ------ |
| tracker:refresh:word | **last refreshed epoch** |
| tracker:refresh:set | - |
| tracker:refresh:regex | - |
##### Zset Stat Tracker:
| Key | Field | Value |
| ------ | ------ | ------ |
| tracker:stat:**uuid** | **date** | **nb_seen** |
##### Stat token:
| Key | Field | Value |
| ------ | ------ | ------ |
| stat_token_total_by_day:**date** | **word** | **nb_seen** |
| | | |
| stat_token_per_item_by_day:**date** | **word** | **nb_seen** |

| Set - Key | Value |
| ------ | ------ |
| stat_token_history | **date** |
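
To show how the tracker keys in DB3 relate to each other, here is a small read-only sketch that joins a tracker's metadata hash with its per-day item sets. `tracker_summary` is a hypothetical helper, `client` is assumed to offer redis-py style `hgetall()` and `smembers()`, and the date strings are passed in as-is because the key date format is not stated in this section.

```python
def tracker_summary(client, tracker_uuid, dates):
    """Return tracker metadata plus the item ids matched on each given date.

    `dates` is an iterable of date strings in whatever format the keys use.
    """
    meta = client.hgetall(f"tracker:{tracker_uuid}")
    items = {
        date: sorted(client.smembers(f"tracker:item:{tracker_uuid}:{date}"))
        for date in dates
    }
    return {"metadata": meta, "items": items}
```
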
## DB6 - Tags:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| per_paste_**epoch** | **term** | **nb_seen** |
| | |
| tag_metadata:**tag** | first_seen | **date** |
| tag_metadata:**tag** | last_seen | **date** |
##### Set:
| Key | Value |
| ------ | ------ |
| list_tags | **tag** |
| active_taxonomies | **taxonomy** |
| active_galaxies | **galaxy** |
| active_tag_**taxonomy or galaxy** | **tag** |
| synonym_tag_misp-galaxy:**galaxy** | **tag synonym** |
| list_export_tags | **user_tag** |
| **tag**:**date** | **paste** |
##### old:
| Key | Value |
| ------ | ------ |
| *tag* | *paste* |
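
The per-day `**tag**:**date**` sets listed under DB6 lend themselves to range queries by generating one key per day. The sketch below illustrates that idea only. `tag_keys` and `items_with_tag` are hypothetical names, the `YYYYMMDD` date format is an assumption (it matches the `'hash_date:'+20180622` style used elsewhere in this file), and `client` needs a redis-py style `smembers()`.

```python
from datetime import date, timedelta


def tag_keys(tag, first, last):
    """Yield one `tag:date` key per day in [first, last], inclusive.

    The YYYYMMDD date format is an assumption.
    """
    day = first
    while day <= last:
        yield f"{tag}:{day:%Y%m%d}"
        day += timedelta(days=1)


def items_with_tag(client, tag, first, last):
    """Union of the daily sets for one tag over a date range."""
    found = set()
    for key in tag_keys(tag, first, last):
        found |= set(client.smembers(key))
    return found
```
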
## DB7 - Metadata:
#### Crawled Items:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| paste_metadata:**item path** | super_father | **first url crawled** |
| | father | **item father** |
| | domain | **crawled domain**:**domain port** |
| | screenshot | **screenshot hash** |
##### Set:
| Key | Value |
| ------ | ------ |
| tag:**item path** | **tag** |
| | |
| paste_children:**item path** | **item path** |
| | |
| hash_paste:**item path** | **hash** |
| base64_paste:**item path** | **hash** |
| hexadecimal_paste:**item path** | **hash** |
| binary_paste:**item path** | **hash** |
##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| nb_seen_hash:**hash** | **item** | **nb_seen** |
| base64_hash:**hash** | **item** | **nb_seen** |
| binary_hash:**hash** | **item** | **nb_seen** |
| hexadecimal_hash:**hash** | **item** | **nb_seen** |
#### PgpDump
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump_metadata_key:*key id* | first_seen | **date** |
| | last_seen | **date** |
| | |
| pgpdump_metadata_name:*name* | first_seen | **date** |
| | last_seen | **date** |
| | |
| pgpdump_metadata_mail:*mail* | first_seen | **date** |
| | last_seen | **date** |
##### set:
| Key | Value |
| ------ | ------ |
| set_pgpdump_key:*key id* | *item_path* |
| | |
| set_pgpdump_name:*name* | *item_path* |
| | |
| set_pgpdump_mail:*mail* | *item_path* |
| | |
| | |
| set_domain_pgpdump_**pgp_type**:**key** | **domain** |
##### Hset date:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump:key:*date* | *key* | *nb seen* |
| | |
| pgpdump:name:*date* | *name* | *nb seen* |
| | |
| pgpdump:mail:*date* | *mail* | *nb seen* |
##### zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| pgpdump_all:key | *key* | *nb seen* |
| | |
| pgpdump_all:name | *name* | *nb seen* |
| | |
| pgpdump_all:mail | *mail* | *nb seen* |
##### set:
| Key | Value |
| ------ | ------ |
| item_pgpdump_key:*item_path* | *key* |
| | |
| item_pgpdump_name:*item_path* | *name* |
| | |
| item_pgpdump_mail:*item_path* | *mail* |
| | |
| | |
| domain_pgpdump_**pgp_type**:**domain** | **key** |
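
The PgpDump sets above form a two-way index between items and PGP values. A minimal lookup sketch, assuming a `client` with redis-py style `smembers()`; the helper names and the `PGP_TYPES` tuple are illustrative and not taken from AIL.

```python
PGP_TYPES = ("key", "name", "mail")


def pgp_for_item(client, item_path):
    """Map each PGP type to the values extracted from one item."""
    return {
        pgp_type: set(client.smembers(f"item_pgpdump_{pgp_type}:{item_path}"))
        for pgp_type in PGP_TYPES
    }


def items_for_pgp(client, pgp_type, value):
    """Reverse lookup: item paths in which a key id, name or mail was seen."""
    if pgp_type not in PGP_TYPES:
        raise ValueError(f"unknown PGP type: {pgp_type}")
    return set(client.smembers(f"set_pgpdump_{pgp_type}:{value}"))
```
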
#### Cryptocurrency
Supported cryptocurrency:
- bitcoin
- bitcoin-cash
- dash
- ethereum
- litecoin
- monero
- zcash
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency_metadata_**cryptocurrency name**:**cryptocurrency address** | first_seen | **date** |
| | last_seen | **date** |
##### set:
| Key | Value | Scope |
| ------ | ------ | ------ |
| set_cryptocurrency_**cryptocurrency name**:**cryptocurrency address** | **item_path** | PASTE |
| domain_cryptocurrency_**cryptocurrency name**:**cryptocurrency address** | **domain** | DOMAIN |
##### Hset date:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency:**cryptocurrency name**:**date** | **cryptocurrency address** | **nb seen** |
##### zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| cryptocurrency_all:**cryptocurrency name** | **cryptocurrency address** | **nb seen** |
##### set:
| Key | Value | Scope |
| ------ | ------ | ------ |
| item_cryptocurrency_**cryptocurrency name**:**item_path** | **cryptocurrency address** | PASTE |
| domain_cryptocurrency_**cryptocurrency name**:**item_path** | **cryptocurrency address** | DOMAIN |
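
Taken together, the item-level cryptocurrency structures above imply a handful of writes per sighting. The sketch below shows one plausible way to perform them; it is inferred from the key layout and is not AIL's actual module code. `record_address` is a hypothetical name, `client` needs redis-py style `hsetnx`, `hset`, `sadd`, `hincrby` and `zincrby`, and `date` is a pre-formatted string whose format is assumed to match the keys. The domain-level sets are left out.

```python
def record_address(client, currency, address, item_path, date):
    """Write one sighting of `address` in `item_path` into the structures above.

    Assumes items are processed in chronological order, so last_seen can
    simply be overwritten.
    """
    meta = f"cryptocurrency_metadata_{currency}:{address}"
    client.hsetnx(meta, "first_seen", date)  # only set on the first sighting
    client.hset(meta, "last_seen", date)     # always overwritten
    client.sadd(f"set_cryptocurrency_{currency}:{address}", item_path)
    client.sadd(f"item_cryptocurrency_{currency}:{item_path}", address)
    client.hincrby(f"cryptocurrency:{currency}:{date}", address, 1)
    client.zincrby(f"cryptocurrency_all:{currency}", 1, address)
```
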
#### HASH
| Key | Value |
| ------ | ------ |
| hash_domain:**domain** | **hash** |
| domain_hash:**hash** | **domain** |
## DB9 - Crawler:
##### Hset:
| Key | Field | Value |
| ------ | ------ | ------ |
| **service type**_metadata:**domain** | first_seen | **date** |
| | last_check | **date** |
| | ports | **port**;**port**;**port** ... |
| | paste_parent | **parent last crawling (can be auto or manual)** |
##### Zset:
| Key | Field | Value |
| ------ | ------ | ------ |
| crawler\_history\_**service type**:**domain**:**port** | **item root (first crawled item)** | **epoch (seconds)** |
##### Set:
| Key | Value |
| ------ | ------ |
| screenshot:**sha256** | **item path** |
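
The `ports` field of the domain metadata hash is a `;`-separated string, and each port has its own `crawler_history_*` sorted set. A small sketch of how the two could be linked follows; `parse_ports` and `history_keys` are hypothetical helpers, and tolerating a trailing `;` is an assumption.

```python
def parse_ports(raw):
    """Split the `ports` field ("80;443;8080") into a list of ints."""
    return [int(port) for port in raw.split(";") if port.strip()]


def history_keys(service_type, domain, raw_ports):
    """One crawler_history_* key per port listed in the domain metadata."""
    return [
        f"crawler_history_{service_type}:{domain}:{port}"
        for port in parse_ports(raw_ports)
    ]
```
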
##### crawler config:
| Key | Value |
| ------ | ------ |
| crawler\_config:**crawler mode**:**service type**:**domain** | **json config** |
##### automatic crawler config:
| Key | Value |
| ------ | ------ |
| crawler\_config:**crawler mode**:**service type**:**domain**:**url** | **json config** |
###### Example JSON config:
```json
{
    "closespider_pagecount": 1,
    "time": 3600,
    "depth_limit": 0,
    "har": 0,
    "png": 0
}
```
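
A stored config like the one above can be parsed and completed with fallback values before use. This is a sketch under assumptions: treating the example values as defaults, rejecting unknown options, and the names `DEFAULT_CRAWLER_CONFIG` and `load_crawler_config` are all illustrative choices rather than documented AIL behaviour.

```python
import json

# Values copied from the example config; using them as defaults is an assumption.
DEFAULT_CRAWLER_CONFIG = {
    "closespider_pagecount": 1,
    "time": 3600,
    "depth_limit": 0,
    "har": 0,
    "png": 0,
}


def load_crawler_config(raw):
    """Parse a stored JSON config and fill in any missing option."""
    config = dict(DEFAULT_CRAWLER_CONFIG)
    if raw:
        loaded = json.loads(raw)
        unknown = set(loaded) - set(DEFAULT_CRAWLER_CONFIG)
        if unknown:
            raise ValueError(f"unknown crawler options: {sorted(unknown)}")
        config.update(loaded)
    return config
```
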
##### CRAWLER QUEUES:
| SET - Key | Value | Origin |
| ------ | ------ | ------ |
| onion_crawler_queue | **url**;**item_id** | RE-CRAWL |
| regular_crawler_queue | - | |
| | | |
| onion_crawler_priority_queue | **url**;**item_id** | USER |
| regular_crawler_priority_queue | - | |
| | | |
| onion_crawler_discovery_queue | **url**;**item_id** | DISCOVER |
| regular_crawler_discovery_queue | - | |
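
Consuming these queues could be sketched as follows. The polling order (priority, then discovery, then the plain re-crawl queue) is an assumption, as are the names `QUEUE_ORDER`, `split_entry` and `next_to_crawl`. `client` needs a redis-py style `spop()` returning decoded strings. Entries are split on the last `;` on the assumption that the item id itself contains none.

```python
QUEUE_ORDER = ("priority_queue", "discovery_queue", "queue")  # assumed order


def split_entry(entry):
    """Split "url;item_id" on the last ';' so a ';' inside the URL survives."""
    url, _, item_id = entry.rpartition(";")
    return url, item_id


def next_to_crawl(client, service_type):
    """Pop one entry from the first non-empty queue, or return None.

    `service_type` is "onion" or "regular", matching the key prefixes above.
    """
    for suffix in QUEUE_ORDER:
        entry = client.spop(f"{service_type}_crawler_{suffix}")
        if entry is not None:
            return split_entry(entry)
    return None
```
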
##### TO CHANGE:
ARDB overview

    ----------------------------------------- SENTIMENT ------------------------------------

    SET - 'Provider_set'            Provider

    KEY - 'UniqID'                  INT
    SET - provider_timestamp        UniqID
    SET - UniqID                    avg_score

* DB 7 - Metadata:

        ----------------------------------------------------------------------------------------
        ----------------------------------------- BASE64 ----------------------------------------

        HSET - 'metadata_hash:'+hash    'saved_path'                saved_path
                                        'size'                      size
                                        'first_seen'                first_seen
                                        'last_seen'                 last_seen
                                        'estimated_type'            estimated_type
                                        'vt_link'                   vt_link
                                        'vt_report'                 vt_report
                                        'nb_seen_in_all_pastes'     nb_seen_in_all_pastes
                                        'base64_decoder'            nb_encoded
                                        'binary_decoder'            nb_encoded

        SET - 'all_decoder'             decoder*

        SET - 'hash_all_type'           hash_type *
        SET - 'hash_base64_all_type'    hash_type *
        SET - 'hash_binary_all_type'    hash_type *

        ZADD - 'hash_date:'+20180622    hash *          nb_seen_this_day
        ZADD - 'base64_date:'+20180622  hash *          nb_seen_this_day
        ZADD - 'binary_date:'+20180622  hash *          nb_seen_this_day

        ZADD - 'base64_type:'+type      date            nb_seen
        ZADD - 'binary_type:'+type      date            nb_seen

        GET - 'base64_decoded:'+date    nd_decoded
        GET - 'binary_decoded:'+date    nd_decoded