mirror of https://github.com/CIRCL/AIL-framework
commit
dd34382721
|
@ -11,9 +11,10 @@ ardb
|
|||
faup
|
||||
tlsh
|
||||
Blooms
|
||||
LEVEL_DB_DATA
|
||||
PASTES
|
||||
CRAWLED_SCREENSHOT
|
||||
BASE64
|
||||
HASHS
|
||||
DATA_ARDB
|
||||
indexdir/
|
||||
logs/
|
||||
|
@ -33,6 +34,7 @@ var/www/submitted
|
|||
bin/packages/config.cfg
|
||||
bin/packages/config.cfg.backup
|
||||
configs/keys
|
||||
files
|
||||
|
||||
# installed files
|
||||
nltk_data/
|
||||
|
|
53
HOWTO.md
|
@ -84,9 +84,9 @@ You can navigate into the interface by using arrow keys. In order to perform an
|
|||
|
||||
To change list, you can press the <TAB> key.
|
||||
|
||||
Also, you can quickly stop or start modules by clicking on the <K> or <S> symbol respectively. These are located in the _Action_ column.
|
||||
Also, you can quickly stop or start modules by clicking on the ``<K>`` or ``<S>`` symbol respectively. These are located in the _Action_ column.
|
||||
|
||||
Finally, you can quit this program by pressing either <q> or <C-c>
|
||||
Finally, you can quit this program by pressing either ``<q>`` or ``<C-c>``.
|
||||
|
||||
|
||||
Terms frequency usage
|
||||
|
@ -96,3 +96,52 @@ In AIL, you can track terms, set of terms and even regexes without creating a de
|
|||
- You can track a term by simply putting it in the box.
|
||||
- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
|
||||
- You can track regexes as easily as tracking a term. You just have to put your regex in the box, surrounded by the '/' character. For example, if you want to track the regex matching all email addresses with the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
|
||||
|
||||
|
||||
Crawler
|
||||
---------------------
|
||||
In AIL, you can crawl hidden services.
|
||||
|
||||
There are two types of installation. You can install a *local* or a *remote* Splash server.
|
||||
``(Splash host) = the server running the splash service``
|
||||
``(AIL host) = the server running AIL``
|
||||
|
||||
### Installation/Configuration
|
||||
|
||||
1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost Splash server is used, or use the ``-y`` option)
|
||||
|
||||
2. *(Splash host)* To install and set up your Tor proxy:
|
||||
- Install the tor proxy: ``sudo apt-get install tor -y``
|
||||
(Not required if ``Splash host == AIL host``: the Tor proxy is installed by default in AIL)
|
||||
- Add the following line ``SOCKSPolicy accept 172.17.0.0/16`` in ``/etc/tor/torrc``
|
||||
(for Docker on Linux, the localhost IP is *172.17.0.1*; adapt this for other platforms)
|
||||
- Restart the tor proxy: ``sudo service tor restart``
|
||||
|
||||
3. *(AIL host)* Edit the ``/bin/packages/config.cfg`` file:
|
||||
- In the crawler section, set ``activate_crawler`` to ``True``
|
||||
- Change the IP address of Splash servers if needed (remote only)
|
||||
- Set ``splash_onion_port`` to the port numbers that your Splash servers will use.
|
||||
These port numbers should be given as a single port (e.g. 8050) or a port range (e.g. 8050-8052 for ports 8050, 8051 and 8052).
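
The single-port and port-range forms of ``splash_onion_port`` can be expanded as in the minimal sketch below. The helper name ``expand_splash_ports`` is hypothetical and is not part of AIL (AIL itself splits this value on ``-`` in ``bin/LAUNCH.sh``); the sketch only assumes the two formats described in this step.

```python
def expand_splash_ports(value):
    """Expand '8050' or '8050-8052' into a list of port numbers.

    Hypothetical helper for illustration only; not part of AIL.
    """
    parts = value.strip().split('-')
    if len(parts) == 1:
        # Single port: first and last port are the same.
        first = last = int(parts[0])
    elif len(parts) == 2:
        # Port range: FIRST-LAST, both ends included.
        first, last = int(parts[0]), int(parts[1])
    else:
        raise ValueError('expected PORT or FIRST-LAST, got {!r}'.format(value))
    if first > last:
        raise ValueError('first port is greater than last port: {!r}'.format(value))
    return list(range(first, last + 1))


print(expand_splash_ports('8050'))
print(expand_splash_ports('8050-8052'))
```

One crawler process is then expected per expanded port, which is why the range must match the Splash servers that are actually running.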
|
||||
|
||||
|
||||
### Starting the scripts
|
||||
|
||||
- *(Splash host)* Launch all Splash servers with:
|
||||
```sudo ./bin/torcrawler/launch_splash_crawler.sh -f <config absolute_path> -p <port_start> -n <number_of_splash>```
|
||||
With ``<port_start>`` and ``<number_of_splash>`` matching those specified at ``splash_onion_port`` in the configuration file of point 3 (``/bin/packages/config.cfg``)
|
||||
|
||||
All Splash Docker containers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to attach to the screen session and check the status of all Splash servers.
|
||||
|
||||
- *(AIL host)* Launch all AIL crawler scripts using:
|
||||
```./bin/LAUNCH.sh -c```
|
||||
|
||||
|
||||
### TL;DR - Local setup
|
||||
#### Installation
|
||||
- ```crawler_hidden_services_install.sh -y```
|
||||
- Add the following line ``SOCKSPolicy accept 172.17.0.0/16`` in ``/etc/tor/torrc``
|
||||
- ```sudo service tor restart```
|
||||
- Set ``activate_crawler`` to ``True`` in ``/bin/packages/config.cfg``
|
||||
#### Start
|
||||
- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```
|
||||
- ```./bin/LAUNCH.sh -c```
|
||||
|
|
|
@ -32,6 +32,7 @@ def decode_base58(bc, length):
|
|||
for char in bc:
|
||||
n = n * 58 + digits58.index(char)
|
||||
return n.to_bytes(length, 'big')
|
||||
|
||||
def check_bc(bc):
|
||||
try:
|
||||
bcbytes = decode_base58(bc, 25)
|
||||
|
|
|
@ -0,0 +1,239 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
import re
|
||||
import redis
|
||||
import datetime
|
||||
import time
|
||||
import subprocess
|
||||
import requests
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
from Helper import Process
|
||||
from pubsublogger import publisher
|
||||
|
||||
def on_error_send_message_back_in_queue(type_hidden_service, domain, message):
|
||||
# send this msg back in the queue
|
||||
if not r_onion.sismember('{}_domain_crawler_queue'.format(type_hidden_service), domain):
|
||||
r_onion.sadd('{}_domain_crawler_queue'.format(type_hidden_service), domain)
|
||||
r_onion.sadd('{}_crawler_queue'.format(type_hidden_service), message)
|
||||
|
||||
def crawl_onion(url, domain, date, date_month, message):
|
||||
|
||||
#if not r_onion.sismember('full_onion_up', domain) and not r_onion.sismember('onion_down:'+date , domain):
|
||||
super_father = r_serv_metadata.hget('paste_metadata:'+paste, 'super_father')
|
||||
if super_father is None:
|
||||
super_father=paste
|
||||
|
||||
try:
|
||||
r = requests.get(splash_url , timeout=30.0)
|
||||
except Exception:
|
||||
# TODO: relaunch docker or send error message
|
||||
|
||||
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
|
||||
publisher.error('{} SPLASH DOWN'.format(splash_url))
|
||||
print('--------------------------------------')
|
||||
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
|
||||
print(' {} DOWN'.format(splash_url))
|
||||
exit(1)
|
||||
|
||||
if r.status_code == 200:
|
||||
process = subprocess.Popen(["python", './torcrawler/tor_crawler.py', splash_url, type_hidden_service, url, domain, paste, super_father],
|
||||
stdout=subprocess.PIPE)
|
||||
while process.poll() is None:
|
||||
time.sleep(1)
|
||||
|
||||
if process.returncode == 0:
|
||||
output = process.stdout.read().decode()
|
||||
print(output)
|
||||
# error: splash:Connection to proxy refused
|
||||
if 'Connection to proxy refused' in output:
|
||||
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
|
||||
publisher.error('{} SPLASH, PROXY DOWN OR BAD CONFIGURATION'.format(splash_url))
|
||||
print('------------------------------------------------------------------------')
|
||||
print(' \033[91m SPLASH: Connection to proxy refused')
|
||||
print('')
|
||||
print(' PROXY DOWN OR BAD CONFIGURATION\033[0m')
|
||||
print('------------------------------------------------------------------------')
|
||||
exit(-2)
|
||||
else:
|
||||
print(process.stdout.read())
|
||||
exit(-1)
|
||||
else:
|
||||
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
|
||||
print('--------------------------------------')
|
||||
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
|
||||
print(' {} DOWN'.format(splash_url))
|
||||
exit(1)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
if len(sys.argv) != 3:
|
||||
print('usage:', 'Crawler.py', 'type_hidden_service (onion or i2p or regular)', 'splash_port')
|
||||
exit(1)
|
||||
|
||||
type_hidden_service = sys.argv[1]
|
||||
splash_port = sys.argv[2]
|
||||
|
||||
publisher.port = 6380
|
||||
publisher.channel = "Script"
|
||||
|
||||
publisher.info("Script Crawler started")
|
||||
|
||||
config_section = 'Crawler'
|
||||
|
||||
# Setup the I/O queues
|
||||
p = Process(config_section)
|
||||
|
||||
url_onion = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
re.compile(url_onion)
|
||||
url_i2p = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
re.compile(url_i2p)
|
||||
|
||||
if type_hidden_service == 'onion':
|
||||
regex_hidden_service = url_onion
|
||||
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_onion"), splash_port)
|
||||
elif type_hidden_service == 'i2p':
|
||||
regex_hidden_service = url_i2p
|
||||
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_i2p"), splash_port)
|
||||
elif type_hidden_service == 'regular':
|
||||
regex_hidden_service = url_i2p
|
||||
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_onion"), splash_port)
|
||||
else:
|
||||
print('incorrect crawler type: {}'.format(type_hidden_service))
|
||||
exit(1)
|
||||
|
||||
print('splash url: {}'.format(splash_url))
|
||||
|
||||
crawler_depth_limit = p.config.getint("Crawler", "crawler_depth_limit")
|
||||
|
||||
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], p.config.get("Directories", "pastes"))
|
||||
|
||||
r_serv_metadata = redis.StrictRedis(
|
||||
host=p.config.get("ARDB_Metadata", "host"),
|
||||
port=p.config.getint("ARDB_Metadata", "port"),
|
||||
db=p.config.getint("ARDB_Metadata", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
r_cache = redis.StrictRedis(
|
||||
host=p.config.get("Redis_Cache", "host"),
|
||||
port=p.config.getint("Redis_Cache", "port"),
|
||||
db=p.config.getint("Redis_Cache", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
r_onion = redis.StrictRedis(
|
||||
host=p.config.get("ARDB_Onion", "host"),
|
||||
port=p.config.getint("ARDB_Onion", "port"),
|
||||
db=p.config.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
# load domains blacklist
|
||||
try:
|
||||
with open(os.environ['AIL_BIN']+'/torcrawler/blacklist_onion.txt', 'r') as f:
|
||||
r_onion.delete('blacklist_{}'.format(type_hidden_service))
|
||||
lines = f.read().splitlines()
|
||||
for line in lines:
|
||||
r_onion.sadd('blacklist_{}'.format(type_hidden_service), line)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
while True:
|
||||
|
||||
# Recover the queued message (format: "url;paste").
|
||||
message = r_onion.spop('{}_crawler_queue'.format(type_hidden_service))
|
||||
|
||||
if message is not None:
|
||||
|
||||
splitted = message.split(';')
|
||||
if len(splitted) == 2:
|
||||
url, paste = splitted
|
||||
paste = paste.replace(PASTES_FOLDER+'/', '')
|
||||
|
||||
url_list = re.findall(regex_hidden_service, url)[0]
|
||||
if url_list[1] == '':
|
||||
url= 'http://{}'.format(url)
|
||||
|
||||
link, s, credential, subdomain, domain, host, port, \
|
||||
resource_path, query_string, f1, f2, f3, f4 = url_list
|
||||
domain = url_list[4]
|
||||
r_onion.srem('{}_domain_crawler_queue'.format(type_hidden_service), domain)
|
||||
|
||||
domain_url = 'http://{}'.format(domain)
|
||||
|
||||
print('\033[92m------------------START CRAWLER------------------\033[0m')
|
||||
print('crawler type: {}'.format(type_hidden_service))
|
||||
print('\033[92m-------------------------------------------------\033[0m')
|
||||
print('url: {}'.format(url))
|
||||
print('domain: {}'.format(domain))
|
||||
print('domain_url: {}'.format(domain_url))
|
||||
|
||||
if not r_onion.sismember('blacklist_{}'.format(type_hidden_service), domain):
|
||||
|
||||
date = datetime.datetime.now().strftime("%Y%m%d")
|
||||
date_month = datetime.datetime.now().strftime("%Y%m")
|
||||
|
||||
if not r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and not r_onion.sismember('{}_down:{}'.format(type_hidden_service, date), domain):
|
||||
|
||||
crawl_onion(url, domain, date, date_month, message)
|
||||
if url != domain_url:
|
||||
print(url)
|
||||
print(domain_url)
|
||||
crawl_onion(domain_url, domain, date, date_month, message)
|
||||
|
||||
# save down onion
|
||||
if not r_onion.sismember('{}_up:{}'.format(type_hidden_service, date), domain):
|
||||
r_onion.sadd('{}_down:{}'.format(type_hidden_service, date), domain)
|
||||
#r_onion.sadd('{}_down_link:{}'.format(type_hidden_service, date), url)
|
||||
#r_onion.hincrby('{}_link_down'.format(type_hidden_service), url, 1)
|
||||
if not r_onion.exists('{}_metadata:{}'.format(type_hidden_service, domain)):
|
||||
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'first_seen', date)
|
||||
r_onion.hset('{}_metadata:{}'.format(type_hidden_service,domain), 'last_seen', date)
|
||||
else:
|
||||
#r_onion.hincrby('{}_link_up'.format(type_hidden_service), url, 1)
|
||||
if r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and r_serv_metadata.exists('paste_children:'+paste):
|
||||
msg = 'infoleak:automatic-detection="{}";{}'.format(type_hidden_service, paste)
|
||||
p.populate_set_out(msg, 'Tags')
|
||||
|
||||
# last check
|
||||
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'last_check', date)
|
||||
|
||||
# last_father
|
||||
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'paste_parent', paste)
|
||||
|
||||
# add onion screenshot history
|
||||
# add crawled days
|
||||
if r_onion.lindex('{}_history:{}'.format(type_hidden_service, domain), 0) != date:
|
||||
r_onion.lpush('{}_history:{}'.format(type_hidden_service, domain), date)
|
||||
# add crawled history by date
|
||||
r_onion.lpush('{}_history:{}:{}'.format(type_hidden_service, domain, date), paste) #add datetime here
|
||||
|
||||
|
||||
# check external onions links (full_scrawl)
|
||||
external_domains = set()
|
||||
for link in r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)):
|
||||
external_domain = re.findall(url_onion, link)
|
||||
external_domain.extend(re.findall(url_i2p, link))
|
||||
if len(external_domain) > 0:
|
||||
external_domain = external_domain[0][4]
|
||||
else:
|
||||
continue
|
||||
if '.onion' in external_domain and external_domain != domain:
|
||||
external_domains.add(external_domain)
|
||||
elif '.i2p' in external_domain and external_domain != domain:
|
||||
external_domains.add(external_domain)
|
||||
if len(external_domains) >= 10:
|
||||
r_onion.sadd('{}_potential_source'.format(type_hidden_service), domain)
|
||||
r_onion.delete('domain_{}_external_links:{}'.format(type_hidden_service, domain))
|
||||
print(r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)))
|
||||
|
||||
# update list, last crawled onions
|
||||
r_onion.lpush('last_{}'.format(type_hidden_service), domain)
|
||||
r_onion.ltrim('last_{}'.format(type_hidden_service), 0, 15)
|
||||
|
||||
else:
|
||||
continue
|
||||
else:
|
||||
time.sleep(1)
|
|
@ -27,6 +27,7 @@ islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
|
|||
isqueued=`screen -ls | egrep '[0-9]+.Queue_AIL' | cut -d. -f1`
|
||||
isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
|
||||
isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
|
||||
iscrawler=`screen -ls | egrep '[0-9]+.Crawler_AIL' | cut -d. -f1`
|
||||
isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
|
||||
|
||||
function helptext {
|
||||
|
@ -199,6 +200,35 @@ function launching_scripts {
|
|||
|
||||
}
|
||||
|
||||
function launching_crawler {
|
||||
if [[ ! $iscrawler ]]; then
|
||||
CONFIG=$AIL_BIN/packages/config.cfg
|
||||
lport=$(awk '/^\[Crawler\]/{f=1} f==1&&/^splash_onion_port/{print $3;exit}' "${CONFIG}")
|
||||
|
||||
IFS='-' read -ra PORTS <<< "$lport"
|
||||
if [ ${#PORTS[@]} -eq 1 ]
|
||||
then
|
||||
first_port=${PORTS[0]}
|
||||
last_port=${PORTS[0]}
|
||||
else
|
||||
first_port=${PORTS[0]}
|
||||
last_port=${PORTS[1]}
|
||||
fi
|
||||
|
||||
screen -dmS "Crawler_AIL"
|
||||
sleep 0.1
|
||||
|
||||
for ((i=first_port;i<=last_port;i++)); do
|
||||
screen -S "Crawler_AIL" -X screen -t "onion_crawler:$i" bash -c 'cd '${AIL_BIN}'; ./Crawler.py onion '$i'; read x'
|
||||
sleep 0.1
|
||||
done
|
||||
|
||||
echo -e $GREEN"\t* Launching Crawler_AIL scripts"$DEFAULT
|
||||
else
|
||||
echo -e $RED"\t* A screen is already launched"$DEFAULT
|
||||
fi
|
||||
}
|
||||
|
||||
function shutting_down_redis {
|
||||
redis_dir=${AIL_HOME}/redis/src/
|
||||
bash -c $redis_dir'redis-cli -p 6379 SHUTDOWN'
|
||||
|
@ -420,6 +450,9 @@ function launch_all {
|
|||
Flask)
|
||||
launch_flask;
|
||||
;;
|
||||
Crawler)
|
||||
launching_crawler;
|
||||
;;
|
||||
Killall)
|
||||
killall;
|
||||
;;
|
||||
|
@ -445,9 +478,9 @@ while [ "$1" != "" ]; do
|
|||
;;
|
||||
-k | --killAll ) killall;
|
||||
;;
|
||||
-c | --configUpdate ) checking_configuration "manual";
|
||||
-t | --thirdpartyUpdate ) update_thirdparty;
|
||||
;;
|
||||
-t | --thirdpartyUpdate ) update_thirdparty;
|
||||
-c | --crawler ) launching_crawler;
|
||||
;;
|
||||
-f | --launchFeeder ) launch_feeder;
|
||||
;;
|
||||
|
|
77
bin/Onion.py
|
@ -21,7 +21,6 @@ Requirements
|
|||
*Need the ZMQ_Sub_Onion_Q Module running to be able to work properly.
|
||||
|
||||
"""
|
||||
import pprint
|
||||
import time
|
||||
from packages import Paste
|
||||
from pubsublogger import publisher
|
||||
|
@ -30,6 +29,7 @@ import os
|
|||
import base64
|
||||
import subprocess
|
||||
import redis
|
||||
import re
|
||||
|
||||
from Helper import Process
|
||||
|
||||
|
@ -97,6 +97,12 @@ if __name__ == "__main__":
|
|||
db=p.config.getint("Redis_Cache", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
r_onion = redis.StrictRedis(
|
||||
host=p.config.get("ARDB_Onion", "host"),
|
||||
port=p.config.getint("ARDB_Onion", "port"),
|
||||
db=p.config.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
# FUNCTIONS #
|
||||
publisher.info("Script subscribed to channel onion_categ")
|
||||
|
||||
|
@ -107,9 +113,21 @@ if __name__ == "__main__":
|
|||
message = p.get_from_set()
|
||||
prec_filename = None
|
||||
|
||||
# send to crawler:
|
||||
activate_crawler = p.config.get("Crawler", "activate_crawler")
|
||||
if activate_crawler == 'True':
|
||||
activate_crawler = True
|
||||
print('Crawler enabled')
|
||||
else:
|
||||
activate_crawler = False
|
||||
print('Crawler disabled')
|
||||
|
||||
# Thanks to Faup project for this regex
|
||||
# https://github.com/stricaud/faup
|
||||
url_regex = "((http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
url_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
i2p_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
re.compile(url_regex)
|
||||
|
||||
|
||||
while True:
|
||||
if message is not None:
|
||||
|
@ -123,12 +141,32 @@ if __name__ == "__main__":
|
|||
PST = Paste.Paste(filename)
|
||||
|
||||
for x in PST.get_regex(url_regex):
|
||||
print(x)
|
||||
# Extracting url with regex
|
||||
url, s, credential, subdomain, domain, host, port, \
|
||||
resource_path, query_string, f1, f2, f3, f4 = x
|
||||
|
||||
domains_list.append(domain)
|
||||
urls.append(url)
|
||||
if '.onion' in url:
|
||||
print(url)
|
||||
domains_list.append(domain)
|
||||
urls.append(url)
|
||||
|
||||
'''
|
||||
for x in PST.get_regex(i2p_regex):
|
||||
# Extracting url with regex
|
||||
url, s, credential, subdomain, domain, host, port, \
|
||||
resource_path, query_string, f1, f2, f3, f4 = x
|
||||
|
||||
if '.i2p' in url:
|
||||
print('add i2p')
|
||||
print(domain)
|
||||
if not r_onion.sismember('i2p_domain', domain) and not r_onion.sismember('i2p_domain_crawler_queue', domain):
|
||||
r_onion.sadd('i2p_domain', domain)
|
||||
r_onion.sadd('i2p_link', url)
|
||||
r_onion.sadd('i2p_domain_crawler_queue', domain)
|
||||
msg = '{};{}'.format(url,PST.p_path)
|
||||
r_onion.sadd('i2p_crawler_queue', msg)
|
||||
'''
|
||||
|
||||
# Saving the list of extracted onion domains.
|
||||
PST.__setattr__(channel, domains_list)
|
||||
|
@ -149,12 +187,33 @@ if __name__ == "__main__":
|
|||
to_print = 'Onion;{};{};{};'.format(PST.p_source,
|
||||
PST.p_date,
|
||||
PST.p_name)
|
||||
for url in fetch(p, r_cache, urls, domains_list, path):
|
||||
publisher.info('{}Checked {};{}'.format(to_print, url, PST.p_path))
|
||||
p.populate_set_out('onion;{}'.format(PST.p_path), 'alertHandler')
|
||||
|
||||
msg = 'infoleak:automatic-detection="onion";{}'.format(PST.p_path)
|
||||
p.populate_set_out(msg, 'Tags')
|
||||
if activate_crawler:
|
||||
date_month = datetime.datetime.now().strftime("%Y%m")
|
||||
date = datetime.datetime.now().strftime("%Y%m%d")
|
||||
for url in urls:
|
||||
|
||||
domain = re.findall(url_regex, url)
|
||||
if len(domain) > 0:
|
||||
domain = domain[0][4]
|
||||
else:
|
||||
continue
|
||||
|
||||
if not r_onion.sismember('month_onion_up:{}'.format(date_month), domain) and not r_onion.sismember('onion_down:'+date , domain):
|
||||
if not r_onion.sismember('onion_domain_crawler_queue', domain):
|
||||
print('send to onion crawler')
|
||||
r_onion.sadd('onion_domain_crawler_queue', domain)
|
||||
msg = '{};{}'.format(url,PST.p_path)
|
||||
r_onion.sadd('onion_crawler_queue', msg)
|
||||
#p.populate_set_out(msg, 'Crawler')
|
||||
|
||||
else:
|
||||
for url in fetch(p, r_cache, urls, domains_list, path):
|
||||
publisher.info('{}Checked {};{}'.format(to_print, url, PST.p_path))
|
||||
p.populate_set_out('onion;{}'.format(PST.p_path), 'alertHandler')
|
||||
|
||||
msg = 'infoleak:automatic-detection="onion";{}'.format(PST.p_path)
|
||||
p.populate_set_out(msg, 'Tags')
|
||||
else:
|
||||
publisher.info('{}Onion related;{}'.format(to_print, PST.p_path))
|
||||
|
||||
|
|
|
@ -0,0 +1,183 @@
|
|||
#!/usr/bin/python3
|
||||
|
||||
"""
|
||||
The ``hiddenServices Class``
|
||||
===================
|
||||
|
||||
Use it to create an object from an existing paste or other random file.
|
||||
|
||||
Conditions to fulfill to be able to use this class correctly:
|
||||
-------------------------------------------------------------
|
||||
|
||||
1/ The paste needs to be saved on disk somewhere (have an accessible path)
|
||||
2/ The paste needs to be gzipped.
|
||||
3/ The filepath needs to look like this:
|
||||
/directory/source/year/month/day/paste.gz
|
||||
|
||||
"""
|
||||
|
||||
import os
|
||||
import gzip
|
||||
import redis
|
||||
import random
|
||||
|
||||
import configparser
|
||||
import sys
|
||||
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'packages/'))
|
||||
from Date import Date
|
||||
|
||||
class HiddenServices(object):
|
||||
"""
|
||||
This class represents a hidden service as an object.
|
||||
When created, the object will have by default some "main attributes"
|
||||
|
||||
:Example:
|
||||
|
||||
PST = HiddenServices("xxxxxxxx.onion", "onion")
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, domain, type):
|
||||
|
||||
configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
|
||||
if not os.path.exists(configfile):
|
||||
raise Exception('Unable to find the configuration file. \
|
||||
Did you set environment variables? \
|
||||
Or activate the virtualenv.')
|
||||
|
||||
cfg = configparser.ConfigParser()
|
||||
cfg.read(configfile)
|
||||
self.r_serv_onion = redis.StrictRedis(
|
||||
host=cfg.get("ARDB_Onion", "host"),
|
||||
port=cfg.getint("ARDB_Onion", "port"),
|
||||
db=cfg.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.r_serv_metadata = redis.StrictRedis(
|
||||
host=cfg.get("ARDB_Metadata", "host"),
|
||||
port=cfg.getint("ARDB_Metadata", "port"),
|
||||
db=cfg.getint("ARDB_Metadata", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.domain = domain
|
||||
self.type = type
|
||||
self.tags = {}
|
||||
|
||||
if type == 'onion':
|
||||
self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
|
||||
self.paste_crawled_directory = os.path.join(self.paste_directory, cfg.get("Directories", "crawled"))
|
||||
self.paste_crawled_directory_name = cfg.get("Directories", "crawled")
|
||||
self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
|
||||
elif type == 'i2p':
|
||||
self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
|
||||
self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
|
||||
else:
|
||||
## TODO: # FIXME: add error
|
||||
pass
|
||||
|
||||
def get_origin_paste_name(self):
|
||||
origin_paste = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
|
||||
if origin_paste is None:
|
||||
return ''
|
||||
return origin_paste.replace(self.paste_directory+'/', '')
|
||||
|
||||
def get_domain_tags(self):
|
||||
return self.tags
|
||||
|
||||
def update_domain_tags(self, children):
|
||||
p_tags = self.r_serv_metadata.smembers('tag:'+children)
|
||||
for tag in p_tags:
|
||||
self.tags[tag] = self.tags.get(tag, 0) + 1
|
||||
|
||||
#todo use the right paste
|
||||
def get_last_crawled_pastes(self):
|
||||
paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
|
||||
#paste_parent = paste_parent.replace(self.paste_directory, '')[1:]
|
||||
return self.get_all_pastes_domain(paste_parent)
|
||||
|
||||
def get_all_pastes_domain(self, father):
|
||||
if father is None:
|
||||
return []
|
||||
l_crawled_pastes = []
|
||||
paste_parent = father.replace(self.paste_directory+'/', '')
|
||||
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
|
||||
## TODO: # FIXME: remove me
|
||||
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
|
||||
paste_childrens = paste_childrens | paste_children
|
||||
for children in paste_childrens:
|
||||
if self.domain in children:
|
||||
l_crawled_pastes.append(children)
|
||||
self.update_domain_tags(children)
|
||||
l_crawled_pastes.extend(self.get_all_pastes_domain(children))
|
||||
return l_crawled_pastes
|
||||
|
||||
def get_domain_son(self, l_paste):
|
||||
if l_paste is None:
|
||||
return None
|
||||
|
||||
set_domain = set()
|
||||
for paste in l_paste:
|
||||
paste_full = paste.replace(self.paste_directory+'/', '')
|
||||
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_full))
|
||||
## TODO: # FIXME: remove me
|
||||
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(paste))
|
||||
paste_childrens = paste_childrens | paste_children
|
||||
for children in paste_childrens:
|
||||
if not self.domain in children:
|
||||
print(children)
|
||||
set_domain.add((children.split('.onion')[0]+'.onion').split('/')[-1])
|
||||
|
||||
return set_domain
|
||||
|
||||
def get_all_domain_son(self, father):
|
||||
if father is None:
|
||||
return []
|
||||
l_crawled_pastes = []
|
||||
paste_parent = father.replace(self.paste_directory+'/', '')
|
||||
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
|
||||
## TODO: # FIXME: remove me
|
||||
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
|
||||
paste_childrens = paste_childrens | paste_children
|
||||
for children in paste_childrens:
|
||||
if not self.domain in children:
|
||||
l_crawled_pastes.append(children)
|
||||
#self.update_domain_tags(children)
|
||||
l_crawled_pastes.extend(self.get_all_domain_son(children))
|
||||
|
||||
return l_crawled_pastes
|
||||
|
||||
def get_domain_random_screenshot(self, l_crawled_pastes, num_screenshot = 1):
|
||||
l_screenshot_paste = []
|
||||
for paste in l_crawled_pastes:
|
||||
## FIXME: # TODO: remove me
|
||||
paste= paste.replace(self.paste_directory+'/', '')
|
||||
|
||||
paste = paste.replace(self.paste_crawled_directory_name, '')
|
||||
if os.path.isfile( '{}{}.png'.format(self.screenshot_directory, paste) ):
|
||||
l_screenshot_paste.append(paste[1:])
|
||||
|
||||
if len(l_screenshot_paste) > num_screenshot:
|
||||
l_random_screenshot = []
|
||||
for index in random.sample( range(0, len(l_screenshot_paste)), num_screenshot ):
|
||||
l_random_screenshot.append(l_screenshot_paste[index])
|
||||
return l_random_screenshot
|
||||
else:
|
||||
return l_screenshot_paste
|
||||
|
||||
def get_crawled_pastes_by_date(self, date):
|
||||
|
||||
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
|
||||
paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
|
||||
|
||||
l_crawled_pastes = []
|
||||
return l_crawled_pastes
|
||||
|
||||
def get_last_crawled_pastes_fileSearch(self):
|
||||
|
||||
last_check = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
|
||||
return self.get_crawled_pastes_by_date_fileSearch(last_check)
|
||||
|
||||
def get_crawled_pastes_by_date_fileSearch(self, date):
|
||||
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
|
||||
l_crawled_pastes = [f for f in os.listdir(pastes_path) if self.domain in f]
|
||||
return l_crawled_pastes
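
The date-based lookup in ``get_crawled_pastes_by_date_fileSearch`` slices a ``YYYYMMDD`` string into year, month and day path components, then keeps the file names that contain the domain. The standalone sketch below shows that logic under two assumptions that are not in the class above: the helper names ``crawled_dir_for_date`` and ``pastes_for_domain`` are hypothetical, and the date is validated first (the method above would fail on a ``None`` ``last_check``).

```python
import os


def crawled_dir_for_date(crawled_root, date):
    """Return <crawled_root>/<YYYY>/<MM>/<DD> for a 'YYYYMMDD' string.

    Hypothetical helper; the validation is an addition for illustration.
    """
    if not isinstance(date, str) or len(date) != 8 or not date.isdigit():
        raise ValueError('expected a YYYYMMDD string, got {!r}'.format(date))
    return os.path.join(crawled_root, date[0:4], date[4:6], date[6:8])


def pastes_for_domain(filenames, domain):
    """Keep only the file names that contain the given domain."""
    return [name for name in filenames if domain in name]


print(crawled_dir_for_date('crawled', '20180924'))
print(pastes_for_domain(['abc.onion_1', 'xyz.onion_2'], 'abc.onion'))
```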
|
|
@ -94,6 +94,7 @@ class Paste(object):
|
|||
|
||||
var = self.p_path.split('/')
|
||||
self.p_date = Date(var[-4], var[-3], var[-2])
|
||||
self.p_rel_path = os.path.join(var[-4], var[-3], var[-2], self.p_name)
|
||||
self.p_source = var[-5]
|
||||
self.supposed_url = 'https://{}/{}'.format(self.p_source.replace('_pro', ''), var[-1].split('.gz')[0])
|
||||
|
||||
|
@ -291,6 +292,9 @@ class Paste(object):
|
|||
else:
|
||||
return '[]'
|
||||
|
||||
def get_p_rel_path(self):
|
||||
return self.p_rel_path
|
||||
|
||||
def save_all_attributes_redis(self, key=None):
|
||||
"""
|
||||
Saving all the attributes in a "Redis-like" Database (Redis, LevelDB)
|
||||
|
|
|
@ -4,6 +4,8 @@ dicofilters = Dicos
|
|||
pastes = PASTES
|
||||
hash = HASHS
|
||||
base64 = BASE64
|
||||
crawled = crawled
|
||||
crawled_screenshot = CRAWLED_SCREENSHOT
|
||||
|
||||
wordtrending_csv = var/www/static/csv/wordstrendingdata
|
||||
wordsfile = files/wordfile
|
||||
|
@ -184,6 +186,11 @@ host = localhost
|
|||
port = 6382
|
||||
db = 8
|
||||
|
||||
[ARDB_Onion]
|
||||
host = localhost
|
||||
port = 6382
|
||||
db = 9
|
||||
|
||||
[Url]
|
||||
cc_critical = DE
|
||||
|
||||
|
@ -228,3 +235,9 @@ channel = FetchedOnion
|
|||
host = localhost
|
||||
port = 6381
|
||||
db = 0
|
||||
|
||||
[Crawler]
|
||||
activate_crawler = True
|
||||
crawler_depth_limit = 1
|
||||
splash_url_onion = http://127.0.0.1
|
||||
splash_onion_port = 8050-8052
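The `splash_onion_port` value is written as a range. How AIL expands it is not shown in this diff, so the following is only a sketch under that assumption; `parse_port_range` and `splash_urls` are hypothetical helper names.

```python
def parse_port_range(value):
    """Expand '8050-8052' into [8050, 8051, 8052].

    A single port such as '8050' yields a one-item list.
    """
    first, sep, last = value.strip().partition('-')
    start = int(first)
    end = int(last) if sep else start
    if end < start:
        raise ValueError('invalid port range: {}'.format(value))
    return list(range(start, end + 1))

def splash_urls(base_url, port_value):
    """Combine a base URL with every port in the range (hypothetical helper)."""
    return ['{}:{}'.format(base_url, port) for port in parse_port_range(port_value)]
```

With the values from the config hunk, `splash_urls('http://127.0.0.1', '8050-8052')` would give one URL per Splash instance.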
|
||||
|
|
|
@ -61,7 +61,7 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_alertHandler,Redis_Tags
|
|||
|
||||
[Onion]
|
||||
subscribe = Redis_Onion
|
||||
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler,Redis_Tags
|
||||
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler,Redis_Tags,Redis_Crawler
|
||||
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler
|
||||
|
||||
[DumpValidOnion]
|
||||
|
@ -136,3 +136,8 @@ publish = Redis_Duplicate,Redis_alertHandler,Redis_Tags
|
|||
[submit_paste]
|
||||
subscribe = Redis
|
||||
publish = Redis_Mixer
|
||||
|
||||
[Crawler]
|
||||
subscribe = Redis_Crawler
|
||||
publish = Redis_Mixer,Redis_Tags
|
||||
|
||||
|
|
|
@ -0,0 +1,242 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
import gzip
|
||||
import base64
|
||||
import uuid
|
||||
import datetime
|
||||
|
||||
import redis
|
||||
import json
|
||||
|
||||
from scrapy.spidermiddlewares.httperror import HttpError
|
||||
from twisted.internet.error import DNSLookupError
|
||||
from twisted.internet.error import TimeoutError
|
||||
|
||||
from scrapy import Spider
|
||||
from scrapy.linkextractors import LinkExtractor
|
||||
from scrapy.crawler import CrawlerProcess, Crawler
|
||||
|
||||
from scrapy_splash import SplashRequest
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
from Helper import Process
|
||||
|
||||
class TorSplashCrawler():
|
||||
|
||||
def __init__(self, splash_url, crawler_depth_limit):
|
||||
self.process = CrawlerProcess({'LOG_ENABLED': False})
|
||||
self.crawler = Crawler(self.TorSplashSpider, {
|
||||
'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0',
|
||||
'SPLASH_URL': splash_url,
|
||||
'ROBOTSTXT_OBEY': False,
|
||||
'DOWNLOADER_MIDDLEWARES': {'scrapy_splash.SplashCookiesMiddleware': 723,
|
||||
'scrapy_splash.SplashMiddleware': 725,
|
||||
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
|
||||
},
|
||||
'SPIDER_MIDDLEWARES': {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,},
|
||||
'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
|
||||
'HTTPERROR_ALLOW_ALL': True,
|
||||
'DEPTH_LIMIT': crawler_depth_limit
|
||||
})
|
||||
|
||||
def crawl(self, type, url, domain, original_paste, super_father):
|
||||
self.process.crawl(self.crawler, type=type, url=url, domain=domain,original_paste=original_paste, super_father=super_father)
|
||||
self.process.start()
|
||||
|
||||
class TorSplashSpider(Spider):
|
||||
name = 'TorSplashSpider'
|
||||
|
||||
def __init__(self, type, url, domain,original_paste, super_father, *args, **kwargs):
|
||||
self.type = type
|
||||
self.original_paste = original_paste
|
||||
self.super_father = super_father
|
||||
self.start_urls = url
|
||||
self.domains = [domain]
|
||||
date = datetime.datetime.now().strftime("%Y/%m/%d")
|
||||
self.full_date = datetime.datetime.now().strftime("%Y%m%d")
|
||||
self.date_month = datetime.datetime.now().strftime("%Y%m")
|
||||
|
||||
config_section = 'Crawler'
|
||||
self.p = Process(config_section)
|
||||
|
||||
self.r_cache = redis.StrictRedis(
|
||||
host=self.p.config.get("Redis_Cache", "host"),
|
||||
port=self.p.config.getint("Redis_Cache", "port"),
|
||||
db=self.p.config.getint("Redis_Cache", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.r_serv_log_submit = redis.StrictRedis(
|
||||
host=self.p.config.get("Redis_Log_submit", "host"),
|
||||
port=self.p.config.getint("Redis_Log_submit", "port"),
|
||||
db=self.p.config.getint("Redis_Log_submit", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.r_serv_metadata = redis.StrictRedis(
|
||||
host=self.p.config.get("ARDB_Metadata", "host"),
|
||||
port=self.p.config.getint("ARDB_Metadata", "port"),
|
||||
db=self.p.config.getint("ARDB_Metadata", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.r_serv_onion = redis.StrictRedis(
|
||||
host=self.p.config.get("ARDB_Onion", "host"),
|
||||
port=self.p.config.getint("ARDB_Onion", "port"),
|
||||
db=self.p.config.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
self.crawler_path = os.path.join(self.p.config.get("Directories", "crawled"), date )
|
||||
|
||||
self.crawled_paste_filemame = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "pastes"),
|
||||
self.p.config.get("Directories", "crawled"), date )
|
||||
|
||||
self.crawled_screenshot = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "crawled_screenshot"), date )
|
||||
|
||||
def start_requests(self):
|
||||
yield SplashRequest(
|
||||
self.start_urls,
|
||||
self.parse,
|
||||
#errback=self.errback_catcher,
|
||||
endpoint='render.json',
|
||||
meta={'father': self.original_paste},
|
||||
args={ 'html': 1,
|
||||
'wait': 10,
|
||||
'render_all': 1,
|
||||
'har': 1,
|
||||
'png': 1}
|
||||
)
|
||||
|
||||
def parse(self,response):
|
||||
#print(response.headers)
|
||||
#print(response.status)
|
||||
if response.status == 504:
|
||||
# down ?
|
||||
print('504 detected')
|
||||
elif response.status != 200:
|
||||
print('other response: {}'.format(response.status))
|
||||
#print(error_log)
|
||||
#detect connection to proxy refused
|
||||
error_log = (json.loads(response.body.decode()))
|
||||
if(error_log['info']['text'] == 'Connection to proxy refused'):
|
||||
print('Connection to proxy refused')
|
||||
else:
|
||||
|
||||
UUID = self.domains[0]+str(uuid.uuid4())
|
||||
filename_paste = os.path.join(self.crawled_paste_filemame, UUID)
|
||||
relative_filename_paste = os.path.join(self.crawler_path, UUID)
|
||||
filename_screenshot = os.path.join(self.crawled_screenshot, UUID +'.png')
|
||||
|
||||
# save new paste on disk
|
||||
if self.save_crawled_paste(filename_paste, response.data['html']):
|
||||
|
||||
# add this paste to the domain crawled set # TODO: # FIXME: put this on cache ?
|
||||
#self.r_serv_onion.sadd('temp:crawled_domain_pastes:{}'.format(self.domains[0]), filename_paste)
|
||||
|
||||
self.r_serv_onion.sadd('{}_up:{}'.format(self.type, self.full_date), self.domains[0])
|
||||
self.r_serv_onion.sadd('full_{}_up'.format(self.type), self.domains[0])
|
||||
self.r_serv_onion.sadd('month_{}_up:{}'.format(self.type, self.date_month), self.domains[0])
|
||||
|
||||
# create onion metadata
|
||||
if not self.r_serv_onion.exists('{}_metadata:{}'.format(self.type, self.domains[0])):
|
||||
self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'first_seen', self.full_date)
|
||||
self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'last_seen', self.full_date)
|
||||
|
||||
#create paste metadata
|
||||
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'super_father', self.super_father)
|
||||
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'father', response.meta['father'])
|
||||
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'domain', self.domains[0])
|
||||
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'real_link', response.url)
|
||||
|
||||
self.r_serv_metadata.sadd('paste_children:'+response.meta['father'], filename_paste)
|
||||
|
||||
dirname = os.path.dirname(filename_screenshot)
|
||||
if not os.path.exists(dirname):
|
||||
os.makedirs(dirname)
|
||||
|
||||
size_screenshot = (len(response.data['png'])*3) /4
|
||||
|
||||
if size_screenshot < 5000000: #bytes
|
||||
with open(filename_screenshot, 'wb') as f:
|
||||
f.write(base64.standard_b64decode(response.data['png'].encode()))
|
||||
|
||||
with open(filename_screenshot+'har.txt', 'wb') as f:
|
||||
f.write(json.dumps(response.data['har']).encode())
|
||||
|
||||
# save external links in set
|
||||
#lext = LinkExtractor(deny_domains=self.domains, unique=True)
|
||||
#for link in lext.extract_links(response):
|
||||
# self.r_serv_onion.sadd('domain_{}_external_links:{}'.format(self.type, self.domains[0]), link.url)
|
||||
# self.r_serv_metadata.sadd('paste_{}_external_links:{}'.format(self.type, filename_paste), link.url)
|
||||
|
||||
le = LinkExtractor(allow_domains=self.domains, unique=True)
|
||||
for link in le.extract_links(response):
|
||||
yield SplashRequest(
|
||||
link.url,
|
||||
self.parse,
|
||||
#errback=self.errback_catcher,
|
||||
endpoint='render.json',
|
||||
meta={'father': relative_filename_paste},
|
||||
args={ 'html': 1,
|
||||
'png': 1,
|
||||
'render_all': 1,
|
||||
'har': 1,
|
||||
'wait': 10}
|
||||
)
|
||||
|
||||
'''
|
||||
def errback_catcher(self, failure):
|
||||
# catch all errback failures,
|
||||
self.logger.error(repr(failure))
|
||||
print('failure')
|
||||
#print(failure)
|
||||
print(failure.type)
|
||||
#print(failure.request.meta['item'])
|
||||
|
||||
#if isinstance(failure.value, HttpError):
|
||||
if failure.check(HttpError):
|
||||
# you can get the response
|
||||
response = failure.value.response
|
||||
print('HttpError')
|
||||
self.logger.error('HttpError on %s', response.url)
|
||||
|
||||
#elif isinstance(failure.value, DNSLookupError):
|
||||
elif failure.check(DNSLookupError):
|
||||
# this is the original request
|
||||
request = failure.request
|
||||
print(DNSLookupError)
|
||||
print('DNSLookupError')
|
||||
self.logger.error('DNSLookupError on %s', request.url)
|
||||
|
||||
#elif isinstance(failure.value, TimeoutError):
|
||||
elif failure.check(TimeoutError):
|
||||
request = failure.request
|
||||
print('TimeoutError')
|
||||
print(TimeoutError)
|
||||
self.logger.error('TimeoutError on %s', request.url)
|
||||
'''
|
||||
|
||||
def save_crawled_paste(self, filename, content):
|
||||
|
||||
if os.path.isfile(filename):
|
||||
print('File: {} already exists in submitted pastes'.format(filename))
|
||||
return False
|
||||
|
||||
try:
|
||||
gzipencoded = gzip.compress(content.encode())
|
||||
gzip64encoded = base64.standard_b64encode(gzipencoded).decode()
|
||||
except Exception:
|
||||
print("file error: {}".format(filename))
|
||||
return False
|
||||
|
||||
# send paste to Global
|
||||
relay_message = "{0} {1}".format(filename, gzip64encoded)
|
||||
self.p.populate_set_out(relay_message, 'Mixer')
|
||||
|
||||
# increase nb of paste by feeder name
|
||||
self.r_serv_log_submit.hincrby("mixer_cache:list_feeder", "crawler", 1)
|
||||
|
||||
# tag crawled paste
|
||||
msg = 'infoleak:submission="crawler";{}'.format(filename)
|
||||
self.p.populate_set_out(msg, 'Tags')
|
||||
return True
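The relay message built in `save_crawled_paste` is a file name, a space, and the gzip-compressed page encoded as base64. The sketch below reproduces that encoding and adds a decoder to show the round trip. `build_relay_message` and `parse_relay_message` are hypothetical names, and the decoder is an assumption about how a consumer could read the message, not the Mixer module's actual code.

```python
import base64
import gzip

def build_relay_message(filename, content):
    """Encode page content the same way save_crawled_paste does."""
    gzipencoded = gzip.compress(content.encode())
    gzip64encoded = base64.standard_b64encode(gzipencoded).decode()
    return '{0} {1}'.format(filename, gzip64encoded)

def parse_relay_message(message):
    """Split on the first space, then undo base64 and gzip (assumed consumer side)."""
    filename, gzip64encoded = message.split(' ', 1)
    content = gzip.decompress(base64.standard_b64decode(gzip64encoded)).decode()
    return filename, content
```

Because the two fields are separated by a single space, this format only works when the file name itself contains no spaces.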
|
|
@ -0,0 +1 @@
|
|||
www.facebookcorewwwi.onion
|
|
@ -0,0 +1,43 @@
|
|||
#!/bin/bash
|
||||
|
||||
usage() { echo "Usage: sudo $0 [-f <config_absolute_path>] [-p <port_start>] [-n <number_of_splash_servers>]" 1>&2;
|
||||
echo " -f: absolute path to splash docker proxy-profiles directory (used for proxy configuration)";
|
||||
echo " -p: port number of the first splash server. This number is incremented for each additional splash server";
|
||||
echo " -n: number of splash servers to start";
|
||||
echo "";
|
||||
echo "example:";
|
||||
echo "sudo ./launch_splash_crawler.sh -f /home/my_user/AIL-framework/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 3";
|
||||
exit 1;
|
||||
}
|
||||
|
||||
while getopts ":p:f:n:" o; do
|
||||
case "${o}" in
|
||||
p)
|
||||
p=${OPTARG}
|
||||
;;
|
||||
f)
|
||||
f=${OPTARG}
|
||||
;;
|
||||
n)
|
||||
n=${OPTARG}
|
||||
;;
|
||||
*)
|
||||
usage
|
||||
;;
|
||||
esac
|
||||
done
|
||||
shift $((OPTIND-1))
|
||||
|
||||
if [ -z "${p}" ] || [ -z "${f}" ] || [ -z "${n}" ]; then
|
||||
usage;
|
||||
fi
|
||||
|
||||
screen -dmS "Docker_Splash"
|
||||
sleep 0.1
|
||||
|
||||
for ((i=0;i<=$((${n} - 1));i++)); do
|
||||
port_number=$((${p} + $i))
|
||||
screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -p '$port_number':8050 --cpus=1 --memory=4.5G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash; read x'
|
||||
sleep 0.1
|
||||
echo " Splash server launched on port $port_number"
|
||||
done
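The loop above starts one Splash container per port, counting up from the first port. A minimal Python sketch of the same port arithmetic follows; it only builds the argument lists and does not run Docker. `build_splash_commands` is a hypothetical name, and the flags are copied from the script above.

```python
def build_splash_commands(proxy_profiles_dir, first_port, count):
    """Return one `docker run` argument list per Splash server."""
    commands = []
    for offset in range(count):
        port = first_port + offset
        commands.append([
            'docker', 'run',
            '-p', '{}:8050'.format(port),       # host port -> Splash port 8050
            '--cpus=1', '--memory=4.5G',
            '-v', '{}:/etc/splash/proxy-profiles/'.format(proxy_profiles_dir),
            '--net=bridge',
            'scrapinghub/splash',
        ])
    return commands
```

Each list could be passed to `subprocess.run` by a caller that wants to launch the containers without `screen`; that step is left out here on purpose.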
|
|
@ -0,0 +1,34 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
import configparser
|
||||
from TorSplashCrawler import TorSplashCrawler
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
if len(sys.argv) != 7:
|
||||
print('usage:', 'tor_crawler.py', 'splash_url', 'type', 'url', 'domain', 'paste', 'super_father')
|
||||
sys.exit(1)
|
||||
|
||||
configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
|
||||
if not os.path.exists(configfile):
|
||||
raise Exception('Unable to find the configuration file. \
|
||||
Did you set environment variables? \
|
||||
Or activate the virtualenv.')
|
||||
|
||||
cfg = configparser.ConfigParser()
|
||||
cfg.read(configfile)
|
||||
|
||||
splash_url = sys.argv[1]
|
||||
type = sys.argv[2]
|
||||
crawler_depth_limit = cfg.getint("Crawler", "crawler_depth_limit")
|
||||
|
||||
url = sys.argv[3]
|
||||
domain = sys.argv[4]
|
||||
paste = sys.argv[5]
|
||||
super_father = sys.argv[6]
|
||||
|
||||
crawler = TorSplashCrawler(splash_url, crawler_depth_limit)
|
||||
crawler.crawl(type, url, domain, paste, super_father)
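The script reads six positional arguments directly from `sys.argv`. As an alternative sketch (not what the commit does), the same arguments could be declared with `argparse`, which generates the usage message and the arity check; `parse_crawler_args` is a hypothetical name.

```python
import argparse

def parse_crawler_args(argv):
    """Parse the six positional arguments expected by tor_crawler.py."""
    parser = argparse.ArgumentParser(prog='tor_crawler.py')
    # Same order as the usage string printed by the script.
    for name in ('splash_url', 'type', 'url', 'domain', 'paste', 'super_father'):
        parser.add_argument(name)
    return parser.parse_args(argv)
```

A caller would use `args = parse_crawler_args(sys.argv[1:])` and then read `args.splash_url`, `args.type`, and so on.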
|
|
@ -0,0 +1,4 @@
|
|||
[proxy]
|
||||
host=172.17.0.1
|
||||
port=9050
|
||||
type=SOCKS5
|
|
@ -0,0 +1,56 @@
|
|||
#!/bin/bash
|
||||
|
||||
install_docker() {
|
||||
# install docker
|
||||
sudo apt install docker.io;
|
||||
|
||||
# pull splash docker image
|
||||
sudo docker pull scrapinghub/splash;
|
||||
}
|
||||
|
||||
install_python_requirement() {
|
||||
. ./AILENV/bin/activate;
|
||||
pip3 install -U -r crawler_requirements.txt;
|
||||
}
|
||||
|
||||
install_all() {
|
||||
read -p "Do you want to install docker? (use local splash server) [y/n] " -n 1 -r
|
||||
echo # (optional) move to a new line
|
||||
if [[ $REPLY =~ ^[Yy]$ ]]
|
||||
then
|
||||
install_docker;
|
||||
fi
|
||||
install_python_requirement;
|
||||
}
|
||||
|
||||
usage() {
|
||||
echo "Usage: crawler_hidden_services_install.sh [-y | -n]" 1>&2;
|
||||
echo " -y: install docker"
|
||||
echo " -n: don't install docker"
|
||||
echo ""
|
||||
echo "example:"
|
||||
echo "crawler_hidden_services_install.sh -y"
|
||||
exit 1;
|
||||
}
|
||||
|
||||
if [[ $1 == "" ]]; then
|
||||
install_all;
|
||||
exit;
|
||||
else
|
||||
key="$1"
|
||||
case $key in
|
||||
"")
|
||||
install_all;
|
||||
;;
|
||||
-y|--yes)
|
||||
install_docker;
|
||||
install_python_requirement;
|
||||
;;
|
||||
-n|--no)
|
||||
install_python_requirement;
|
||||
;;
|
||||
*) # unknown option
|
||||
usage;
|
||||
;;
|
||||
esac
|
||||
fi
|
|
@ -0,0 +1,2 @@
|
|||
scrapy
|
||||
scrapy-splash
|
|
@ -0,0 +1,4 @@
|
|||
[proxy]
|
||||
host=localhost
|
||||
port=9050
|
||||
type=SOCKS5
|
|
@ -1 +1,2 @@
|
|||
onion
|
||||
i2p
|
||||
|
|
|
@ -48,6 +48,9 @@ except IOError:
|
|||
f = open('templates/ignored_modules.txt', 'w')
|
||||
f.close()
|
||||
|
||||
activate_crawler = cfg.get("Crawler", "activate_crawler")
|
||||
if activate_crawler != 'True':
|
||||
toIgnoreModule.add('hiddenServices')
|
||||
|
||||
# Dynamically import routes and functions from modules
|
||||
# Also, prepare header.html
|
||||
|
|
|
@ -96,6 +96,12 @@ r_serv_statistics = redis.StrictRedis(
|
|||
db=cfg.getint("ARDB_Statistics", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
r_serv_onion = redis.StrictRedis(
|
||||
host=cfg.get("ARDB_Onion", "host"),
|
||||
port=cfg.getint("ARDB_Onion", "port"),
|
||||
db=cfg.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
|
||||
sys.path.append('../../configs/keys')
|
||||
# MISP #
|
||||
|
@ -150,6 +156,7 @@ bootstrap_label = ['primary', 'success', 'danger', 'warning', 'info']
|
|||
UPLOAD_FOLDER = os.path.join(os.environ['AIL_FLASK'], 'submitted')
|
||||
|
||||
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
|
||||
SCREENSHOT_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
|
||||
|
||||
max_dashboard_logs = int(cfg.get("Flask", "max_dashboard_logs"))
|
||||
|
||||
|
|
|
@ -0,0 +1,169 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
'''
|
||||
Flask functions and routes for the hidden services page
|
||||
'''
|
||||
import redis
|
||||
import datetime
|
||||
import sys
|
||||
import os
|
||||
from flask import Flask, render_template, jsonify, request, Blueprint
|
||||
|
||||
from Date import Date
|
||||
from HiddenServices import HiddenServices
|
||||
|
||||
# ============ VARIABLES ============
|
||||
import Flask_config
|
||||
|
||||
app = Flask_config.app
|
||||
cfg = Flask_config.cfg
|
||||
baseUrl = Flask_config.baseUrl
|
||||
r_serv_onion = Flask_config.r_serv_onion
|
||||
r_serv_metadata = Flask_config.r_serv_metadata
|
||||
bootstrap_label = Flask_config.bootstrap_label
|
||||
PASTES_FOLDER = Flask_config.PASTES_FOLDER
|
||||
|
||||
hiddenServices = Blueprint('hiddenServices', __name__, template_folder='templates')
|
||||
|
||||
# ============ FUNCTIONS ============
|
||||
def one():
|
||||
return 1
|
||||
|
||||
def get_date_range(num_day):
|
||||
curr_date = datetime.date.today()
|
||||
date = Date( '{}{}{}'.format(str(curr_date.year), str(curr_date.month).zfill(2), str(curr_date.day).zfill(2)) )
|
||||
date_list = []
|
||||
|
||||
for i in range(0, num_day):
|
||||
date_list.append(date.substract_day(i))
|
||||
|
||||
return list(reversed(date_list))
|
||||
|
||||
def unpack_paste_tags(p_tags):
|
||||
l_tags = []
|
||||
for tag in p_tags:
|
||||
complete_tag = tag
|
||||
tag = tag.split('=')
|
||||
if len(tag) > 1:
|
||||
if tag[1] != '':
|
||||
tag = tag[1][1:-1]
|
||||
# no value
|
||||
else:
|
||||
tag = tag[0][1:-1]
|
||||
# use for custom tags
|
||||
else:
|
||||
tag = tag[0]
|
||||
l_tags.append( (tag, complete_tag) )
|
||||
return l_tags
|
||||
|
||||
def get_onion_status(domain, date):
|
||||
if r_serv_onion.sismember('onion_up:'+date , domain):
|
||||
return True
|
||||
else:
|
||||
return False
|
||||
# ============= ROUTES ==============
|
||||
|
||||
@hiddenServices.route("/hiddenServices/", methods=['GET'])
|
||||
def hiddenServices_page():
|
||||
last_onions = r_serv_onion.lrange('last_onion', 0 ,-1)
|
||||
list_onion = []
|
||||
|
||||
now = datetime.datetime.now()
|
||||
date = now.strftime("%Y%m%d")
|
||||
statDomains = {}
|
||||
statDomains['domains_up'] = r_serv_onion.scard('onion_up:{}'.format(date))
|
||||
statDomains['domains_down'] = r_serv_onion.scard('onion_down:{}'.format(date))
|
||||
statDomains['total'] = statDomains['domains_up'] + statDomains['domains_down']
|
||||
statDomains['domains_queue'] = r_serv_onion.scard('onion_domain_crawler_queue')
|
||||
|
||||
for onion in last_onions:
|
||||
metadata_onion = {}
|
||||
metadata_onion['domain'] = onion
|
||||
metadata_onion['last_check'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'last_check')
|
||||
metadata_onion['first_seen'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'first_seen')
|
||||
if get_onion_status(onion, metadata_onion['last_check']):
|
||||
metadata_onion['status_text'] = 'UP'
|
||||
metadata_onion['status_color'] = 'Green'
|
||||
metadata_onion['status_icon'] = 'fa-check-circle'
|
||||
else:
|
||||
metadata_onion['status_text'] = 'DOWN'
|
||||
metadata_onion['status_color'] = 'Red'
|
||||
metadata_onion['status_icon'] = 'fa-times-circle'
|
||||
list_onion.append(metadata_onion)
|
||||
|
||||
return render_template("hiddenServices.html", last_onions=list_onion, statDomains=statDomains)
|
||||
|
||||
@hiddenServices.route("/hiddenServices/onion_domain", methods=['GET'])
|
||||
def onion_domain():
|
||||
onion_domain = request.args.get('onion_domain')
|
||||
if onion_domain is None or not r_serv_onion.exists('onion_metadata:{}'.format(onion_domain)):
|
||||
return '404'
|
||||
# # TODO: FIXME return 404
|
||||
|
||||
last_check = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'last_check')
|
||||
last_check = '{}/{}/{}'.format(last_check[0:4], last_check[4:6], last_check[6:8])
|
||||
first_seen = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'first_seen')
|
||||
first_seen = '{}/{}/{}'.format(first_seen[0:4], first_seen[4:6], first_seen[6:8])
|
||||
origin_paste = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'paste_parent')
|
||||
|
||||
h = HiddenServices(onion_domain, 'onion')
|
||||
l_pastes = h.get_last_crawled_pastes()
|
||||
if l_pastes:
|
||||
status = True
|
||||
else:
|
||||
status = False
|
||||
screenshot = h.get_domain_random_screenshot(l_pastes)
|
||||
if screenshot:
|
||||
screenshot = screenshot[0]
|
||||
else:
|
||||
screenshot = 'None'
|
||||
|
||||
domain_tags = h.get_domain_tags()
|
||||
|
||||
origin_paste_name = h.get_origin_paste_name()
|
||||
origin_paste_tags = unpack_paste_tags(r_serv_metadata.smembers('tag:{}'.format(origin_paste)))
|
||||
paste_tags = []
|
||||
path_name = []
|
||||
for path in l_pastes:
|
||||
path_name.append(path.replace(PASTES_FOLDER+'/', ''))
|
||||
p_tags = r_serv_metadata.smembers('tag:'+path)
|
||||
paste_tags.append(unpack_paste_tags(p_tags))
|
||||
|
||||
return render_template("showDomain.html", domain=onion_domain, last_check=last_check, first_seen=first_seen,
|
||||
l_pastes=l_pastes, paste_tags=paste_tags, bootstrap_label=bootstrap_label,
|
||||
path_name=path_name, origin_paste_tags=origin_paste_tags, status=status,
|
||||
origin_paste=origin_paste, origin_paste_name=origin_paste_name,
|
||||
domain_tags=domain_tags, screenshot=screenshot)
|
||||
|
||||
@hiddenServices.route("/hiddenServices/onion_son", methods=['GET'])
|
||||
def onion_son():
|
||||
onion_domain = request.args.get('onion_domain')
|
||||
|
||||
h = HiddenServices(onion_domain, 'onion')
|
||||
l_pastes = h.get_last_crawled_pastes()
|
||||
l_son = h.get_domain_son(l_pastes)
|
||||
print(l_son)
|
||||
return 'l_son'
|
||||
|
||||
# ============= JSON ==============
|
||||
@hiddenServices.route("/hiddenServices/domain_crawled_7days_json", methods=['GET'])
|
||||
def domain_crawled_7days_json():
|
||||
type = 'onion'
|
||||
## TODO: # FIXME: 404 error
|
||||
|
||||
date_range = get_date_range(7)
|
||||
json_domain_stats = []
|
||||
#try:
|
||||
for date in date_range:
|
||||
nb_domain_up = r_serv_onion.scard('{}_up:{}'.format(type, date))
|
||||
nb_domain_down = r_serv_onion.scard('{}_down:{}'.format(type, date))
|
||||
date = date[0:4] + '-' + date[4:6] + '-' + date[6:8]
|
||||
json_domain_stats.append({ 'date': date, 'value': int( nb_domain_up ), 'nb_domain_down': int( nb_domain_down )})
|
||||
#except:
|
||||
#return jsonify()
|
||||
|
||||
return jsonify(json_domain_stats)
|
||||
|
||||
# ========= REGISTRATION =========
|
||||
app.register_blueprint(hiddenServices, url_prefix=baseUrl)
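The seven-day statistics route counts the members of one set per day and per status. The sketch below reproduces that aggregation without Redis, using a plain dict of sets in place of `r_serv_onion`. The `<type>_up:<date>` and `<type>_down:<date>` key names follow the ones used by `hiddenServices_page`; `last_n_days` and `domain_stats` are hypothetical helper names.

```python
import datetime

def last_n_days(num_day, today=None):
    """Return the last `num_day` dates as YYYYMMDD strings, oldest first."""
    today = today or datetime.date.today()
    days = [today - datetime.timedelta(days=i) for i in range(num_day)]
    return [d.strftime('%Y%m%d') for d in reversed(days)]

def domain_stats(sets_by_key, dates, service_type='onion'):
    """Build the per-day JSON rows; `sets_by_key` stands in for the Redis sets."""
    stats = []
    for date in dates:
        up = len(sets_by_key.get('{}_up:{}'.format(service_type, date), set()))
        down = len(sets_by_key.get('{}_down:{}'.format(service_type, date), set()))
        stats.append({
            'date': '{}-{}-{}'.format(date[0:4], date[4:6], date[6:8]),
            'value': up,
            'nb_domain_down': down,
        })
    return stats
```

The `date` field uses the `YYYY-MM-DD` form that the D3 chart parses with `d3.timeParse("%Y-%m-%d")`.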
|
|
@ -0,0 +1 @@
|
|||
<li id='page-hiddenServices'><a href="{{ url_for('hiddenServices.hiddenServices_page') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>
|
|
@ -0,0 +1,234 @@
|
|||
<!DOCTYPE html>
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
|
||||
<title>Hidden Service - AIL</title>
|
||||
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">
|
||||
|
||||
<!-- Core CSS -->
|
||||
<link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
|
||||
<!-- JS -->
|
||||
<script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
|
||||
<script language="javascript" src="{{ url_for('static', filename='js/d3.min.js') }}"></script>
|
||||
|
||||
<style>
|
||||
div.tooltip {
|
||||
position: absolute;
|
||||
text-align: center;
|
||||
padding: 2px;
|
||||
font: 12px sans-serif;
|
||||
background: #ebf4fb;
|
||||
border: 2px solid #b7ddf2;
|
||||
border-radius: 8px;
|
||||
pointer-events: none;
|
||||
color: #000000;
|
||||
}
|
||||
.line_graph {
|
||||
fill: none;
|
||||
stroke: steelblue;
|
||||
|
||||
stroke-linejoin: round;
|
||||
stroke-linecap: round;
|
||||
stroke-width: 1.5;
|
||||
}
|
||||
</style>
|
||||
|
||||
</head>
|
||||
<body>
|
||||
|
||||
{% include 'navbar.html' %}
|
||||
|
||||
<div id="page-wrapper">
|
||||
|
||||
<div class="row">
|
||||
<div class="col-md-6">
|
||||
|
||||
<div class="panel panel-primary">
|
||||
<div class="panel-heading">ONION</div>
|
||||
<div class="panel-body">
|
||||
<table class="table table-striped">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Domain</th>
|
||||
<th>First Seen</th>
|
||||
<th>Last Check</th>
|
||||
<th>Status</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for metadata_onion in last_onions %}
|
||||
<tr>
|
||||
<td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ metadata_onion['domain'] }}">{{ metadata_onion['domain'] }}</a></td>
|
||||
<td>{{'{}/{}/{}'.format(metadata_onion['first_seen'][0:4], metadata_onion['first_seen'][4:6], metadata_onion['first_seen'][6:8])}}</td>
|
||||
<td>{{'{}/{}/{}'.format(metadata_onion['last_check'][0:4], metadata_onion['last_check'][4:6], metadata_onion['last_check'][6:8])}}</td>
|
||||
<td><div style="color:{{metadata_onion['status_color']}}; display:inline-block">
|
||||
<i class="fa {{metadata_onion['status_icon']}} fa-2x"></i>
|
||||
{{metadata_onion['status_text']}}
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<div id="graph_line">
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="col-md-6">
|
||||
<div class="panel panel-info">
|
||||
<div class="panel-heading">
|
||||
<i class="fa fa-eye-slash"></i> Domains Crawled Today
|
||||
</div>
|
||||
|
||||
<table class="table table-hover table-striped">
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>
|
||||
<div style="color:Green; display:inline-block">
|
||||
<i class="fa fa-check-circle fa-2x"></i>
|
||||
Domains UP
|
||||
</div>
|
||||
</td>
|
||||
<td>
|
||||
<div style="color:Green; display:inline-block">
|
||||
{{ statDomains['domains_up'] }}
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
<div style="color:Red; display:inline-block">
|
||||
<i class="fa fa-times-circle fa-2x"></i>
|
||||
Domains DOWN
|
||||
</div>
|
||||
</td>
|
||||
<td>
|
||||
<div style="color:Red; display:inline-block">
|
||||
{{ statDomains['domains_down'] }}
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Crawled Domains</td>
|
||||
<td>{{ statDomains['total'] }}</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Domains in Queue</td>
|
||||
<td>{{ statDomains['domains_queue'] }}</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
<!-- /#page-wrapper -->
|
||||
|
||||
<script>
|
||||
var all_graph = {};
|
||||
$(document).ready(function(){
|
||||
activePage = "page-hiddenServices"
|
||||
$("#"+activePage).addClass("active");
|
||||
all_graph.line_chart = create_line_chart('graph_line', "{{ url_for('hiddenServices.domain_crawled_7days_json') }}?type=onion");
|
||||
});
|
||||
$(window).on("resize", function() {
|
||||
all_graph.onResize();
|
||||
});
|
||||
</script>
|
||||
|
||||
<script>
|
||||
|
||||
all_graph.onResize = function () {
|
||||
var aspect = 1000 / 500, all_graph = $("#graph_div");
|
||||
var targetWidth = all_graph.parent().width();
|
||||
all_graph.attr("width", targetWidth);
|
||||
all_graph.attr("height", targetWidth / aspect);
|
||||
}
|
||||
window.all_graph = all_graph;
|
||||
|
||||
function create_line_chart(id, url){
|
||||
var width = 650;
|
||||
var height = Math.round(width / 6);
|
||||
var margin = {top: 20, right: 55, bottom: 50, left: 40};
|
||||
var x = d3.scaleTime().range([0, width]);
|
||||
var y = d3.scaleLinear().rangeRound([height, 0]);
|
||||
var xAxis = d3.axisBottom(x);
|
||||
var yAxis = d3.axisLeft(y);
|
||||
var parseTime = d3.timeParse("%Y-%m-%d");
|
||||
var line = d3.line()
|
||||
.x(function(d) {
|
||||
return x(d.date);
|
||||
})
|
||||
.y(function(d) {
|
||||
return y(d.value);
|
||||
});
|
||||
var svg_line = d3.select('#'+id).append('svg')
|
||||
.attr("id", "graph_div")
|
||||
.attr("width", width + margin.left + margin.right)
|
||||
.attr("height", height + margin.top + margin.bottom)
|
||||
.append('g')
|
||||
.attr('transform', "translate("+ margin.left +","+ margin.top +")");
|
||||
var div = d3.select('body').append('div')
|
||||
.attr('class', 'tooltip')
|
||||
.style('opacity', 0);
|
||||
//add div tooltip
|
||||
d3.json(url)
|
||||
.then(function(data){
|
||||
data.forEach(function(d) {
|
||||
d.date_label = d.date;
|
||||
d.date = parseTime(d.date);
|
||||
d.value = +d.value;
|
||||
});
|
||||
// fit the data
|
||||
x.domain(d3.extent(data, function(d) { return d.date; }));
|
||||
//x.domain(data.map(function (d) { return d.date; })); //E
|
||||
y.domain([0, d3.max(data, function(d){ return d.value ; })]);
|
||||
//line
|
||||
svg_line.append("path")
|
||||
.data([data])
|
||||
.attr("class", "line_graph")
|
||||
.attr("d", line);
|
||||
// add X axis
|
||||
svg_line.append("g")
|
||||
.attr("transform", "translate(0," + height + ")")
|
||||
.call(d3.axisBottom(x))
|
||||
.selectAll("text")
|
||||
.style("text-anchor", "end")
|
||||
.attr("transform", "rotate(-45)" );
|
||||
// Add the Y Axis
|
||||
svg_line.append("g")
|
||||
.call(d3.axisLeft(y));
|
||||
//add a dot circle
|
||||
svg_line.selectAll('dot')
|
||||
.data(data).enter()
|
||||
.append('circle')
|
||||
.attr('r', 2)
|
||||
.attr('cx', function(d) { return x(d.date); })
|
||||
.attr('cy', function(d) { return y(d.value); })
|
||||
.on('mouseover', function(d) {
|
||||
div.transition().style('opacity', .9);
|
||||
div.html('' + d.date_label+ '<br/>' + d.value).style('left', (d3.event.pageX) + 'px')
|
||||
|
||||
.style("top", (d3.event.pageY - 28) + "px");
|
||||
})
|
||||
.on('mouseout', function(d)
|
||||
{
|
||||
div.transition().style('opacity', 0);
|
||||
});
|
||||
});
|
||||
}
|
||||
</script>
|
||||
</body>
|
||||
|
||||
</html>
|
|
@ -0,0 +1,213 @@
|
|||
<!DOCTYPE html>
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
|
||||
<title>Show Domain - AIL</title>
|
||||
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">
|
||||
|
||||
<!-- Core CSS -->
|
||||
<link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/dygraph_gallery.css') }}" rel="stylesheet" type="text/css" />
|
||||
<!-- JS -->
|
||||
<script type="text/javascript" src="{{ url_for('static', filename='js/dygraph-combined.js') }}"></script>
|
||||
<script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
|
||||
<script src="{{ url_for('static', filename='js/jquery.dataTables.min.js')}}"></script>
|
||||
<script src="{{ url_for('static', filename='js/dataTables.bootstrap.js')}}"></script>
|
||||
|
||||
<style>
|
||||
.test thead{
|
||||
background: #d91f2d;
|
||||
color: #fff;
|
||||
}
|
||||
</style>
|
||||
|
||||
</head>
|
||||
<body>
|
||||
|
||||
{% include 'navbar.html' %}
|
||||
|
||||
<div id="page-wrapper">
|
||||
|
||||
<div class="row">
|
||||
|
||||
<div class="col-md-6">
|
||||
<div class="row">
|
||||
<div class="panel panel-info">
|
||||
<div class="panel-heading">
|
||||
{% if status %}
|
||||
<div class="pull-right" style="color:Green;">
|
||||
<i class="fa fa-check-circle fa-2x"></i>
|
||||
UP
|
||||
</div>
|
||||
{% else %}
|
||||
<div class="pull-right" style="color:Red;">
|
||||
<i class="fa fa-times-circle fa-2x"></i>
|
||||
DOWN
|
||||
</div>
|
||||
{% endif %}
|
||||
<h3>{{ domain }} :</h3>
|
||||
<ul class="list-group">
|
||||
<li class="list-group-item">
|
||||
|
||||
<table class="table table-condensed">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>First Seen</th>
|
||||
<th>Last Check</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td class="panelText"><a href="#">{{ first_seen }}</a></td>
|
||||
<td class="panelText"><a href="#">{{ last_check }}</a></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
</li>
|
||||
<li class="list-group-item">
|
||||
Origin Paste: <a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste', paste=origin_paste) }}">{{ origin_paste_name }}</a>
|
||||
<div>
|
||||
{% for tag in origin_paste_tags %}
|
||||
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
|
||||
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
|
||||
</a>
|
||||
{% endfor %}
|
||||
<br>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
{% for tag in domain_tags %}
|
||||
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag }}">
|
||||
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag }} <i>{{ domain_tags[tag] }}</i></span>
|
||||
</a>
|
||||
{% endfor %}
|
||||
<br>
|
||||
<br>
|
||||
</div>
|
||||
|
||||
<table class="test table table-striped table-bordered table-hover table-responsive " id="myTable_">
|
||||
<thead>
|
||||
<tr>
|
||||
<th style="max-width: 800px;">Crawled Pastes</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
|
||||
{% for path in l_pastes %}
|
||||
<tr>
|
||||
<td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{path}}">{{ path_name[loop.index0] }}</a>
|
||||
<div>
|
||||
{% for tag in paste_tags[loop.index0] %}
|
||||
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
|
||||
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
|
||||
</a>
|
||||
{% endfor %}
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="col-md-6">
|
||||
<div class="panel panel-info" style="text-align:center;">
|
||||
<div class="panel-heading">
|
||||
<div class="row">
|
||||
<div class="col-md-8">
|
||||
<input class="center" id="blocks" type="range" min="1" max="50" value="13">
|
||||
</div>
|
||||
<div class="col-md-4">
|
||||
<button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
|
||||
<span class="glyphicon glyphicon-zoom-in"></span>
|
||||
<span class="label-icon">Full resolution</span>
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<canvas id="canvas" style="width:100%;"></canvas>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
<!-- /#page-wrapper -->
|
||||
|
||||
<script>
|
||||
$(document).ready(function(){
|
||||
activePage = "page-hiddenServices"
|
||||
$("#"+activePage).addClass("active");
|
||||
table = $('#myTable_').DataTable(
|
||||
{
|
||||
"aLengthMenu": [[5, 10, 15, 20, -1], [5, 10, 15, 20, "All"]],
|
||||
"iDisplayLength": 5,
|
||||
"order": [[ 0, "desc" ]]
|
||||
}
|
||||
);
|
||||
});
|
||||
</script>
|
||||
|
||||
<script>
|
||||
var ctx = canvas.getContext('2d'), img = new Image();
|
||||
|
||||
/// turn off image smoothing
|
||||
ctx.webkitImageSmoothingEnabled = false;
|
||||
ctx.imageSmoothingEnabled = false;
|
||||
|
||||
img.onload = pixelate;
|
||||
img.addEventListener("error", img_error);
|
||||
var draw_img = false;
|
||||
|
||||
img.src = "{{ url_for('showsavedpastes.screenshot', filename=screenshot) }}";
|
||||
|
||||
function pixelate() {
|
||||
|
||||
/// use slider value
|
||||
if( blocks.value == 50 ){
|
||||
size = 1;
|
||||
} else {
|
||||
var size = (blocks.value) * 0.01;
|
||||
}
|
||||
|
||||
canvas.width = img.width;
|
||||
canvas.height = img.height;
|
||||
|
||||
/// cache scaled width and height
|
||||
w = canvas.width * size;
|
||||
h = canvas.height * size;
|
||||
|
||||
/// draw original image to the scaled size
|
||||
ctx.drawImage(img, 0, 0, w, h);
|
||||
|
||||
/// pixelated
|
||||
ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);
|
||||
|
||||
}
|
||||
|
||||
blocks.addEventListener('change', pixelate, false);
|
||||
|
||||
function img_error() {
|
||||
img.onerror=null;
|
||||
img.src="{{ url_for('static', filename='image/AIL.png') }}";
|
||||
blocks.value = 50;
|
||||
// no explicit call needed: img.onload runs pixelate() once the fallback image has loaded
|
||||
}
|
||||
</script>
|
||||
|
||||
</body>
|
||||
|
||||
</html>
|
|
@ -8,7 +8,7 @@ import redis
|
|||
import json
|
||||
import os
|
||||
import flask
|
||||
from flask import Flask, render_template, jsonify, request, Blueprint, make_response, redirect, url_for, Response
|
||||
from flask import Flask, render_template, jsonify, request, Blueprint, make_response, Response, send_from_directory, redirect, url_for
|
||||
import difflib
|
||||
import ssdeep
|
||||
|
||||
|
@ -25,6 +25,7 @@ r_serv_pasteName = Flask_config.r_serv_pasteName
|
|||
r_serv_metadata = Flask_config.r_serv_metadata
|
||||
r_serv_tags = Flask_config.r_serv_tags
|
||||
r_serv_statistics = Flask_config.r_serv_statistics
|
||||
r_serv_onion = Flask_config.r_serv_onion
|
||||
max_preview_char = Flask_config.max_preview_char
|
||||
max_preview_modal = Flask_config.max_preview_modal
|
||||
DiffMaxLineLength = Flask_config.DiffMaxLineLength
|
||||
|
@ -33,6 +34,7 @@ misp_event_url = Flask_config.misp_event_url
|
|||
hive_case_url = Flask_config.hive_case_url
|
||||
vt_enabled = Flask_config.vt_enabled
|
||||
PASTES_FOLDER = Flask_config.PASTES_FOLDER
|
||||
SCREENSHOT_FOLDER = Flask_config.SCREENSHOT_FOLDER
|
||||
|
||||
showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templates')
|
||||
|
||||
|
@ -41,6 +43,8 @@ showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templa
|
|||
def showpaste(content_range, requested_path):
|
||||
if PASTES_FOLDER not in requested_path:
|
||||
requested_path = os.path.join(PASTES_FOLDER, requested_path)
|
||||
# remove old full path
|
||||
#requested_path = requested_path.replace(PASTES_FOLDER, '')
|
||||
# prevent directory traversal
|
||||
if os.path.commonprefix((os.path.realpath(requested_path),PASTES_FOLDER)) != PASTES_FOLDER:
|
||||
return 'path traversal detected'
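Review note: the check above relies on `os.path.commonprefix`, which compares strings character by character rather than by path component, so a sibling directory whose name merely starts with the same characters as `PASTES_FOLDER` would pass it. A minimal sketch of a component-wise check follows. The helper name `is_within_folder` and the example paths are hypothetical and are not part of this commit; it also assumes the base folder is given without relying on a trailing separator.

```python
import os


def is_within_folder(base_folder, requested_path):
    """Return True if requested_path resolves to a location inside base_folder.

    Hypothetical helper for illustration only; not code from this commit.
    """
    # Resolve symlinks and '..' segments on both sides before comparing.
    base = os.path.realpath(base_folder)
    # os.path.join discards `base` when requested_path is absolute,
    # so absolute inputs are checked as-is.
    target = os.path.realpath(os.path.join(base, requested_path))
    # commonpath compares whole path components, so a sibling such as
    # '<base>_evil' is not treated as being inside '<base>'.
    return os.path.commonpath([base, target]) == base
```

With such a helper, the guard in `showpaste` could be written as `if not is_within_folder(PASTES_FOLDER, requested_path): return ...`, assuming `PASTES_FOLDER` is an absolute path.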
|
||||
|
@ -175,6 +179,16 @@ def showpaste(content_range, requested_path):
|
|||
|
||||
l_64.append( (file_icon, estimated_type, hash, saved_path, nb_in_file, b64_vt, b64_vt_link, b64_vt_report) )
|
||||
|
||||
crawler_metadata = {}
|
||||
if 'infoleak:submission="crawler"' in l_tags:
|
||||
crawler_metadata['get_metadata'] = True
|
||||
crawler_metadata['domain'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'domain')
|
||||
crawler_metadata['paste_father'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'father')
|
||||
crawler_metadata['real_link'] = r_serv_metadata.hget('paste_metadata:'+requested_path,'real_link')
|
||||
crawler_metadata['screenshot'] = paste.get_p_rel_path()
|
||||
else:
|
||||
crawler_metadata['get_metadata'] = False
|
||||
|
||||
if Flask_config.pymisp is False:
|
||||
misp = False
|
||||
else:
|
||||
|
@ -202,6 +216,7 @@ def showpaste(content_range, requested_path):
|
|||
hive_url = hive_case_url.replace('id_here', hive_case)
|
||||
|
||||
return render_template("show_saved_paste.html", date=p_date, bootstrap_label=bootstrap_label, active_taxonomies=active_taxonomies, active_galaxies=active_galaxies, list_tags=list_tags, source=p_source, encoding=p_encoding, language=p_language, size=p_size, mime=p_mime, lineinfo=p_lineinfo, content=p_content, initsize=len(p_content), duplicate_list = p_duplicate_list, simil_list = p_simil_list, hashtype_list = p_hashtype_list, date_list=p_date_list,
|
||||
crawler_metadata=crawler_metadata,
|
||||
l_64=l_64, vt_enabled=vt_enabled, misp=misp, hive=hive, misp_eventid=misp_eventid, misp_url=misp_url, hive_caseid=hive_caseid, hive_url=hive_url)
|
||||
|
||||
# ============ ROUTES ============
|
||||
|
@ -250,6 +265,10 @@ def showDiff():
|
|||
the_html = htmlD.make_file(lines1, lines2)
|
||||
return the_html
|
||||
|
||||
@showsavedpastes.route('/screenshot/<path:filename>')
|
||||
def screenshot(filename):
|
||||
return send_from_directory(SCREENSHOT_FOLDER, filename+'.png', as_attachment=True)
|
||||
|
||||
@showsavedpastes.route('/send_file_to_vt/', methods=['POST'])
|
||||
def send_file_to_vt():
|
||||
b64_path = request.form['b64_path']
|
||||
|
|
|
@ -351,7 +351,6 @@
|
|||
<div class="panel-body" id="panel-body">
|
||||
|
||||
{% if duplicate_list|length == 0 %}
|
||||
<h3> No Duplicate </h3>
|
||||
{% else %}
|
||||
<h3> Duplicate list: </h3>
|
||||
<table id="tableDup" class="table table-striped table-bordered">
|
||||
|
@ -420,6 +419,57 @@
|
|||
</table>
|
||||
{% endif %}
|
||||
|
||||
{% if crawler_metadata['get_metadata'] %}
|
||||
<div class="row">
|
||||
|
||||
<div class="col-md-5">
|
||||
<div class="row">
|
||||
<div class="panel panel-info">
|
||||
<div class="panel-heading">
|
||||
<i class="fa fa-eye-slash"></i> Crawled Paste
|
||||
</div>
|
||||
|
||||
<table class="table table-hover table-striped">
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>Domain</td>
|
||||
<td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ crawler_metadata['domain'] }}" id='onion_domain'>{{ crawler_metadata['domain'] }}</a></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Father</td>
|
||||
<td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{ crawler_metadata['paste_father'] }}" id='paste_father'>{{ crawler_metadata['paste_father'] }}</a></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Source link</td>
|
||||
<td>{{ crawler_metadata['real_link'] }}</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="col-md-7">
|
||||
<div class="panel panel-info" style="text-align:center;">
|
||||
<div class="panel-heading">
|
||||
<div class="row">
|
||||
<div class="col-md-8">
|
||||
<input class="center" id="blocks" type="range" min="1" max="50" value="13">
|
||||
</div>
|
||||
<div class="col-md-4">
|
||||
<button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
|
||||
<span class="glyphicon glyphicon-zoom-in"></span>
|
||||
<span class="label-icon">Full resolution</span>
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<canvas id="canvas" style="width:100%;"></canvas>
|
||||
</div>
|
||||
</div>
|
||||
{% endif %}
|
||||
|
||||
<h3> Content: </h3>
|
||||
<a href="{{ url_for('showsavedpastes.showsavedrawpaste') }}?paste={{ request.args.get('paste') }}" id='raw_paste' > [Raw content] </a>
|
||||
<p data-initsize="{{ initsize }}"> <pre id="paste-holder">{{ content }}</pre></p>
|
||||
|
@ -535,5 +585,51 @@
|
|||
{% endfor %}
|
||||
</script>
|
||||
|
||||
<script>
|
||||
var ctx = canvas.getContext('2d'), img = new Image();
|
||||
|
||||
/// turn off image smoothing
|
||||
ctx.webkitImageSmoothingEnabled = false;
|
||||
ctx.imageSmoothingEnabled = false;
|
||||
|
||||
img.onload = pixelate;
|
||||
img.addEventListener("error", img_error);
|
||||
var draw_img = false;
|
||||
|
||||
img.src = "{{ url_for('showsavedpastes.screenshot', filename=crawler_metadata['screenshot']) }}";
|
||||
|
||||
function pixelate() {
|
||||
|
||||
/// use slider value
|
||||
if( blocks.value == 50 ){
|
||||
size = 1;
|
||||
} else {
|
||||
var size = (blocks.value) * 0.01;
|
||||
}
|
||||
|
||||
canvas.width = img.width;
|
||||
canvas.height = img.height;
|
||||
|
||||
/// cache scaled width and height
|
||||
w = canvas.width * size;
|
||||
h = canvas.height * size;
|
||||
|
||||
/// draw original image to the scaled size
|
||||
ctx.drawImage(img, 0, 0, w, h);
|
||||
|
||||
/// pixelated
|
||||
ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);
|
||||
|
||||
}
|
||||
|
||||
function img_error() {
|
||||
img.onerror=null;
|
||||
img.src="{{ url_for('static', filename='image/AIL.png') }}";
|
||||
blocks.value = 50;
|
||||
// no explicit call needed: img.onload runs pixelate() once the fallback image has loaded
|
||||
}
|
||||
|
||||
blocks.addEventListener('change', pixelate, false);
|
||||
</script>
|
||||
|
||||
</html>
|
||||
|
|