mirror of https://github.com/CIRCL/Circlean
Why/What
========

This project is meant for situations where you receive a USB key, do not know
what it contains, but still want to have a look.

Work in progress, contributions welcome.

The content of the first key will be copied and/or converted to the second key
following these rules (based on the MIME type):
- direct copy of plain text files (mime type: text/*)
- direct copy of audio files (mime type: audio/*)
- direct copy of image files (mime type: image/*)
- direct copy of video files (mime type: video/*)
- direct copy of example files (mime type: example/*)
- direct copy of message files (mime type: message/*)
- direct copy of model files (mime type: model/*)
- direct copy of multipart files (mime type: multipart/*)
- copy or conversion of application files (mime type: application/*):
    - pdf => HTML
    - msword|vnd.openxmlformats-officedocument.*|vnd.ms-*|vnd.oasis.opendocument* => pdf => HTML
    - *xml* => copy as a text file
    - x-dosexec (executable) => prepend and append DANGEROUS to the filename
    - x-gzip|x-tar|x-7z-compressed => compressed file
    - octet-stream => direct copy
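
The rules above amount to a lookup on the MIME type. The following Python
sketch shows one way to express that lookup. It is an illustration only, not
the code of the groomer script: the action labels, the function names and the
exact DANGEROUS naming scheme are assumptions made for this example. Unlisted
types are ignored here, and the order of the application rules matters because
the Office subtypes also contain "xml".

```python
from fnmatch import fnmatchcase

# Main MIME types that are copied unchanged, as listed in the rules.
DIRECT_COPY_MAIN_TYPES = {
    "text", "audio", "image", "video",
    "example", "message", "model", "multipart",
}

# (pattern on the application/* subtype, action) pairs, checked in order.
# The action labels are hypothetical, chosen only for this sketch.
APPLICATION_RULES = [
    ("pdf", "pdf_to_html"),
    ("msword", "office_to_pdf_to_html"),
    ("vnd.openxmlformats-officedocument.*", "office_to_pdf_to_html"),
    ("vnd.ms-*", "office_to_pdf_to_html"),
    ("vnd.oasis.opendocument*", "office_to_pdf_to_html"),
    ("*xml*", "copy_as_text"),
    ("x-dosexec", "mark_dangerous"),
    ("x-gzip", "unpack"),
    ("x-tar", "unpack"),
    ("x-7z-compressed", "unpack"),
    ("octet-stream", "copy"),
]


def choose_action(mime_type):
    """Return the action label for a MIME type such as 'application/pdf'."""
    main, _, sub = mime_type.partition("/")
    if main in DIRECT_COPY_MAIN_TYPES:
        return "copy"
    if main == "application":
        for pattern, action in APPLICATION_RULES:
            if fnmatchcase(sub, pattern):
                return action
    return "ignore"  # anything not listed is skipped


def mark_dangerous(filename):
    """Prepend and append DANGEROUS; the separator is an assumption."""
    return "DANGEROUS_" + filename + "_DANGEROUS"
```

For example, `choose_action("application/x-dosexec")` gives `"mark_dangerous"`,
and the file would then be written to the target under the name returned by
`mark_dangerous`.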

Compressed files (x-gzip|x-tar|x-7z-compressed):
- unpack the archive
- recursively run the rules on the unpacked files
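
The recursion can be sketched in Python as follows. This is a simplified
illustration under several assumptions: it only handles tar archives
(optionally gzip-compressed) through the standard `tarfile` module, since 7z
is not covered by the standard library; `handle_file` is a hypothetical
callback standing in for the per-file rules; and `max_depth` is a guard against
deeply nested archives that the source does not mention.

```python
import os
import shutil
import tarfile
import tempfile


def groom(path, handle_file, depth=0, max_depth=5):
    """Call handle_file on path, or on every file inside it if it is a tar archive."""
    if depth < max_depth and tarfile.is_tarfile(path):
        with tempfile.TemporaryDirectory() as workdir, tarfile.open(path) as archive:
            for index, member in enumerate(archive):
                if not member.isfile():
                    continue  # skip directories, links and device nodes
                # Use a generated flat name so member paths cannot escape workdir.
                name = "%d_%s" % (index, os.path.basename(member.name))
                unpacked = os.path.join(workdir, name)
                with archive.extractfile(member) as src, open(unpacked, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                # An unpacked file may itself be an archive: apply the rules again.
                groom(unpacked, handle_file, depth + 1, max_depth)
    else:
        handle_file(path)
```

Because the unpacked files live in a temporary directory, `handle_file` has to
copy or convert each file to the target while it is being called.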

Usage
=====

0. Power off the device
1. Plug the untrusted key into the top USB slot of the Raspberry Pi
2. Plug your own key into the bottom USB slot
   Note: This key should be bigger than the original one because the archives
   will be unpacked and copied
3. Optional: connect the HDMI cable to a screen to see what happens
4. Connect the power to the micro USB port
5. Wait until you no longer see any blinking green light on the board or, if you
   connected the HDMI cable, check the screen.
   It is slow and can take 30-60 minutes depending on how many document
   conversions take place
6. Power off the device and disconnect the drives
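
Before step 2, it can help to check on a trusted machine that the target key
has enough room. The Python sketch below is one possible check and is not part
of the project: the mount points passed in are hypothetical, and the
`headroom` factor of 2 is an arbitrary assumption to leave space for unpacked
archives. The source key's used space is best taken from its label or
documentation rather than by mounting an untrusted key.

```python
import shutil


def target_has_room(source_mount, target_mount, headroom=2.0):
    """Return True if the target's free space covers the source's used space
    multiplied by headroom (an assumed safety factor)."""
    used = shutil.disk_usage(source_mount).used
    free = shutil.disk_usage(target_mount).free
    return free >= used * headroom
```

A call such as `target_has_room("/media/source", "/media/target")` would use
whatever mount points your system assigns; those paths are placeholders.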

Notes
=====

* Don't plug in USB devices through a hub, because there's no way to tell the
  groomer which is source and which is target: the first drive enumerated (top
  port) is the source and the second (bottom port) is the target
* Don't turn it off without shutting down the system. When grooming is done it
  shuts down automatically. Losing power while it's running can trash the OS
  on the SD card, because SD cards don't always like dirty shutdowns (i.e. power loss)
* A target USB stick with a status light that stays on as long as the device
  has power is really useful, because the other status lights on the groomer
  are less than indicative at times. The 'OK' LED on the rPi toggles on
  activity, so it can be off for a long time while processing something and
  only come back on when that process finishes. A USB stick whose LED is lit
  whenever the port is powered (even when not reading or writing) helps in
  determining when the process is finished: when the rPi shuts down, the USB
  port power is cut and that LED goes off too
* Use a larger target device, as all zip files get unpacked and processed onto
  the target
* If you have an HDMI monitor plugged in, you can watch what's happening for about
  30 minutes, until the rPi's power saving kicks in and turns off the monitor
* If only one USB stick is present at power up, it doesn't groom and behaves like
  a normal rPi
* If you want to ssh into the rPi, the username is 'pi' and the password is
  'raspberry', as per the defaults
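
The source/target rule from the notes above can be summarised in a few lines of
Python. This is only a sketch of the decision, not code from the groomer: the
function name and the device paths in the usage line are hypothetical, and the
list is assumed to be in enumeration order.

```python
def assign_roles(devices):
    """devices: USB drives in enumeration order (first = top port).

    Returns None when fewer than two drives are present, in which case
    no grooming takes place.
    """
    if len(devices) < 2:
        return None
    # First enumerated drive is the untrusted source, second is the target.
    # Behind a hub the enumeration order is not predictable, hence the warning.
    return {"source": devices[0], "target": devices[1]}
```

For instance, `assign_roles(["/dev/sda", "/dev/sdb"])` treats `/dev/sda` as
the untrusted key and `/dev/sdb` as the key that receives the groomed files.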

Technical notes
===============

* the groomer script is in /opt/groomer/ with the other required files
* dependencies are LibreOffice and OpenJRE
* the IP address is 192.168.1.89
* the groomer process is kicked off in /etc/rc.local
* the heavy lifting takes place in, or is dispatched from, /opt/groomer/groomer.sh;
  that script file defines which file types get processed (anything not listed
  there gets ignored)
* there are two ways PDFs can get handled: right now they have their text extracted
  to the target device; the other way copies the PDF and extracts the text
* the PDF text extraction isn't perfect and is the slowest part of the process, but
  it should be able to handle Unicode. It currently doesn't extract images from
  PDFs, but could do that too

Discussion
==========

* image exports of PDF pages only contain the images and no text, so it's not
  like saving each page to a JPEG, which would be a really handy and safe way of
  converting PDFs
* spreadsheets and presentations get converted to PDFs to kill off any embedded
  macros. It's assumed that the export does not produce evil PDFs, but nothing
  is done to sanitize any embedded links within those documents
* for spreadsheets, if a sheet is longer than a page, only one page worth of that
  sheet is exported, taken right from the middle (i.e. the top and bottom of
  that sheet get cut off and only the contents in the middle are exported to PDF).
  Dumb, but I figure that if the groomed version looks interesting enough to go
  back to the source, then you can take the extra precautions
* the groomed target only receives "safe" files, and the groomer does its best
  to convert any potentially unsafe files to a safer format
* safe files being ones that, as far as I know, can't contain malicious embedded
  macros or other crap like that, and those that can be converted to something
  that won't contain code after conversion