mirror of https://github.com/CIRCL/Circlean
				
				
				
	
Why/What
========

This project is meant for the case where you have a USB key, do not know what it contains, but still want to have a look.

Work in progress, contributions welcome.

The content of the first key is copied and/or converted to the second key following these rules (based on the MIME type):

- direct copy of plain text files (mime type: text/*)
- direct copy of audio files (mime type: audio/*)
- direct copy of image files (mime type: image/*)
- direct copy of video files (mime type: video/*)
- direct copy of example files (mime type: example/*)
- direct copy of message files (mime type: message/*)
- direct copy of model files (mime type: model/*)
- direct copy of multipart files (mime type: multipart/*)
- copying or converting the application files this way (mime type: application/*):
  - pdf => html
  - msword|vnd.openxmlformats-officedocument.*|vnd.ms-*|vnd.oasis.opendocument* => pdf => html
  - *xml* => copy as a text file
  - x-dosexec (executable) => prepend and append DANGEROUS to the filename
  - x-gzip|x-tar|x-7z-compressed => compressed file
  - octet-stream => direct copy

Compressed files (x-gzip|x-tar|x-7z-compressed):

- the archive is unpacked
- the rules are run recursively on the unpacked files

Usage
=====

0. Power off the device
1. Plug the untrusted key into the top USB slot of the Raspberry Pi
2. Plug your own key into the bottom USB slot
   Note: this key should be bigger than the original one because the archives will be unpacked and copied onto it
3. Optional: connect the HDMI cable to a screen to see what happens
4. Connect the power to the micro USB
5. Wait until you no longer see any blinking green light on the board or, if you connected the HDMI cable, check the screen.
   It's slow and can take 30-60 minutes depending on how many document conversions take place
6. Power off the device and disconnect the drives

Notes
=====

* don't plug in USB devices through a hub, because there is no way to tell it which is source and which is target: the first drive enumerated (top port) is the source and the second (bottom port) is the target
* don't turn it off without shutting down the system; when grooming is done it shuts down automatically. Losing power while it's running can trash the OS on the SD card, because SD cards don't always like dirty shutdowns (i.e. power loss)
* using a target USB stick with a status light that stays on as long as the device has power is really useful, because the other status lights on the groomer are less than indicative at times. The 'OK' LED on the rPi toggles on activity, so it can be off for a long time while processing something and only comes back on when that process finishes. A USB stick that shows some LED activity when just plugged in (even when not reading or writing, as long as the USB port is powered) helps in determining when the process is finished: when the rPi shuts down, the USB port power is cut and the LED on the USB device goes off too
* use a larger target device, as all zip files get unpacked and processed onto the target
* if you have an HDMI monitor plugged in, you can watch what's happening for about 30 minutes, until the rPi's power saving kicks in and turns off the monitor
* if only one USB stick is present at power up, it doesn't groom and looks like a normal rPi
* if you want to ssh into the rPi, the username is 'pi' and the password is 'raspberry', as per defaults

Technical notes
===============

* the groomer script is in /opt/groomer/ with the other required files
* dependencies are LibreOffice and OpenJRE
* the IP address is 192.168.1.89
* the groomer process is kicked off in /etc/rc.local
* the heavy lifting takes place in, or is dispatched from, /opt/groomer/groomer.sh. That script file defines which file types get processed (types not listed there get ignored)
* there are two ways PDFs can be handled: right now they have their text extracted to the target device; the other way copies the PDF and also extracts the text
* the PDF text extraction isn't perfect and is the slowest part of it, but it should be able to handle unicode. It currently doesn't do image extraction from PDFs but could do that too

Discussion
==========

* image exports of PDF pages only contain the images and no text, so it's not like saving each page to a JPG, which would be a really handy and safe way of converting PDFs
* spreadsheets and presentations get converted to PDFs to kill off any embedded macros. It's assumed that the export doesn't produce evil PDFs, but nothing is done to sanitize any embedded links within those documents
* for spreadsheets, if a sheet is longer than a page, only a page worth of it is exported, right from the middle of the sheet (i.e. the top and bottom of that sheet get cut off and only the contents in the middle are exported to PDF). Dumb, but I figure that if the groomed version is interesting enough to make you want to go back to the source, then you can take the extra precautions
* the groomed target only receives "safe" files, and the groomer does its best to convert any potentially unsafe files to a safer format
* safe files being the ones that I know of that can't contain malicious embedded macros or other crap like that, and those that can get converted to something that won't contain code after conversion
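
The MIME-type rules listed under Why/What can be read as a small dispatch table. The following Python sketch is only an illustration of those rules and is not the code in /opt/groomer/groomer.sh. The names `plan_action` and `mark_dangerous`, the action labels, and the underscore separator around DANGEROUS are assumptions made for this example; the README only says that DANGEROUS is prepended and appended to the filename, and that types not listed get ignored.

```python
from fnmatch import fnmatchcase

# Top-level MIME types that the README says are copied as-is.
DIRECT_COPY_TYPES = {
    "text", "audio", "image", "video",
    "example", "message", "model", "multipart",
}

# (subtype pattern, action) pairs for application/*, in README order.
# Order matters: "vnd.openxmlformats-officedocument.*" also contains "xml",
# so the office patterns have to be tried before "*xml*".
APPLICATION_RULES = [
    ("pdf", "convert to html"),
    ("msword", "convert to pdf then html"),
    ("vnd.openxmlformats-officedocument.*", "convert to pdf then html"),
    ("vnd.ms-*", "convert to pdf then html"),
    ("vnd.oasis.opendocument*", "convert to pdf then html"),
    ("*xml*", "copy as text"),
    ("x-dosexec", "mark dangerous"),
    ("x-gzip", "unpack and recurse"),
    ("x-tar", "unpack and recurse"),
    ("x-7z-compressed", "unpack and recurse"),
    ("octet-stream", "direct copy"),
]


def plan_action(mime_type):
    """Return the (hypothetical) action label for a MIME type string."""
    main, _, sub = mime_type.lower().partition("/")
    if main in DIRECT_COPY_TYPES:
        return "direct copy"
    if main == "application":
        for pattern, action in APPLICATION_RULES:
            if fnmatchcase(sub, pattern):
                return action
    # Types that are not listed are ignored, as stated in the technical notes.
    return "ignore"


def mark_dangerous(filename):
    """Prepend and append DANGEROUS; the underscore separator is an assumption."""
    return "DANGEROUS_" + filename + "_DANGEROUS"


if __name__ == "__main__":
    for mime in ("text/plain", "application/pdf", "application/x-dosexec"):
        print(mime, "->", plan_action(mime))
```

Keeping the rules in an ordered list keeps the matching order explicit, which matters here because the office document subtypes would otherwise be caught by the broader xml pattern.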