mirror of https://github.com/CIRCL/Circlean

Notes
=====

* Don't plug the USB devices in through a hub, because there's no way to tell
  the groomer which is source and which is target: the first drive enumerated
  (top port) is the source and the second (bottom port) is the target.
* Don't turn it off without shutting down the system. When grooming is done it
  shuts down automatically. Losing power while it's running can trash the OS
  on the SD card, because SD cards don't always like dirty shutdowns (i.e.
  power loss).
* Using a target USB stick with a status light that stays on as long as the
  device has power is really useful, as the other status lights on the groomer
  are less than indicative at times. Because the 'OK' LED on the rPi toggles on
  activity, it can be off for a long time while processing something and only
  comes back on when that process finishes. A USB stick that shows some sort of
  LED activity whenever the USB port is powered (even if it is not reading or
  writing) helps in determining when the process is finished: when the rPi is
  shut down, the USB port power is shut off and the LED on the USB device goes
  off as well.
* Use a larger target device, as all zip files get unpacked and processed onto
  the target.
* If you have an HDMI monitor plugged in you can watch what's happening for
  about 30 minutes, until the rPi's power saving kicks in and turns off the
  monitor.
* If only one USB stick is present at power up, it doesn't groom and looks like
  a normal rPi.
* If you want to ssh into the rPi, the username is 'pi' and the password is
  'raspberry', as per the defaults.

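The source/target rule from the notes above can be sketched as follows. This is
only an illustration in Python, not the groomer's own code: the function name
assign_roles and the device paths are hypothetical, and the list is assumed to
already be in enumeration order.

```python
from typing import Optional, Sequence, Tuple


def assign_roles(drives: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (source, target) for drives given in enumeration order.

    Returns None when fewer than two drives are present, which stands for
    "don't groom, behave like a normal rPi".
    """
    if len(drives) < 2:
        return None
    # First drive enumerated (top port) is the source,
    # second drive enumerated (bottom port) is the target.
    return drives[0], drives[1]


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    print(assign_roles(["/dev/sda", "/dev/sdb"]))
    print(assign_roles(["/dev/sda"]))
```

With a hub in between, the enumeration order no longer maps to a known port,
which is why the roles can't be trusted in that setup.
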
Technical notes
===============

* The groomer script is in /opt/groomer/ with the other required files.
* Dependencies are LibreOffice and OpenJRE.
* The IP address is 192.168.1.89.
* The groomer process is kicked off in /etc/rc.local.
* The heavy lifting takes place in, or is dispatched from,
  /opt/groomer/groomer.sh. That script file lists which file types get
  processed; types not listed there get ignored.
* There are two ways PDFs can get handled. Right now they have their text
  extracted to the target device; the other way copies the PDF and also
  extracts the text.
* The PDF text extraction isn't perfect and is the slowest part of the process,
  but it should be able to handle Unicode. It currently doesn't do image
  extraction from PDFs, but could do that too.

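The "listed types get processed, everything else gets ignored" behaviour can be
pictured as a lookup with a default. The sketch below is in Python for
illustration only; the real list lives in groomer.sh, and the extensions,
action names and the function action_for are all assumptions made up for this
example.

```python
from pathlib import PurePosixPath

# Hypothetical mapping for illustration; the actual file types handled
# are whatever /opt/groomer/groomer.sh lists.
ACTIONS = {
    ".txt": "copy",
    ".pdf": "extract_text",
    ".xls": "convert_to_pdf",
    ".ods": "convert_to_pdf",
    ".ppt": "convert_to_pdf",
    ".odp": "convert_to_pdf",
}


def action_for(filename: str) -> str:
    """Return the action for a file, or "ignore" if its type is not listed."""
    suffix = PurePosixPath(filename).suffix.lower()
    return ACTIONS.get(suffix, "ignore")


if __name__ == "__main__":
    for name in ["report.PDF", "budget.xls", "setup.exe", "README"]:
        print(name, "->", action_for(name))
```

Defaulting to "ignore" means a file type nobody thought about never reaches
the target.
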
Discussion
==========

* However, image exports of PDF pages only have the images and no text, so it's
  not like saving each page to a JPG, which would be a really handy and safe
  way of converting PDFs.
* Spreadsheets and presentations get converted to PDFs to kill off any embedded
  macros. It's assumed that the export isn't producing evil PDFs, but it does
  nothing to sanitize any embedded links within those documents.
* For spreadsheets, if a sheet is longer than a page, only a page's worth of
  that sheet is exported, taken right from the middle of the sheet (i.e. the
  top and bottom of that sheet get cut off and only the contents in the middle
  are exported to PDF). Dumb, but I figure if the groomed version is
  interesting enough that you want to go back to the source, then you can take
  the extra precautions.
* The groomer only copies "safe" files to the target, and does its best to
  convert any potentially unsafe files to a safer format.
* Safe files are ones that, as far as I know, can't contain malicious embedded
  macros or other crap like that, plus those that can get converted to
  something that won't contain code after conversion.
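
To make the spreadsheet truncation described above concrete, here is a rough
Python sketch of which rows would survive if exactly one page is taken from the
middle of a sheet. How the exporter actually picks the page is not specified in
these notes; the function middle_page_rows and the idea of a fixed
rows_per_page are assumptions for illustration.

```python
def middle_page_rows(total_rows: int, rows_per_page: int) -> range:
    """Return the 0-based row indices kept when one page is cut from the middle."""
    if total_rows <= rows_per_page:
        # The whole sheet fits on one page, nothing is cut off.
        return range(total_rows)
    # Drop an equal share from the top and bottom (top gets the smaller
    # share when the surplus is odd).
    start = (total_rows - rows_per_page) // 2
    return range(start, start + rows_per_page)


if __name__ == "__main__":
    kept = middle_page_rows(total_rows=150, rows_per_page=50)
    print(f"rows {kept.start}..{kept.stop - 1} kept")
```

For a 150-row sheet at 50 rows per page, the first 50 and last 50 rows are
lost, which is why an interesting groomed spreadsheet is worth checking against
the source under extra precautions.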