Example scripts
===============
These example scripts demonstrate PyCIRCLean's capabilities. Feel free to
adapt or modify any of them to suit your requirements. To use any of these scripts, you first need to
install the PyCIRCLean dependencies (preferably in a virtualenv):
```
pip install git+https://github.com/ahupp/python-magic.git # the PyPI package cannot be used for now due to a bug
python setup.py install # from the root of the repository
```
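
Because the install commands above are meant to run inside a virtualenv, it can help to confirm that one is active first. The following is a minimal sketch, not part of PyCIRCLean: `in_virtualenv` is a hypothetical helper, and it relies on the `VIRTUAL_ENV` variable that virtualenv's `activate` script exports.

```shell
# Hypothetical guard: succeed only when a virtualenv is active.
# virtualenv's activate script exports VIRTUAL_ENV, which is what this checks.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Using virtualenv at $VIRTUAL_ENV"
else
    echo "No virtualenv active; packages would be installed for the system Python."
fi
```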
Requirements per script
=======================
filecheck.py
------------
*WARNING*: Only works with Python 2.7 (oletools and olefile have not been ported to Python 3 yet).

Requirements by type of document:

* Microsoft Office: oletools, olefile
* OOXML: officedissector
* PDF: pdfid
* Archives: p7zip-full, p7zip-rar
```
sudo apt-get install p7zip-full p7zip-rar libxml2-dev libxslt1-dev
pip install lxml officedissector git+https://github.com/ahupp/python-magic.git oletools olefile
pip install git+https://github.com/Rafiot/officedissector.git
# pdfid is not a package, installing manually
wget https://didierstevens.com/files/software/pdfid_v0_2_1.zip
unzip pdfid_v0_2_1.zip
python setup.py -q install
```
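
After installation, a quick import check can show whether the Python packages listed above are available. This is a minimal sketch under stated assumptions, not part of PyCIRCLean: `missing_modules` and the `PYTHON` variable are hypothetical names, and the module names are assumed to match the import names of the listed packages (`magic` for python-magic). It prints one line per module that fails to import, and nothing if all of them load.

```shell
# Hypothetical helper: print every Python module from the argument list
# that fails to import with the chosen interpreter.
missing_modules() {
    # PYTHON is an assumed override; filecheck.py needs a Python 2.7 interpreter
    py=${PYTHON:-python}
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Assumed import names for the packages installed above
missing_modules magic lxml oletools olefile officedissector
```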
generic.py
----------
Requirements by type of document:
* Office and all text files: unoconv, libreoffice
* PDF: ghostscript, pdf2htmlEX
```
# required for pdf2htmlEX
sudo add-apt-repository ppa:fontforge/fontforge --yes
sudo add-apt-repository ppa:coolwanglu/pdf2htmlex --yes
sudo apt-get update -qq
sudo apt-get install -qq libpoppler-dev libpoppler-private-dev libspiro-dev libcairo-dev libpango1.0-dev libfreetype6-dev libltdl-dev libfontforge-dev python-imaging python-pip firefox xvfb
# install pdf2htmlEX
git clone https://github.com/coolwanglu/pdf2htmlEX.git
pushd pdf2htmlEX
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr -DENABLE_SVG=ON .
make
sudo make install
popd
# Installing the rest
sudo apt-get install ghostscript p7zip-full p7zip-rar libreoffice unoconv
```
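
Since generic.py depends on external programs rather than Python packages, a PATH check can confirm the installation. This is a minimal sketch, not part of PyCIRCLean: `missing_tools` is a hypothetical helper, and the binary names `gs` (ghostscript) and `7z` (p7zip-full) are assumptions based on the usual Debian/Ubuntu packaging and may differ on other systems.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be resolved on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names for the packages listed above
missing=$(missing_tools unoconv libreoffice gs pdf2htmlEX 7z)
if [ -n "$missing" ]; then
    echo "Missing tools for generic.py:"
    echo "$missing"
else
    echo "All external tools for generic.py were found."
fi
```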
pier9.py
--------
No external dependencies required.

specific.py
-----------
No external dependencies required.