2015-11-24 15:03:57 +01:00
|
|
|
Requirements per script
|
|
|
|
=======================
|
|
|
|
|
|
|
|
*Note*: in order to use any of those script, you need to install then (in a virtualenv or system wide)
|
|
|
|
|
|
|
|
```
|
|
|
|
pip install git+https://github.com/ahupp/python-magic.git # we cannot use the PyPi package for now due to a bug
|
|
|
|
python setup.py install # from the root of the repository
|
|
|
|
```
|
|
|
|
|
|
|
|
filecheck.py
|
|
|
|
------------
|
|
|
|
|
|
|
|
*WARNING*: Only works with Python 2.7 (oletools and olefile aren't ported to Python3 for now)
|
|
|
|
|
|
|
|
Requirements by type of document:
|
2015-11-24 18:03:51 +01:00
|
|
|
* Microsoft office: oletools, olefile
|
|
|
|
* OOXML: officedissector
|
|
|
|
* PDF: pdfid
|
|
|
|
* Archives: p7zip-full, p7zip-rar
|
2015-11-24 15:03:57 +01:00
|
|
|
|
|
|
|
|
|
|
|
```
|
2016-05-09 17:38:32 +02:00
|
|
|
sudo apt-get install p7zip-full p7zip-rar libxml2-dev libxslt1-dev
|
2015-11-24 15:03:57 +01:00
|
|
|
pip install lxml officedissector git+https://github.com/ahupp/python-magic.git oletools olefile
|
2016-05-09 17:38:32 +02:00
|
|
|
pip install git+https://github.com/Rafiot/officedissector.git
|
2015-11-24 15:03:57 +01:00
|
|
|
# pdfid is not a package, installing manually
|
|
|
|
wget https://didierstevens.com/files/software/pdfid_v0_2_1.zip
|
|
|
|
unzip pdfid_v0_2_1.zip
|
|
|
|
python setup.py -q install
|
|
|
|
```
|
|
|
|
|
|
|
|
generic.py
|
|
|
|
----------
|
|
|
|
|
|
|
|
Requirements by type of document:
|
2015-11-24 18:03:51 +01:00
|
|
|
* Office and all text files: unoconv, libreoffice
|
|
|
|
* PDF: ghostscript, pdf2htmlEX
|
2015-11-24 15:03:57 +01:00
|
|
|
|
|
|
|
```
|
|
|
|
# required for pdf2htmlEX
|
|
|
|
sudo add-apt-repository ppa:fontforge/fontforge --yes
|
|
|
|
sudo add-apt-repository ppa:coolwanglu/pdf2htmlex --yes
|
|
|
|
sudo apt-get update -qq
|
|
|
|
sudo apt-get install -qq libpoppler-dev libpoppler-private-dev libspiro-dev libcairo-dev libpango1.0-dev libfreetype6-dev libltdl-dev libfontforge-dev python-imaging python-pip firefox xvfb
|
|
|
|
# install pdf2htmlEX
|
|
|
|
git clone https://github.com/coolwanglu/pdf2htmlEX.git
|
|
|
|
pushd pdf2htmlEX
|
|
|
|
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr -DENABLE_SVG=ON .
|
|
|
|
make
|
|
|
|
sudo make install
|
|
|
|
popd
|
|
|
|
# Installing the rest
|
|
|
|
sudo apt-get install ghostscript p7zip-full p7zip-rar libreoffice unoconv
|
|
|
|
```
|
|
|
|
|
|
|
|
pier9.py
|
|
|
|
--------
|
|
|
|
|
|
|
|
No external dependencies required.
|
|
|
|
|
|
|
|
specific.py
|
|
|
|
-----------
|
|
|
|
|
|
|
|
No external dependencies required.
|