Merge pull request #21 from dputtick/dev

Changes for v2.2
2017-10-02 10:43:17 +02:00 · d94b3fd1a3
parent 338bd5a018 1af32b5414
commit d94b3fd1a3
23 changed files with 316 additions and 193 deletions
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,8 +5,8 @@ python:
    - 3.4
    - 3.5
    - 3.6
-    - "3.6-dev"
+    # - "3.6-dev"
-    - nightly
+    # - nightly
 sudo: required
 # https://docs.travis-ci.com/user/ci-environment/#Virtualization-environments
@@ -31,8 +31,7 @@ install:
    - wget https://didierstevens.com/files/software/pdfid_v0_2_1.zip
    - unzip pdfid_v0_2_1.zip
    - pip install -U pip
-    - pip install lxml exifread pillow olefile
+    - pip install lxml exifread pillow olefile oletools
    - pip install git+https://github.com/decalage2/oletools.git
    - pip install git+https://github.com/grierforensics/officedissector.git
    # PyCIRCLean dependencies
    - pip install -r dev-requirements.txt
@@ -45,7 +44,8 @@ install:
    - pushd theZoo/malwares/Binaries
    - python unpackall.py
    - popd
-    - mv theZoo/malwares/Binaries/out tests/uncategorized/
+    - mkdir tests/uncategorized/the_zoo/
    - mv theZoo/malwares/Binaries/out tests/uncategorized/the_zoo/
    # Path traversal attacks
    - git clone https://github.com/jwilk/path-traversal-samples
    - pushd path-traversal-samples
@@ -56,23 +56,30 @@ install:
    - make
    - popd
    - popd
-    - mv path-traversal-samples/zip/*.zip tests/uncategorized/
+    - mkdir tests/uncategorized/path_traversal_zip/
-    - mv path-traversal-samples/rar/*.rar tests/uncategorized/
+    - mkdir tests/uncategorized/path_traversal_rar/
    - mv path-traversal-samples/zip/*.zip tests/uncategorized/path_traversal_zip
    - mv path-traversal-samples/rar/*.rar tests/uncategorized/path_traversal_rar
    # Office docs
    - git clone https://github.com/eea/odfpy.git
-    - mv odfpy/tests/examples/* tests/uncategorized/
+    - mkdir tests/uncategorized/odfpy/
-    - pushd tests/uncategorized/
+    - mv odfpy/tests/examples/* tests/uncategorized/odfpy/
    - mkdir tests/uncategorized/olefile
    - pushd tests/uncategorized/olefile
    - wget https://bitbucket.org/decalage/olefileio_pl/raw/3073963b640935134ed0da34906fea8e506460be/Tests/images/test-ole-file.doc
    - popd
    - mkdir tests/uncategorized/fraunhofer && pushd tests/uncategorized/fraunhofer
    - wget --no-check-certificate https://www.officedissector.com/corpus/fraunhoferlibrary.zip
    - unzip -o fraunhoferlibrary.zip
    - rm fraunhoferlibrary.zip
    - popd
    # Turned off unzipping 42.zip because it isn't included in the file catalog and archivebomb.zip ends up testing the same thing
    # - pushd tests/dangerous/
    # - 7z x -p42 42.zip
    # - popd
 script:
-    - travis_wait py.test --cov=kittengroomer --cov=bin tests/
+    - travis_wait py.test --cov=kittengroomer --cov=filecheck tests/
 notifications:
    email:
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,17 +1,25 @@
 Changelog
 =========
-2.2.0 (in progress)
+2.2.0
 ---
 New features:
 - Filecheck.py configuration information is now conveniently held in a Config
 object instead of in globals
 - New easier to read text-based logger (removed twiggy dependency)
 - Various filetypes in filecheck.py now have improved descriptions for log
- Improved the interface for adding file descriptions to files
+- Improved the PyCIRCLean API interface for adding file descriptions to files
 - New integration test harness using a sample file catalog
 Fixes:
-
+- Switched back to released version of oletools
 - Use set of malicious extensions from Chrome
 - Check for XML Forms Architectures in PDFs
 - Symlinks were being followed
 - Prevent copying MacOS hidden files
 - Fixes for several filetypes that were incorrectly being identified as dangerous
 - Fix support for .rar archives
 - Turn off executable bit on copied files
 2.1.0
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -10,6 +10,12 @@ This project is in active development, so any contributions are welcome!
 Setting up a dev environment
 ============================
 * PyCIRCLean requires a working Python 3.3+ install. Before beginning install, it is recommended
 to set up a virtualenv to contain Python dependencies. If you don't have experience managing Python virtualenvs,
 [pyenv](https://github.com/pyenv/pyenv) and [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv) are great
 tools. If you're running MacOS or Windows and would like to contribute to filecheck.py, you will need access to a VM using
 either a cloud service or something like Virtualbox.
 * First, you'll want to get a local copy of PyCIRCLean. If you'd like to make a pull request
 with your changes at some point, you should fork the project on github, and then `git clone`
 your fork.
@@ -18,22 +24,17 @@ your fork.
 you can use `pip install -r dev-requirements.txt` to ensure you download any testing dependencies as well.
 We recommend that you use a virtualenv when installing dependencies. Note: python-magic has a non-Python
 dependency, libmagic. It is typically included in Linux distributions, but you might have to install
-it with homebrew (`brew install libmagic`) on macOS.
+it with homebrew (`brew install libmagic`) on MacOS.
-* Some of the example scripts have additional dependencies for handling various filetypes. You'll have to
+* To install the dependencies for filecheck.py on Linux, you can run `make dev-install` or view the [Makefile](./Makefile) and
-install these seperately if you want to try out the examples or modify them for your own purposes.
+install the dependencies manually. Note that `pip install lxml` can only be run after `apt-get install libxml2-dev`.
 Please open an issue if you have suggestions of good alternatives for the libraries we use for file handling
 or if you have an example you'd like to contribute.
 Running the tests
 =================
 * Running the tests is fairly straightforward.
 * First, make sure you've installed the project and testing dependencies.
 * Then, run `python -m pytest` or just `pytest` in the top level directory of the module.
 * Each integration test that runs will generate a timestamped copy of the log for that run
 in the tests/testlogs directory.
 * If you'd like to get information about code coverage, run the tests using
 `pytest --cov=kittengroomer`.
 * You can test with multiple versions of Python if you have them installed
--- a/8
+++ b/8
@@ -0,0 +1,8 @@
 dev-install:
 	sudo apt-get update
 	sudo apt-get install -y p7zip-full p7zip-rar libxml2-dev libxslt1-dev
 	pip install -r dev-requirements.txt
 	pip install lxml exifread pillow olefile oletools
 	pip install git+https://github.com/grierforensics/officedissector.git
 	wget https://didierstevens.com/files/software/pdfid_v0_2_1.zip
 	unzip pdfid_v0_2_1.zip
--- a/README.md
+++ b/README.md
@@ -6,7 +6,9 @@
 PyCIRCLean is the core Python code used by [CIRCLean](https://github.com/CIRCL/Circlean/), an open-source
 USB key and document sanitizer created by [CIRCL](https://www.circl.lu/). This module has been separated from the
 device-specific scripts and can be used for dedicated security applications to sanitize documents from hostile environments
-to trusted environments. PyCIRCLean is currently Python 3.3+ compatible.
+to trusted environments. PyCIRCLean is currently Python 3.3+ compatible. Also, while [kittengroomer](./kittengroomer) can
 run on any platform supported by python-magic/libmagic, [filecheck.py](./filecheck/filecheck.py) has some dependencies that
 are Linux-only, and running the full test suite will require access to a Linux box or VM.
 # Installation
@@ -27,11 +29,11 @@ PyCIRCLean is designed to be extended to cover specific checking
 and sanitization workflows in different organizations such as industrial
 environments or restricted/classified ICT environments. A series of practical examples utilizing PyCIRCLean can be found
 in the [./examples](./examples) directory. Note: for commits beyond version 2.2.0 these
-examples are not guaranteed to work with the PyCIRCLean API. Please check [helpers.py](./kittengroomer/helpers.py) or
+examples are out of date and not guaranteed to work with the PyCIRCLean API. Please check [helpers.py](./kittengroomer/
-[filecheck.py](./bin/filecheck.py) to see the new API interface.
+helpers.py) or [filecheck.py](./filecheck/filecheck.py) to see the new API interface.
-The following simple example using PyCIRCLean will only copy files with a .conf extension matching the 'text/plain' MIME
+The following simple example using PyCIRCLean will only copy files with a .conf extension matching the 'text/plain'
-type. If any other file is found in the source directory, the files won't be copied to the destination directory.
+mimetype. If any other file is found in the source directory, the files won't be copied to the destination directory.
 ~~~python
 #!/usr/bin/env python
@@ -53,8 +55,6 @@ class FileSpec(FileBase):
        """Init file object, set the extension."""
        super(FileSpec, self).__init__(src_path, dst_path)
        self.valid_files = {}
        a, self.extension = os.path.splitext(self.src_path)
        self.mimetype = magic.from_file(self.src_path, mime=True).decode("utf-8")
        # The initial version will only accept the file extensions/mimetypes listed here.
        self.valid_files.update(Config.configfiles)
@@ -69,18 +69,10 @@ class FileSpec(FileBase):
            # Unexpected mimetype => disallowed
            valid = False
            compare_mime = 'Mime: {} - Expected: {}'.format(self.cur_file.mimetype, expected_mime)
        self.add_log_details('valid', valid)
        if valid:
            self.cur_file.log_string = 'Extension: {} - MimeType: {}'.format(self.cur_file.extension, self.cur_file.mimetype)
        else:
            self.should_copy = False
            if compare_ext is not None:
                self.add_log_string(compare_ext)
            else:
                self.add_log_string(compare_mime)
        if self.should_copy:
            self.safe_copy()
        self.write_log()
 class KittenGroomerSpec(KittenGroomerBase):
@@ -97,7 +89,7 @@ class KittenGroomerSpec(KittenGroomerBase):
        """Main function doing the processing."""
        to_copy = []
        error = []
-        for srcpath in self._list_all_files(self.src_root_dir):
+        for srcpath in self.list_all_files(self.src_root_dir):
            dstpath = srcpath.replace(self.src_root_dir, self.dst_root_dir)
            cur_file = FileSpec(srcpath, dstpath)
            cur_file.check()
@@ -110,7 +102,7 @@ if __name__ == '__main__':
 # How to contribute
-We welcome contributions (including bug fixes, new example file processing
+We welcome contributions (including bug fixes and new example file processing
 workflows) via pull requests. We are particularly interested in any new workflows
 that can be used to improve security in different organizations. If you see any
 potential enhancements required to support your sanitization workflow, please feel
--- a/dev-requirements.txt
+++ b/dev-requirements.txt
@@ -2,3 +2,4 @@ python-magic
 pytest
 pytest-cov
 PyYAML
 tox
--- a/filecheck/README.md
+++ b/filecheck/README.md
--- a/filecheck/init.py
+++ b/filecheck/init.py
--- a/filecheck/filecheck.py
+++ b/filecheck/filecheck.py
@@ -19,31 +19,42 @@ from pdfid import PDFiD, cPDFiD
 from kittengroomer import FileBase, KittenGroomerBase, Logging
 SEVENZ_PATH = '/usr/bin/7z'
 class Config:
-    """Configuration information for Filecheck."""
+    """Configuration information for filecheck.py."""
-
+    # MIMES
    # Application subtypes (mimetype: 'application/<subtype>')
-    mimes_ooxml = ['vnd.openxmlformats-officedocument.']
+    mimes_ooxml = ('vnd.openxmlformats-officedocument.',)
-    mimes_office = ['msword', 'vnd.ms-']
+    mimes_office = ('msword', 'vnd.ms-',)
-    mimes_libreoffice = ['vnd.oasis.opendocument']
+    mimes_libreoffice = ('vnd.oasis.opendocument',)
-    mimes_rtf = ['rtf', 'richtext']
+    mimes_rtf = ('rtf', 'richtext',)
-    mimes_pdf = ['pdf', 'postscript']
+    mimes_pdf = ('pdf', 'postscript',)
-    mimes_xml = ['xml']
+    mimes_xml = ('xml',)
-    mimes_ms = ['dosexec']
+    mimes_ms = ('dosexec',)
-    mimes_compressed = ['zip', 'rar', 'bzip2', 'lzip', 'lzma', 'lzop',
+    mimes_compressed = ('zip', 'rar', 'x-rar', 'bzip2', 'lzip', 'lzma', 'lzop',
-                        'xz', 'compress', 'gzip', 'tar']
+                        'xz', 'compress', 'gzip', 'tar',)
-    mimes_data = ['octet-stream']
+    mimes_data = ('octet-stream',)
    mimes_audio = ('ogg',)
    # Image subtypes
-    mimes_exif = ['image/jpeg', 'image/tiff']
+    mimes_exif = ('image/jpeg', 'image/tiff',)
-    mimes_png = ['image/png']
+    mimes_png = ('image/png',)
    # Mimetypes with metadata
-    mimes_metadata = ['image/jpeg', 'image/tiff', 'image/png']
+    mimes_metadata = ('image/jpeg', 'image/tiff', 'image/png',)
    # Mimetype aliases
    aliases = {
        # Win executables
        'application/x-msdos-program': 'application/x-dosexec',
        'application/x-dosexec': 'application/x-msdos-program',
        # Other apps with confusing mimetypes
        'application/rtf': 'text/rtf',
        'application/rar': 'application/x-rar',
        'application/ogg': 'audio/ogg',
        'audio/ogg': 'application/ogg'
    }
    # EXTS
    # Commonly used malicious extensions
    # Sources: http://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/
    # https://github.com/wiregit/wirecode/blob/master/components/core-settings/src/main/java/org/limewire/core/settings/FilterSettings.java
@@ -98,22 +109,14 @@ class Config:
        ".sparseimage", ".toast", ".udif",
    )
    # Aliases
    aliases = {
        # Win executables
        'application/x-msdos-program': 'application/x-dosexec',
        'application/x-dosexec': 'application/x-msdos-program',
        # Other apps with confusing mimetypes
        'application/rtf': 'text/rtf',
    }
    # Sometimes, mimetypes.guess_type gives unexpected results, such as for .tar.gz files:
    # In [12]: mimetypes.guess_type('toot.tar.gz', strict=False)
    # Out[12]: ('application/x-tar', 'gzip')
    # It works as expected if you do mimetypes.guess_type('application/gzip', strict=False)
    override_ext = {'.gz': 'application/gzip'}
-    ignored_mimes = ['inode', 'model', 'multipart', 'example']
+
 SEVENZ_PATH = '/usr/bin/7z'
 class File(FileBase):
@@ -124,10 +127,9 @@ class File(FileBase):
    filetype-specific processing methods.
    """
-    def __init__(self, src_path, dst_path, logger):
+    def __init__(self, src_path, dst_path):
        super(File, self).__init__(src_path, dst_path)
        self.is_archive = False
        self.logger = logger
        self.tempdir_path = self.dst_path + '_temp'
        subtypes_apps = (
@@ -140,6 +142,7 @@ class File(FileBase):
            (Config.mimes_ms, self._executables),
            (Config.mimes_compressed, self._archive),
            (Config.mimes_data, self._binary_app),
            (Config.mimes_audio, self.audio)
        )
        self.app_subtype_methods = self._make_method_dict(subtypes_apps)
@@ -162,13 +165,8 @@ class File(FileBase):
            'inode': self.inode,
        }
-    def _check_dangerous(self):
+    def __repr__(self):
-        if not self.has_mimetype:
+        return "<filecheck.File object: {{{}}}>".format(self.filename)
            self.make_dangerous('File has no mimetype')
        if not self.has_extension:
            self.make_dangerous('File has no extension')
        if self.extension in Config.malicious_exts:
            self.make_dangerous('Extension identifies file as potentially dangerous')
    def _check_extension(self):
        """
@@ -179,6 +177,9 @@ class File(FileBase):
        mimetype based on its extension differs from the mimetype determined
        by libmagic, then mark the file as dangerous.
        """
        if not self.has_extension:
            self.make_dangerous('File has no extension')
        else:
            if self.extension in Config.override_ext:
                expected_mimetype = Config.override_ext[self.extension]
            else:
@@ -188,7 +189,7 @@ class File(FileBase):
                    expected_mimetype = Config.aliases[expected_mimetype]
            is_known_extension = self.extension in mimetypes.types_map.keys()
            if is_known_extension and expected_mimetype != self.mimetype:
-            self.make_dangerous('Mimetype does not match expected mimetype for this extension')
+                self.make_dangerous('Mimetype does not match expected mimetype ({}) for this extension'.format(expected_mimetype))
    def _check_mimetype(self):
        """
@@ -197,6 +198,9 @@ class File(FileBase):
        Determine whether the extensions that are normally associated with
        the mimetype include the file's actual extension.
        """
        if not self.has_mimetype:
            self.make_dangerous('File has no mimetype')
        else:
            if self.mimetype in Config.aliases:
                mimetype = Config.aliases[self.mimetype]
            else:
@@ -205,7 +209,7 @@ class File(FileBase):
                                                                 strict=False)
            if expected_extensions:
                if self.has_extension and self.extension not in expected_extensions:
-                self.make_dangerous('Extension does not match expected extensions for this mimetype')
+                    self.make_dangerous('Extension does not match expected extensions ({}) for this mimetype'.format(expected_extensions))
    def _check_filename(self):
        """
@@ -219,7 +223,7 @@ class File(FileBase):
                '.Trashes', '._.Trashes', '.DS_Store', '.fseventsd', '.Spotlight-V100'
            )
            if self.filename in macos_hidden_files:
-                self.add_description('MacOS hidden metadata file.')
+                self.add_description('MacOS metadata file, added by MacOS to USB drives and some directories')
                self.should_copy = False
        right_to_left_override = u"\u202E"
        if right_to_left_override in self.filename:
@@ -227,35 +231,28 @@ class File(FileBase):
            new_filename = self.filename.replace(right_to_left_override, '')
            self.set_property('filename', new_filename)
    def _check_malicious_exts(self):
        """Check that the file's extension isn't contained in a blacklist"""
        if self.extension in Config.malicious_exts:
            self.make_dangerous('Extension identifies file as potentially dangerous')
    def check(self):
        """
-        Main file processing method
+        Main file processing method.
-        Delegates to various helper methods including filetype-specific checks.
+        First, checks for basic properties that might indicate a dangerous file.
        If the file isn't dangerous, then delegates to various helper methods
        for filetype-specific checks based on the file's mimetype.
        """
-        if self.maintype in Config.ignored_mimes:
+        # Any of these methods can call make_dangerous():
-            self.should_copy = False
+        self._check_malicious_exts()
            self.mime_processing_options.get(self.maintype, self.unknown)()
        else:
            self._check_dangerous()
            self._check_filename()
            if self.has_extension:
                self._check_extension()
            if self.has_mimetype:
        self._check_mimetype()
        self._check_extension()
        self._check_filename()  # can mutate self.filename
        if not self.is_dangerous:
            self.mime_processing_options.get(self.maintype, self.unknown)()
    def write_log(self):
        """Pass information about the file to self.logger"""
        props = self.get_all_props()
        if not self.is_archive:
            if os.path.exists(self.tempdir_path):
                # FIXME: Hack to make images appear at the correct tree depth in log
                self.logger.add_file(self.src_path, props, in_tempdir=True)
                return
        self.logger.add_file(self.src_path, props)
    # ##### Helper functions #####
    def _make_method_dict(self, list_of_tuples):
        """Returns a dictionary with mimetype: method pairs."""
@@ -287,18 +284,22 @@ class File(FileBase):
            self.add_description('File is a symlink to {}'.format(symlink_path))
        else:
            self.add_description('File is an inode (empty file)')
        self.should_copy = False
    def unknown(self):
        """Main type should never be unknown."""
        self.add_description('Unknown mimetype')
        self.should_copy = False
    def example(self):
        """Used in examples, should never be returned by libmagic."""
        self.add_description('Example file')
        self.should_copy = False
    def multipart(self):
        """Used in web apps, should never be returned by libmagic"""
        self.add_description('Multipart file - usually found in web apps')
        self.should_copy = False
    # ##### Treated as malicious, no reason to have it on a USB key ######
    def message(self):
@@ -315,12 +316,10 @@ class File(FileBase):
        for mt in Config.mimes_rtf:
            if mt in self.subtype:
                self.add_description('Rich Text (rtf) file')
                # TODO: need a way to convert it to plain text
                self.force_ext('.txt')
                return
        for mt in Config.mimes_ooxml:
            if mt in self.subtype:
                self.add_description('OOXML (openoffice) file')
                self._ooxml()
                return
        self.add_description('Plain text file')
@@ -328,16 +327,14 @@ class File(FileBase):
    def application(self):
        """Process an application specific file according to its subtype."""
-        if self.subtype in self.app_subtype_methods:
+        for subtype, method in self.app_subtype_methods.items():
-            method = self.app_subtype_methods[self.subtype]
+            if subtype in self.subtype:  # checking for partial matches
                method()
-            # TODO: should these application methods return a value?
+                return
-        else:
+        self._unknown_app()  # if none of the methods match
            self._unknown_app()
    def _executables(self):
        """Process an executable file."""
        # LOG: change the processing_type property to some other name or include in file_string
        self.make_dangerous('Executable file')
    def _winoffice(self):
@@ -372,6 +369,7 @@ class File(FileBase):
    def _ooxml(self):
        """Process an ooxml file."""
        self.add_description('OOXML (openoffice) file')
        try:
            doc = officedissector.doc.Document(self.src_path)
        except Exception:
@@ -388,8 +386,6 @@ class File(FileBase):
            self.make_dangerous('Ooxml file with embedded objects')
        if len(doc.features.embedded_packages) > 0:
            self.make_dangerous('Ooxml file with embedded packages')
        if not self.is_dangerous:
            self.add_description('Ooxml file')
    def _libreoffice(self):
        """Process a libreoffice file."""
@@ -411,7 +407,6 @@ class File(FileBase):
        """Process a PDF file."""
        xmlDoc = PDFiD(self.src_path)
        oPDFiD = cPDFiD(xmlDoc, True)
        # TODO: are there other pdf characteristics which should be dangerous?
        if oPDFiD.encrypt.count > 0:
            self.make_dangerous('Encrypted pdf')
        if oPDFiD.js.count > 0 or oPDFiD.javascript.count > 0:
@@ -536,7 +531,6 @@ class File(FileBase):
        using PIL.Image, saves it to the temporary directory, and copies it to
        the destination.
        """
        # TODO: make sure this method works for png, gif, tiff
        if self.has_metadata:
            self.extract_metadata()
        tempdir_path = self.make_tempdir()
@@ -587,32 +581,47 @@ class GroomerLogger(object):
            lf.write(b'\n')
    def add_file(self, file_path, file_props, in_tempdir=False):
-        """Add a file to the log. Takes a dict of file properties."""
+        """Add a file to the log. Takes a path and a dict of file properties."""
        depth = self._get_path_depth(file_path)
-        description_string = ', '.join(file_props['description_string'])
+        try:
            file_hash = Logging.computehash(file_path)[:6]
-        if file_props['is_dangerous']:
+        except IsADirectoryError:
-            description_category = "Dangerous"
+            file_hash = 'directory'
        except FileNotFoundError:
            file_hash = '------'
        if file_props['is_symlink']:
            symlink_template = "+- NOT COPIED: symbolic link to {name} ({sha_hash})"
            log_string = symlink_template.format(
                name=file_props['symlink_path'],
                sha_hash=file_hash
            )
        else:
-            description_category = "Normal"
+            if file_props['is_dangerous']:
                category = "Dangerous"
            else:
                category = "Normal"
            size_string = self._format_file_size(file_props['file_size'])
-        file_template = "+- {name} ({sha_hash}): {size}, type: {mt}/{st}. {desc}: {desc_str}"
+            if not file_props['copied']:
-        file_string = file_template.format(
+                copied_string = 'NOT COPIED: '
            else:
                copied_string = ''
            file_template = "+- {copied}{name} ({sha_hash}): {size}, type: {mt}/{st}. {cat}: {desc_str}"
            log_string = file_template.format(
                copied=copied_string,
                name=file_props['filename'],
                sha_hash=file_hash,
                size=size_string,
                mt=file_props['maintype'],
                st=file_props['subtype'],
-            desc=description_category,
+                cat=category,
-            desc_str=description_string,
+                desc_str=file_props['description_string'],
            )
-        # TODO: finish adding Errors and check that they appear properly
+        if file_props['errors']:
-        # if file_props['errors']:
+            error_string = ', '.join([str(key) for key in file_props['errors']])
-        #     error_string = ', '.join([str(key) for key in file_props['errors']])
+            log_string += (' Errors: ' + error_string)
        #     file_string.append(' Errors: ' + error_string)
        if in_tempdir:
            depth -= 1
-        self._write_line_to_log(file_string, depth)
+        self._write_line_to_log(log_string, depth)
    def add_dir(self, dir_path):
        """Add a directory to the log"""
@@ -664,6 +673,11 @@ class KittenGroomerFileCheck(KittenGroomerBase):
        self.max_recursive_depth = max_recursive_depth
        self.logger = GroomerLogger(root_src, root_dst, debug)
    def __repr__(self):
        return "filecheck.KittenGroomerFileCheck object: {{{}}}".format(
            os.path.basename(self.src_root_path)
        )
    def process_dir(self, src_dir, dst_dir):
        """Process a directory on the source key."""
        for srcpath in self.list_files_dirs(src_dir):
@@ -671,7 +685,7 @@ class KittenGroomerFileCheck(KittenGroomerBase):
                self.logger.add_dir(srcpath)
            else:
                dstpath = os.path.join(dst_dir, os.path.basename(srcpath))
-                cur_file = File(srcpath, dstpath, self.logger)
+                cur_file = File(srcpath, dstpath)
                self.process_file(cur_file)
    def process_file(self, file):
@@ -682,12 +696,13 @@ class KittenGroomerFileCheck(KittenGroomerBase):
        the file to the destination key, and clean up temporary directory.
        """
        file.check()
        if file.is_archive:
            self.process_archive(file)
        else:
            if file.should_copy:
                file.safe_copy()
                file.set_property('copied', True)
-            file.write_log()
+            self.write_file_to_log(file)
        if file.is_archive:
            self.process_archive(file)
        # TODO: Can probably handle cleaning up the tempdir better
        if hasattr(file, 'tempdir_path'):
            self.safe_rmtree(file.tempdir_path)
@@ -705,10 +720,11 @@ class KittenGroomerFileCheck(KittenGroomerBase):
        else:
            tempdir_path = file.make_tempdir()
            command_str = '{} -p1 x "{}" -o"{}" -bd -aoa'
            # -p1=password, x=extract, -o=output location, -bd=no % indicator, -aoa=overwrite existing files
            unpack_command = command_str.format(SEVENZ_PATH,
                                                file.src_path, tempdir_path)
            self._run_process(unpack_command)
-            file.write_log()
+            self.write_file_to_log(file)
            self.process_dir(tempdir_path, file.dst_path)
            self.safe_rmtree(tempdir_path)
        self.recursive_archive_depth -= 1
@@ -723,6 +739,14 @@ class KittenGroomerFileCheck(KittenGroomerBase):
                return
        return True
    def write_file_to_log(self, file):
        """Pass information about `file` to self.logger."""
        props = file.get_all_props()
        if not file.is_archive:
            # FIXME: in_tempdir is a hack to make image files appear at the correct tree depth in log
            in_tempdir = os.path.exists(file.tempdir_path)
            self.logger.add_file(file.src_path, props, in_tempdir)
    def list_files_dirs(self, root_dir_path):
        """
        Returns a list of all files and directories
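
Review note on the `application()` change in filecheck.py above: the new loop matches handler keys as substrings of the file's subtype instead of requiring an exact key. The sketch below illustrates that dispatch pattern in isolation. The names `make_method_dict` and `dispatch` and the lambda handlers are hypothetical stand-ins, not part of PyCIRCLean; only the subtype fragments are taken from `Config` in this diff.

```python
def make_method_dict(list_of_tuples):
    """Flatten ((fragments, handler), ...) into {fragment: handler}."""
    method_dict = {}
    for fragments, handler in list_of_tuples:
        for fragment in fragments:
            method_dict[fragment] = handler
    return method_dict


def dispatch(subtype, method_dict, fallback):
    """Call the first handler whose key is a substring of `subtype`."""
    for fragment, handler in method_dict.items():
        if fragment in subtype:  # partial match, as in the new application()
            return handler()
    return fallback()  # no fragment matched


# Hypothetical handlers standing in for methods such as File._archive.
handlers = make_method_dict((
    (('msword', 'vnd.ms-'), lambda: 'office'),
    (('zip', 'rar', 'x-rar'), lambda: 'archive'),
))

office_result = dispatch('vnd.ms-excel', handlers, lambda: 'unknown')
archive_result = dispatch('x-rar', handlers, lambda: 'unknown')
unknown_result = dispatch('x-something-else', handlers, lambda: 'unknown')
print(office_result, archive_result, unknown_result)
```

One thing to keep in mind with substring matching: a short fragment such as `'xml'` also occurs inside `'vnd.openxmlformats-officedocument.'`, so when several fragments match, the iteration order of the dict decides which handler runs. Plain dicts only guarantee insertion order from Python 3.7 onwards, which may matter for a project that targets 3.3+.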
--- a/kittengroomer/helpers.py
+++ b/kittengroomer/helpers.py
@@ -12,6 +12,7 @@ import os
 import hashlib
 import shutil
 import argparse
 import stat
 import magic
@@ -36,7 +37,7 @@ class FileBase(object):
        self.is_dangerous = False
        self.copied = False
        self.symlink_path = None
-        self.description_string = []  # array of descriptions to be joined
+        self._description_string = []  # array of descriptions to be joined
        self._errors = {}
        self._user_defined = {}
        self.should_copy = True
@@ -90,18 +91,24 @@ class FileBase(object):
    @property
    def description_string(self):
-        return self.__description_string
+        if len(self._description_string) == 0:
            return 'No description'
        elif len(self._description_string) == 1:
            return self._description_string[0]
        else:
            ret_string = ', '.join(self._description_string)
            return ret_string.strip(', ')
    @description_string.setter
    def description_string(self, value):
        if hasattr(self, 'description_string'):
            if isinstance(value, str):
-                if value not in self.__description_string:
+                if value not in self._description_string:
-                    self.__description_string.append(value)
+                    self._description_string.append(value)
            else:
                raise TypeError("Description_string can only include strings")
        else:
-            self.__description_string = value
+            self._description_string = value
    def set_property(self, prop_string, value):
        """
@@ -139,6 +146,7 @@ class FileBase(object):
            'subtype': self.subtype,
            'extension': self.extension,
            'is_dangerous': self.is_dangerous,
            'is_symlink': self.is_symlink,
            'symlink_path': self.symlink_path,
            'copied': self.copied,
            'description_string': self.description_string,
@@ -173,7 +181,11 @@ class FileBase(object):
            self.add_description(reason_string)
    def safe_copy(self, src=None, dst=None):
-        """Copy file and create destination directories if needed."""
+        """
        Copy file and create destination directories if needed.
        Sets all exec bits to '0'.
        """
        if src is None:
            src = self.src_path
        if dst is None:
@@ -181,6 +193,10 @@ class FileBase(object):
        try:
            os.makedirs(self.dst_dir, exist_ok=True)
            shutil.copy(src, dst)
            current_perms = self._get_file_permissions(dst)
            only_exec_bits = 0o0111
            perms_no_exec = current_perms & (~only_exec_bits)
            os.chmod(dst, perms_no_exec)
        except IOError as e:
            # Probably means we can't write in the dest dir
            self.add_error(e, '')
@ -234,16 +250,14 @@ class FileBase(object):
        else:
            try:
                mt = magic.from_file(file_path, mime=True)
-                # libmagic will always return something, even if it's just 'data'
+                # libmagic always returns something, even if it's just 'data'
            except UnicodeEncodeError as e:
                # FIXME: The encoding of the file that triggers this is broken (possibly it's UTF-16 and Python expects utf8)
                # Note: one of the Travis files will trigger this exception
                self.add_error(e, '')
                mt = None
            try:
                mimetype = mt.decode("utf-8")
            except:
-                # FIXME: what should the exception be here if mimetype isn't utf-8?
+                # FIXME: what should the exception be if mimetype isn't utf-8?
                mimetype = mt
        return mimetype
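Reviewer note on the FIXME above: in Python 3, `bytes.decode` raises `UnicodeDecodeError` on invalid input, while calling `.decode` on a `str` or on `None` raises `AttributeError`. The bare `except` currently hides both cases. One possible narrower version is sketched below. `normalize_mimetype` is a hypothetical helper, not part of the project, and it keeps the existing fallback of returning the raw value when decoding fails.

```python
def normalize_mimetype(mt):
    """Return mt as a str where possible, otherwise return it unchanged."""
    if mt is None:
        # magic.from_file failed earlier, so there is nothing to decode.
        return None
    if isinstance(mt, bytes):
        try:
            return mt.decode('utf-8')
        except UnicodeDecodeError:
            # Same fallback as the original code: hand back the raw value.
            return mt
    # Already a str (or some other type): leave it untouched.
    return mt
```

Checking the type first avoids relying on an exception for the common case where libmagic already returns a `str`.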
@ -262,6 +276,15 @@ class FileBase(object):
            size = 0
        return size
    def _remove_exec_bit(self, file_path):
        current_perms = self._get_file_permissions(file_path)
        perms_no_exec = current_perms & (~stat.S_IEXEC)
        os.chmod(file_path, perms_no_exec)
    def _get_file_permissions(self, file_path):
        full_mode = os.stat(file_path, follow_symlinks=False).st_mode
        return stat.S_IMODE(full_mode)
 class Logging(object):
@ -304,7 +327,6 @@ class KittenGroomerBase(object):
    def list_all_files(self, directory_path):
        """Generator yielding path to all of the files in a directory tree."""
        for root, dirs, files in os.walk(directory_path):
            # files is a list anyway so we don't get much from using a generator here
            for filename in files:
                filepath = os.path.join(root, filename)
                yield filepath
@ -329,7 +351,12 @@ class ImplementationRequired(KittenGroomerError):
    pass
-def main(kg_implementation, description='Call a KittenGroomer implementation to process files present in the source directory and copy them to the destination directory.'):
+def main(
        kg_implementation,
        description=("Call a KittenGroomer implementation to process files "
                     "present in the source directory and copy them to the "
                     "destination directory.")):
    print(description)
    parser = argparse.ArgumentParser(prog='KittenGroomer', description=description)
    parser.add_argument('-s', '--source', type=str, help='Source directory')
    parser.add_argument('-d', '--destination', type=str, help='Destination directory')
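Reviewer note on the permission handling in this file: `safe_copy` clears all three execute bits with the mask `0o0111`, whereas `_remove_exec_bit` masks only `stat.S_IEXEC`, which is the owner execute bit. The sketch below shows the "clear everything" variant on a temporary file. `strip_exec_bits` is a hypothetical helper written for illustration, and the example assumes a POSIX filesystem where `os.chmod` honours all mode bits.

```python
import os
import stat
import tempfile


def strip_exec_bits(path):
    """Clear the user, group and other execute bits; return the new mode."""
    current = stat.S_IMODE(os.stat(path, follow_symlinks=False).st_mode)
    all_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # same value as 0o111
    new_mode = current & ~all_exec
    os.chmod(path, new_mode)
    return new_mode


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'script.sh')
    with open(sample, 'w') as handle:
        handle.write('#!/bin/sh\n')
    os.chmod(sample, 0o755)  # rwxr-xr-x
    result_mode = strip_exec_bits(sample)
    mode_on_disk = stat.S_IMODE(os.stat(sample).st_mode)
```

Something along these lines could also serve as a starting point for the stubbed `test_safe_copy_removes_exec_perms` test, by asserting that the copied file's mode has no bits in common with `0o111`.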
--- a/scripts/run_filecheck_single_file.py
+++ b/scripts/run_filecheck_single_file.py
@ -0,0 +1,29 @@
 import sys
 from filecheck.filecheck import File
 PATH = 'tests/dangerous/bypass.docx'
 # PATH = 'tests/normal/word_docx.docx'
 def main():
    try:
        file = File(sys.argv[1], '')
    except IndexError:
        file = File(PATH, '')
    file.check()
    print(
        "Name: " + file.filename,
        "Desc: " + file.description_string,
        "Mime: " + file.mimetype,
        "Desc list: " + repr(file._description_string),
        "Size: " + str(file.size),
        "Src path: " + file.src_path,
        "Is dangerous: " + str(file.is_dangerous),
        sep='\n'
    )
 if __name__ == '__main__':
    main()
--- a/setup.py
+++ b/setup.py
@ -4,7 +4,7 @@ from setuptools import setup
 setup(
    name='kittengroomer',
-    version='2.1.0',
+    version='2.2.0',
    author='Raphaël Vinot',
    author_email='raphael.vinot@circl.lu',
    maintainer='Raphaël Vinot',
@ -12,7 +12,7 @@ setup(
    description='Standalone CIRCLean/KittenGroomer code.',
    packages=['kittengroomer'],
    scripts=[
-        'bin/filecheck.py'
+        'filecheck/filecheck.py'
    ],
    classifiers=[
        'License :: OSI Approved :: BSD License',
--- a/tests/dangerous/Example.svg
+++ b/tests/dangerous/Example.svg
--- a/tests/file_catalog.yaml
+++ b/tests/file_catalog.yaml
@ -12,12 +12,8 @@ normal:
  Example.ogg: # Added: 27-06-2017, source: https://en.wikipedia.org/wiki/File:Example.ogg
    description: Ogg vorbis sound file
    mimetype: audio/ogg
    xfail: True
  Example.png: # Added: 27-06-2017, source: https://en.wikipedia.org/wiki/File:Example.png
    mimetype: image/png
  Example.svg: # Added: 27-06-2017, source: https://en.wikipedia.org/wiki/File:Example.svg
    mimetype: image/svg+xml
    xfail: True
  pdf-sample.pdf: # Added: 27-06-2017, source: http://che.org.il/wp-content/uploads/2016/12/pdf-sample.pdf
    mimetype: application/pdf
  plaintext.txt: # Added: 27-06-2017, source: hand-generated
@ -25,11 +21,13 @@ normal:
  rar_archive.rar: # Added: 27-06-2017, Rar archive. Source: hand-generated
    description: rar archive
    mimetype: application/x-rar
    xfail: True
  rich_text.rtf: # Added 27-06-2017, source: hand-generated
    mimetype: text/rtf
  sample_mpeg4.mp4: # Added 28-06-2017, source: https://support.apple.com/en-us/HT201549
    mimetype: video/mp4
  word_docx.docx: # Added 24-07-2017, source: hand-generated using MacOS Microsoft Word 2011
    description: normal word document
    mimetype: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  zip_archive.zip: # Added 27-06-2017, source: hand-generated
    mimetype: application/zip
@ -48,6 +46,9 @@ dangerous:
  config_file.conf: # Added 27-06-2017, source: hand-generated
    description: config file
    mimetype: text/plain
  Example.svg: # Added: 27-06-2017, source: https://en.wikipedia.org/wiki/File:Example.svg
    description: normal svg file, should probably be replaced by a dangerous svg
    mimetype: image/svg+xml
  message.msg: # Added 27-06-2017, source: ????
    description: message file, used by Outlook etc
    mimetype: message/rfc822
--- a/tests/logging/.keepdir
+++ b/tests/logging/.keepdir
--- a/tests/logging/dir1/dir2/test.txt
+++ b/tests/logging/dir1/dir2/test.txt
@ -0,0 +1 @@
 test
--- a/tests/logging/symlink_test
+++ b/tests/logging/symlink_test
@ -0,0 +1 @@
 test.txt
--- a/tests/logging/test.conf
+++ b/tests/logging/test.conf
@ -0,0 +1 @@
 test
--- a/tests/logging/test.txt
+++ b/tests/logging/test.txt
@ -0,0 +1 @@
 test
--- a/tests/normal/word_docx.docx
+++ b/tests/normal/word_docx.docx
--- a/tests/test_filecheck.py
+++ b/tests/test_filecheck.py
@ -2,13 +2,12 @@
 # -*- coding: utf-8 -*-
 import os
 import unittest.mock as mock
 import pytest
 import yaml
 try:
-    from bin.filecheck import KittenGroomerFileCheck, File, GroomerLogger
+    from filecheck.filecheck import KittenGroomerFileCheck, File
    NODEPS = False
 except ImportError:
    NODEPS = True
@ -95,6 +94,7 @@ def get_filename(sample_file):
 def src_dir_path(tmpdir_factory):
    return tmpdir_factory.mktemp('src').strpath
@fixture(scope='module')
 def dest_dir_path(tmpdir_factory):
    return tmpdir_factory.mktemp('dest').strpath
@ -106,21 +106,18 @@ def groomer(dest_dir_path):
    return KittenGroomerFileCheck(dummy_src_path, dest_dir_path, debug=True)
@fixture
 def mock_logger(dest_dir_path):
    return mock.MagicMock(spec=GroomerLogger)
@parametrize(
    argnames="sample_file",
    argvalues=gather_sample_files(),
    ids=get_filename)
-def test_sample_files(mock_logger, sample_file, groomer, dest_dir_path):
+def test_sample_files(sample_file, groomer, dest_dir_path):
    if sample_file.xfail:
        pytest.xfail(reason='Marked xfail in file catalog')
    file_dest_path = os.path.join(dest_dir_path, sample_file.filename)
-    file = File(sample_file.path, file_dest_path, mock_logger)
+    file = File(sample_file.path, file_dest_path)
    groomer.process_file(file)
    print(file.description_string)
    print(file.mimetype)
    assert file.is_dangerous == sample_file.exp_dangerous
--- a/tests/test_filecheck_logging.py
+++ b/tests/test_filecheck_logging.py
@ -2,22 +2,30 @@
 # -*- coding: utf-8 -*-
 import os
-import datetime
+from datetime import datetime
 import pytest
 try:
    from filecheck.filecheck import KittenGroomerFileCheck
    NODEPS = False
 except ImportError:
    NODEPS = True
 pytestmark = pytest.mark.skipif(NODEPS, reason="Dependencies aren't installed")
 def save_logs(groomer, test_description):
    divider = ('=' * 10 + '{}' + '=' * 10 + '\n')
-    test_log_path = 'tests/test_logs/{}.log'.format(test_description)
+    test_log_path = 'tests/{}.log'.format(test_description)
    time_now = str(datetime.now().time()) + '\n'
    with open(test_log_path, 'wb+') as test_log:
-        log_header = divider.format('TEST LOG')
+        test_log_header = divider.format('TEST LOG')
-        test_log.write(bytes(log_header, encoding='utf-8'))
+        test_log.write(bytes(test_log_header, encoding='utf-8'))
        test_log.write(bytes(time_now, encoding='utf-8'))
        test_log.write(bytes(test_description, encoding='utf-8'))
        test_log.write(b'\n')
-        test_log.write(b'-' * 20 + b'\n')
+        log_header = divider.format('STD LOG')
        test_log.write(bytes(log_header, encoding='utf-8'))
        with open(groomer.logger.log_path, 'rb') as logfile:
            log = logfile.read()
            test_log.write(log)
@ -31,3 +39,9 @@ def save_logs(groomer, test_description):
            with open(groomer.logger.log_debug_out, 'rb') as debug_out:
                out = debug_out.read()
                test_log.write(out)
 def test_logging(tmpdir):
    groomer = KittenGroomerFileCheck('tests/logging/', tmpdir.strpath)
    groomer.run()
    save_logs(groomer, "visual_logging_test")
--- a/tests/test_kittengroomer.py
+++ b/tests/test_kittengroomer.py
@ -172,13 +172,18 @@ class TestFileBase:
    def test_add_new_description(self, text_file):
        """Adding a new description should add it to the list of description strings."""
        text_file.add_description('thing')
-        assert text_file.get_property('description_string') == ['thing']
+        assert text_file.get_property('description_string') == 'thing'
    def test_add_description_exists(self, text_file):
        """Adding a description that already exists shouldn't duplicate it."""
        text_file.add_description('thing')
        text_file.add_description('thing')
-        assert text_file.get_property('description_string') == ['thing']
+        assert text_file.get_property('description_string') == 'thing'
    def test_add_multiple_descriptions(self, text_file):
        text_file.add_description('thing')
        text_file.add_description('foo')
        assert text_file.get_property('description_string') == 'thing, foo'
    def test_add_description_not_string(self, text_file):
        """Adding a description that isn't a string should raise an error."""
@ -205,7 +210,7 @@ class TestFileBase:
        """Marking a file as dangerous and passing in a description should add
        that description to the file."""
        text_file.make_dangerous('thing')
-        assert text_file.get_property('description_string') == ['thing']
+        assert text_file.get_property('description_string') == 'thing'
    def test_dangerous_file_mark_dangerous(self, text_file):
        """Marking a dangerous file as dangerous should do nothing, and the
@ -251,6 +256,11 @@ class TestFileBase:
            file.safe_copy()
            mock_copy.assert_called_once_with(file_path, dst_path)
    def test_safe_copy_removes_exec_perms(self):
        """`safe_copy` should create a file that doesn't have any of the
        executable bits set."""
        pass
    def test_safe_copy_makedir_doesnt_exist(self):
        """Calling safe_copy should create intermediate directories in the path
        if they don't exist."""