111 Commits (60bd5362c119e9ce0db36b9f955e1f4dac0284d1)

Author SHA1 Message Date
Dan Puttick 60bd5362c1 Add normal word doc to file_catalog 2017-07-27 17:06:15 -04:00
Dan Puttick 159bc9cee2 Add __repr__s to File and KittenGroomerFileCheck 2017-07-20 18:40:16 -04:00
Dan Puttick cc5d1e5117 Fix issue with hashing symlinks to directories 2017-07-20 18:40:16 -04:00
Dan Puttick 4205d57dec Fix logging for errors and symlinks 2017-07-20 18:40:16 -04:00
Dan Puttick fe82a5ac0d Change rest of lists in Config to tuples 2017-07-20 18:40:16 -04:00
Dan Puttick 35eb8ea8ab Fixups for PR #16
* ObjectPool winoffice files are now make_dangerous
* safe_copy now catches IOErrors only
* Use os.makedirs(exist_ok=True) instead of checking for existence in safe_copy
and create_metadata_file
* Added stubs for two tests related to safe_copy
2017-07-17 14:52:22 -04:00
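A minimal sketch of the pattern this commit describes: creating the destination directory with os.makedirs(exist_ok=True) rather than checking for it first, and catching only IOError around the copy. The function name copy_with_parents and its boolean return value are hypothetical and chosen for illustration; this is not the project's actual safe_copy.

```python
import os
import shutil


def copy_with_parents(src, dst):
    """Copy src to dst, creating dst's parent directories as needed.

    Hypothetical helper for illustration only. Returns True on success
    and False if the copy raised an IOError.
    """
    dst_dir = os.path.dirname(dst)
    if dst_dir:
        # exist_ok=True makes this a no-op when the directory already
        # exists, so no separate os.path.exists() check is needed.
        os.makedirs(dst_dir, exist_ok=True)
    try:
        shutil.copy(src, dst)
        return True
    except IOError:
        # In Python 3, IOError is an alias of OSError, so this also covers
        # FileNotFoundError and PermissionError.
        return False
```

Dropping the existence check also avoids the race where another process creates the directory between the check and the makedirs call.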
Dan Puttick 0c35885f17 Prevent copying MacOS hidden files 2017-07-17 10:22:46 -04:00
Dan Puttick 270597586e Remove self.cur_file from filecheck 2017-07-16 18:36:22 -04:00
Dan Puttick 36c4493cd6 Fix handling of symlinks 2017-07-16 14:05:48 -04:00
Dan Puttick 7363a16318 Remove calls to make_binary and make_unknown 2017-07-14 21:57:29 -04:00
Dan Puttick 41abe7e5d6 Prevent following arbitrarily nested symlinks 2017-07-14 17:52:21 -04:00
Dan Puttick e19064c83f Refactor FileBase to store props as attributes
* This is kind of a big refactoring - I realized that storing file
props in a dict was causing some subtle problems, and that just having
them as attributes makes things a lot simpler
* I considered making a separate FileProps object and nesting it
inside FileBase, but almost all FileBase methods concern manipulating
file props, so it didn't really make sense.
* Tests are almost passing with this commit, but need a few more changes
and fixes for full test coverage + all passing.
2017-07-12 17:58:39 -04:00
Dan Puttick e51a503c33 Remove PIL.PngImagePlugin import
* This dependency isn't being used anymore: when I test, pngs are still
processed normally
2017-06-27 12:13:29 -04:00
Dan Puttick fa1df8e67f Remove rtl character from filename in log
* Previously, filecheck.py removed rtl character from the destination path only.
* Now, the rtl character is replaced in file.filename and the filename file
property.
2017-06-27 12:13:29 -04:00
Dan Puttick a76b0df543 Add more docstrings to filecheck.py 2017-06-27 12:13:29 -04:00
Dan Puttick 76467e420e Add work-in-progress support for errors in log 2017-06-27 11:54:57 -04:00
Dan Puttick 3c1fcda29e Add different file size units to log
* Format file sizes with B, KB, MB, or GB depending on size
* Clean up logic and variable names in add_file()
2017-06-27 11:54:57 -04:00
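One way the size formatting described in this commit could look, as a sketch only. The function name format_size, the 1024 step between units, and the one-decimal rounding are assumptions; the commit message does not specify them.

```python
def format_size(num_bytes):
    """Return a human-readable size string using B, KB, MB, or GB.

    Assumes 1024 bytes per KB; the actual logger may use a different base.
    """
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            if unit == "B":
                # Whole bytes read better without a decimal point.
                return "{} B".format(int(size))
            return "{:.1f} {}".format(size, unit)
        size /= 1024
    # Anything that reaches this point is reported in GB.
    return "{:.1f} GB".format(size)
```

For example, format_size(2048) returns "2.0 KB" under these assumptions.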
Dan Puttick e27d397496 Change is_recursive to is_archive
* is_archive is clearer and more descriptive than is_recursive
2017-06-27 11:54:57 -04:00
Raphaël Vinot 079e8d30a3 Add support for ObjectStream in PDF
ObjectStream isn't necessarily malicious, but can be. This patch could
be improved by unpacking the content of the stream, but it requires 3rd
party libraries we don't have for now.

Final fix for PCL-01-002
2017-06-19 11:24:47 +02:00
Raphaël Vinot 7d38ec3d32 Remove extensions processed by the script 2017-06-16 17:57:07 +02:00
Raphaël Vinot f44719b83e Add list of malicious extensions used in Google Chrome
Fix PCL-01-009
2017-06-16 17:26:39 +02:00
Raphaël Vinot 40f71e758f Fix logging and symlinks
Fix PCL-01-006
2017-06-16 14:47:53 +02:00
Raphaël Vinot 8c007e28cf Add support for XFA structure in PDF
Partial fix for PCL-01-002
2017-06-16 14:47:19 +02:00
Dan Puttick 45d71cb362 Fix unicode filename issues using fsencode
* Same problem we've had before - linux filenames can have non-unicode chars
in them
* We need to write the filename as raw bytes to the log
* os.fsencode lets us convert a utf-8 encoded string to bytes and ignore those
that can't be printed as unicode
* Still not clear if the log generated this way will be human-readable
2017-04-10 13:39:28 +02:00
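A small sketch of the os.fsencode approach mentioned in this commit, assuming a POSIX system. There, os.fsdecode and os.fsencode use the surrogateescape error handler, so bytes in a filename that are not valid in the filesystem encoding survive a round trip between str and bytes. The helper name log_line_bytes is hypothetical.

```python
import os


def log_line_bytes(filename):
    """Return the raw bytes to write to a log for one filename.

    Hypothetical helper: filename is a str as returned by os.listdir(),
    which may contain surrogate escapes for undecodable bytes.
    """
    # os.fsencode turns the str back into the original on-disk bytes.
    return os.fsencode(filename) + b"\n"


# A filename containing a byte that is not valid UTF-8 (Latin-1 "e acute").
raw_name = b"caf\xe9.txt"
as_str = os.fsdecode(raw_name)    # what os.listdir() would hand back
line = log_line_bytes(as_str)     # raw bytes, suitable for a file opened in "ab" mode
```

Because the log is written as bytes, it has to be opened in binary mode, and whether the result is readable depends on the viewer's encoding.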
Dan Puttick 3f49612a23 Add new logger, move logging to filecheck
* Wrote a new text-based logger that displays all file information in the tree
instead of using two separate logs
* Stopped using twiggy since it wasn't giving us anything useful
* Moved a lot of the logging code to filecheck, since it didn't really seem
appropriate as an API. Left a Logging stub in kittengroomer to hold methods
that might be useful for implementing other loggers.
* For the new logger, had to change the way that we traverse the items in the
source file tree.
2017-04-10 13:22:20 +02:00
Dan Puttick f0e7607a3f Improve description strings in filecheck
* Description strings that appear in the log improved in filecheck for various
file types
* Added various comments
2017-04-10 13:00:34 +02:00
Dan Puttick 6f9e36a578 Change filecheck for new file description method
* self.add_file_string -> self.add_description
2017-03-22 12:04:22 -04:00
Dan Puttick 3e7b38c5d4 Improve doc strings on FileBase 2017-03-21 18:58:17 -04:00
Dan Puttick 6851461755 Change from two separate logs to one 2017-03-20 16:10:57 -04:00
Dan Puttick 51760ebbb1 Move default log setup back into filecheck
* Realized that the API consumer might want to write their own logging tool.
* FileBase and KittenGroomerBase will have no logging code.
* If the API consumer likes, they can import GroomerLogger and use it in their
implementation.
2017-03-20 16:10:57 -04:00
Dan Puttick 71bcc79c20 Remove rtl override char from file dst_path
The unicode right to left override character can be used for various attacks.
This commit:
* Detects this character in the filename on the source key
* Strips it from the path before copying it to the dest key
* Marks the file as dangerous (this character doesn't belong in a filename)
2017-03-16 12:22:26 -04:00
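The three steps in this commit (detect, strip, mark as dangerous) might be sketched as follows. The names RTL_OVERRIDE and strip_rtl_override are hypothetical, and the returned flag stands in for whatever the project does to mark a file as dangerous.

```python
# U+202E RIGHT-TO-LEFT OVERRIDE: reverses the display order of the text
# that follows it, which can disguise a file's real extension.
RTL_OVERRIDE = "\u202e"


def strip_rtl_override(filename):
    """Return (clean_name, was_present) for a filename.

    Hypothetical helper: the caller is expected to treat the file as
    dangerous whenever was_present is True.
    """
    was_present = RTL_OVERRIDE in filename
    clean_name = filename.replace(RTL_OVERRIDE, "")
    return clean_name, was_present
```

For instance, a name stored as "invoice" + U+202E + "fdp.exe" can render as "invoiceexe.pdf" in a file manager, while the stripped name "invoicefdp.exe" shows the true extension.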
Dan Puttick 4d8a1d1daf Add/update docstrings for filecheck and helpers 2017-03-15 22:56:00 -04:00
Dan Puttick ac94cf5d6d Change the way test dst dirs are handled
* Each test folder now copies files into its own test directory
* Change gitignore due to dst dir changes
* Make sure logger.tree is called for every directory
2017-03-15 22:56:00 -04:00
Dan Puttick 0175ee48e5 Add TODOs and clarify various logging messages 2017-03-15 22:56:00 -04:00
Dan Puttick 963a2feef4 Change various methods to properties 2017-03-15 22:55:51 -04:00
Dan Puttick 59cde8cfd5 Move safe_copy to FileBase 2017-03-15 21:06:07 -04:00
Dan Puttick e73721e95f Fix bug with safe_copy 2017-03-15 21:06:07 -04:00
Dan Puttick 484c71fc86 Turn off copying for certain mimes in filecheck 2017-03-15 21:06:07 -04:00
Dan Puttick 18857da7ca Several small bugfixes
* Fix issue with main/subtypes in init
* Fix bug in File.check() in filecheck.py
* Fix FileBase.size for symlinks
2017-03-15 21:06:07 -04:00
Dan Puttick 3fe8c7c223 Adjust order of property initialization
Tests were failing due to values being set before file_props dict
was created
2017-03-15 21:06:07 -04:00
Dan Puttick 0038d3ef66 Switch to using file properties
* make_dangerous now takes a description string
* add_file_string takes strings describing the file
2017-03-15 21:06:07 -04:00
Dan Puttick fc8923fddd Change Groomer private methods to public
* Changed safe_rmtree, safe_copy, safe_remove, and safe_mkdir to public methods
* If something is being used in a subclass it probably shouldn't be a private
method
2017-03-15 21:06:07 -04:00
Dan Puttick 12d5624b4d Change FileBase.log_details to FileBase._file_props
* _file_props is a dict that will hold all information about the file
* Updated filecheck.py to reflect this
* Potentially will change contents of file_props to being attributes on the
file in the future. This change would be easy since all access to _file_props
is now via set_property and get_property methods.
* Add filename to _file_props
2017-03-15 21:06:06 -04:00
Dan Puttick 9832101c85 Identify TODOs that are log related 2017-03-15 21:06:06 -04:00
Dan Puttick 8d7dd1197f Move run_process back to Groomer object 2017-03-15 21:06:06 -04:00
Dan Puttick 781d0a76af First working version with methods in File object
- All tests now passing with file handling methods on File object
instead of Groomer object.
- Logging functionality still isn't finished.
2017-03-15 21:06:06 -04:00
Dan Puttick 9aafe6e518 Remove cur_file from methods in File object 2017-03-15 21:06:06 -04:00
Dan Puttick 53c1598af8 Move file processing methods into File object
- It seems like filecheck will be easier to reason about if all of
the file processing stuff happens in the File object. The Groomer
object will now be responsible only for enumerating the files to
be processed.
- Tests won't pass for this commit, but wanted to make the diff
cleaner by committing this before making changes.
2017-03-15 21:04:57 -04:00
Dan Puttick 3d36c90d66 Make list_all_files a public method 2017-03-15 21:04:57 -04:00
Dan Puttick c6ecc5e3a3 Fix process_dir bug in filecheck tests 2017-03-15 21:04:57 -04:00