100 Commits (b77451ae7ae807b90bb9e157842b8f5bede56461)

Author SHA1 Message Date
Dan Puttick e19064c83f Refactor FileBase to store props as attributes
* This is kind of a big refactoring - I realized that storing file
props in a dict was causing some subtle problems, and that just having
them as attributes makes things a lot simpler
* I considered making a separate FileProps object and nesting it
inside FileBase, but almost all FileBase methods concern manipulating
file props, so it didn't really make sense.
* Tests are almost passing with this commit, but need a few more changes
and fixes for full test coverage + all passing.
2017-07-12 17:58:39 -04:00
Dan Puttick e51a503c33 Remove PIL.PngImagePlugin import
* This dependency isn't being used anymore: when I test, pngs are still
processed normally
2017-06-27 12:13:29 -04:00
Dan Puttick fa1df8e67f Remove rtl character from filename in log
* Previously, filecheck.py removed rtl character from the destination path only.
* Now, the rtl character is replaced in file.filename and the filename file
property.
2017-06-27 12:13:29 -04:00
Dan Puttick a76b0df543 Add more docstrings to filecheck.py 2017-06-27 12:13:29 -04:00
Dan Puttick 76467e420e Add work-in-progress support for errors in log 2017-06-27 11:54:57 -04:00
Dan Puttick 3c1fcda29e Add different file size units to log
* Format file sizes with B, KB, MB, or GB depending on size
* Clean up logic and variable names in add_file()
2017-06-27 11:54:57 -04:00
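A minimal sketch of the kind of size formatting this commit describes. The function name `format_size`, the 1024-byte step between units, and the one-decimal rounding are assumptions for illustration, not taken from the repository.

```python
def format_size(num_bytes):
    """Return a size string in B, KB, MB, or GB (hypothetical helper)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            # Show whole bytes without a decimal point.
            if unit == "B":
                return "{} B".format(int(size))
            return "{:.1f} {}".format(size, unit)
        size /= 1024
    # Anything that reaches this point is reported in GB.
    return "{:.1f} GB".format(size)


print(format_size(2048))
```

Capping the units at GB follows the commit message; larger files are simply shown as a large GB value.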
Dan Puttick e27d397496 Change is_recursive to is_archive
* is_archive is clearer and more descriptive than is_recursive
2017-06-27 11:54:57 -04:00
Raphaël Vinot 079e8d30a3 Add support for ObjectStream in PDF
ObjectStream isn't necessarily malicious, but can be. This patch could
be improved by unpacking the content of the stream, but it requires 3rd
party libraries we don't have for now.

Final fix for PCL-01-002
2017-06-19 11:24:47 +02:00
Raphaël Vinot 7d38ec3d32 Remove extensions processed by the script 2017-06-16 17:57:07 +02:00
Raphaël Vinot f44719b83e Add list of malicious extensions used in Google Chrome
Fix PCL-01-009
2017-06-16 17:26:39 +02:00
Raphaël Vinot 40f71e758f Fix logging and symlinks
Fix PCL-01-006
2017-06-16 14:47:53 +02:00
Raphaël Vinot 8c007e28cf Add support for XFA structure in PDF
Partial fix for PCL-01-002
2017-06-16 14:47:19 +02:00
Dan Puttick 45d71cb362 Fix unicode filename issues using fsencode
* Same problem we've had before - Linux filenames can have non-unicode chars
in them
* We need to write the filename as raw bytes to the log
* os.fsencode lets us convert a utf-8 encoded string to bytes and ignore those
that can't be printed as unicode
* Still not clear if the log generated this way will be human-readable
2017-04-10 13:39:28 +02:00
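A rough illustration of the os.fsencode approach mentioned in this commit, assuming a POSIX system where undecodable filename bytes are carried through str as surrogate escapes. The helper name `log_line` and the sample filename are hypothetical.

```python
import os


def log_line(filename):
    """Return the raw bytes to write to a log opened in binary mode."""
    # os.fsencode turns surrogate-escaped characters back into the
    # original bytes instead of raising UnicodeEncodeError.
    return os.fsencode(filename) + b"\n"


raw_name = b"report_\xff.txt"      # \xff is not valid UTF-8
name = os.fsdecode(raw_name)       # str containing a surrogate escape
line = log_line(name)
```

Writing bytes sidesteps encoding errors, which is consistent with the commit's caveat that the resulting log may not be fully human-readable.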
Dan Puttick 3f49612a23 Add new logger, move logging to filecheck
* Wrote a new text-based logger that displays all file information in the tree
instead of using two separate logs
* Stopped using twiggy since it wasn't giving us anything useful
* Moved a lot of the logging code to filecheck, since it didn't really seem
appropriate as an API. Left a Logging stub in kittengroomer to hold methods
that might be useful for implementing other loggers.
* For the new logger, had to change the way that we traverse the items in the
source file tree.
2017-04-10 13:22:20 +02:00
Dan Puttick f0e7607a3f Improve description strings in filecheck
* Description strings that appear in the log improved in filecheck for various
file types
* Added various comments
2017-04-10 13:00:34 +02:00
Dan Puttick 6f9e36a578 Change filecheck for new file description method
* self.add_file_string -> self.add_description
2017-03-22 12:04:22 -04:00
Dan Puttick 3e7b38c5d4 Improve doc strings on FileBase 2017-03-21 18:58:17 -04:00
Dan Puttick 6851461755 Change from two separate logs to one 2017-03-20 16:10:57 -04:00
Dan Puttick 51760ebbb1 Move default log setup back into filecheck
* Realized that the API consumer might want to write their own logging tool.
* FileBase and KittenGroomerBase will have no logging code.
* If the API consumer likes, they can import GroomerLogger and use it in their
implementation.
2017-03-20 16:10:57 -04:00
Dan Puttick 71bcc79c20 Remove rtl override char from file dst_path
The unicode right to left override character can be used for various attacks.
This commit:
* Detects this character in the filename on the source key
* Strips it from the path before copying it to the dest key
* Marks the file as dangerous (this character doesn't belong in a filename)
2017-03-16 12:22:26 -04:00
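The three steps in this commit could be sketched roughly as follows. The function name `strip_rtl_override` and its return shape are assumptions; only the character itself (U+202E, RIGHT-TO-LEFT OVERRIDE) is a fixed fact.

```python
RTL_OVERRIDE = "\u202e"  # Unicode RIGHT-TO-LEFT OVERRIDE


def strip_rtl_override(filename):
    """Return (clean_filename, is_dangerous) for a filename (hypothetical helper)."""
    if RTL_OVERRIDE in filename:
        # Detected: strip the character and flag the file as dangerous.
        return filename.replace(RTL_OVERRIDE, ""), True
    return filename, False


clean, dangerous = strip_rtl_override("invoice\u202efdp.exe")
```

With the override in place, a name like the one above can be displayed as if it ended in "exe.pdf", which is why the file is flagged rather than silently cleaned.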
Dan Puttick 4d8a1d1daf Add/update docstrings for filecheck and helpers 2017-03-15 22:56:00 -04:00
Dan Puttick ac94cf5d6d Change the way test dst dirs are handled
* Each test folder now copies files into its own test directory
* Change gitignore due to dst dir changes
* Make sure logger.tree is called for every directory
2017-03-15 22:56:00 -04:00
Dan Puttick 0175ee48e5 Add TODOs and clarify various logging messages 2017-03-15 22:56:00 -04:00
Dan Puttick 963a2feef4 Change various methods to properties 2017-03-15 22:55:51 -04:00
Dan Puttick 59cde8cfd5 Move safe_copy to FileBase 2017-03-15 21:06:07 -04:00
Dan Puttick e73721e95f Fix bug with safe_copy 2017-03-15 21:06:07 -04:00
Dan Puttick 484c71fc86 Turn off copying for certain mimes in filecheck 2017-03-15 21:06:07 -04:00
Dan Puttick 18857da7ca Several small bugfixes
* Fix issue with main/subtypes in init
* Fix bug in File.check() in filecheck.py
* Fix FileBase.size for symlinks
2017-03-15 21:06:07 -04:00
Dan Puttick 3fe8c7c223 Adjust order of property initialization
Tests were failing due to values being set before file_props dict
was created
2017-03-15 21:06:07 -04:00
Dan Puttick 0038d3ef66 Switch to using file properties
* make_dangerous now takes a description string
* add_file_string takes strings describing the file
2017-03-15 21:06:07 -04:00
Dan Puttick fc8923fddd Change Groomer private methods to public
* Changed safe_rmtree, safe_copy, safe_remove, and safe_mkdir to public methods
* If something is being used in a subclass it probably shouldn't be a private
method
2017-03-15 21:06:07 -04:00
Dan Puttick 12d5624b4d Change FileBase.log_details to FileBase._file_props
* _file_props is a dict that will hold all information about the file
* Updated filecheck.py to reflect this
* Potentially will change contents of file_props to being attributes on the
file in the future. This change would be easy since all access to _file_props
is now via set_property and get_property methods.
* Add filename to _file_props
2017-03-15 21:06:06 -04:00
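A simplified sketch of the accessor pattern this commit describes, not the actual FileBase code. The class name `FileBaseSketch` and the `is_dangerous` key are hypothetical; `_file_props`, `set_property`, and `get_property` are the names used in the message.

```python
class FileBaseSketch:
    """Stores file information in a dict reached only through accessors."""

    def __init__(self, filename):
        self._file_props = {"filename": filename}

    def set_property(self, name, value):
        self._file_props[name] = value

    def get_property(self, name):
        # Returns None for properties that were never set.
        return self._file_props.get(name)


f = FileBaseSketch("a.txt")
f.set_property("is_dangerous", True)
```

Because callers never touch `_file_props` directly, the storage could later move to plain attributes by changing only these two methods.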
Dan Puttick 9832101c85 Identify TODOs that are log related 2017-03-15 21:06:06 -04:00
Dan Puttick 8d7dd1197f Move run_process back to Groomer object 2017-03-15 21:06:06 -04:00
Dan Puttick 781d0a76af First working version with methods in File object
- All tests now passing with file handling methods on File object
instead of Groomer object.
- Logging functionality still isn't finished.
2017-03-15 21:06:06 -04:00
Dan Puttick 9aafe6e518 Remove cur_file from methods in File object 2017-03-15 21:06:06 -04:00
Dan Puttick 53c1598af8 Move file processing methods into File object
- It seems like filecheck will be easier to reason about if all of
the file processing stuff happens in the File object. The Groomer
object will now be responsible only for enumerating the files to
be processed.
- Tests won't pass for this commit, but wanted to make the diff
cleaner by committing this before making changes.
2017-03-15 21:04:57 -04:00
Dan Puttick 3d36c90d66 Make list_all_files a public method 2017-03-15 21:04:57 -04:00
Dan Puttick c6ecc5e3a3 Fix process_dir bug in filecheck tests 2017-03-15 21:04:57 -04:00
Dan Puttick cfeccc2561 Move main() into filecheck
- Also change the name of processdir to process_dir
2017-03-15 21:04:57 -04:00
Dan Puttick 61aa14c98d Change _write_log to _print_log 2017-03-15 21:04:57 -04:00
Dan Puttick a450fe6b96 Add config object to filecheck
- Grouped all configuration options for filecheck into a Config object
- Makes the code easier to read since there are no longer many references to
different configuration globals
2017-03-15 21:04:57 -04:00
Dan Puttick 7d62238270 Hacks to make tests pass before fixing 2017-03-15 21:04:57 -04:00
Dan Puttick 1cf8a62f46 First commit with Logger object
- Made logger object
- Moved some logger related code from Groomer to Logger
- Changed logging related tests
- Filecheck tests still do not pass
2017-03-15 21:04:57 -04:00
Dan Puttick 92d1b1cd93 Refactor metadata processing code 2017-03-15 21:01:28 -04:00
Dan Puttick e2af701ac9 Remove several pieces of unused code
* Remove python 2 KittenGroomerBase.tree
* Remove default source and dest from KittenGroomerFileCheck
* Remove unused sys import
2017-03-15 21:01:28 -04:00
Raphaël Vinot a3cad2c21e Fix forgotten copy 2017-03-14 10:47:20 +01:00
Dan Puttick fd30fb3e08 Change _run_process() to use builtin timeout parameter
NOTE: this change breaks Python 2 compatibility: subprocess.check_call does not
take a timeout argument in Python 2.7
2017-01-19 17:00:10 -05:00
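A minimal sketch of using the built-in timeout parameter (Python 3.3+). The function name `run_process`, the 10-second default, and the boolean return value are assumptions, not the project's actual `_run_process()`.

```python
import subprocess
import sys


def run_process(command, timeout=10):
    """Run a command; return False if it fails or exceeds the timeout."""
    try:
        # check_call kills the child process if the timeout expires.
        subprocess.check_call(command, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
    return True


ok = run_process([sys.executable, "-c", "pass"])
```

Under Python 2.7 the `timeout` keyword raises a TypeError, which is the compatibility break the note refers to.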
Dan Puttick 21cc175867 Move non-filecheck.py binaries into examples directory
Tests for these scripts also removed from /tests and from .travis.yml
Two .zip archives accidentally deleted from /tests/src_invalid, re-added them
and changed .gitignore to prevent the problem
2017-01-19 15:25:08 -05:00
Dan Puttick 3dad4faa61 Reorganize tests making them easier to run
- The tests now automatically run depending on whether you have the dependencies
installed, instead of failing and throwing exceptions.
- CONTRIBUTING.md has more information on how to run the tests.
- When the tests run, they will save their logs to /test_logs instead
of printing them so you can read them later.
- Change names of source file directories to make them more descriptive
2017-01-18 15:51:54 -05:00