* This is kind of a big refactoring - I realized that storing file
props in a dict was causing some subtle problems, and that just having
them as attributes makes things a lot more simple
* I considered making a separate FileProps object and nesting it
inside FileBase, but almost all FileBase methods concern manipulating
file props, so it didn't really make sense.
* Tests are almost passing with this commit, but need a few more changes
and fixes for full test coverage + all passing.
* Previously, filecheck.py removed rtl character from the destination path only.
* Now, the rtl character is replaced in file.filename and the filename file
property.
ObjectStream isn't necessarely malicious, but can be. This patch could
be improved by unpacking the content of the stream, but it requires 3rd
party libraries we don't have for now.
Final fix for PCL-01-002
* Same problem we've had before - linux filenames can have non-unicode chars
in them
* We need to write the filename as raw bytes to the log
* os.fsencode lets us convert a utf-8 encoded string to bytes and ignore those
that can't be printed as unicode
* Still not clear if the log generated this way will be human-readable
* Wrote a new text-based logger that displays all file information in the tree
instead of using two separate logs
* Stopped using twiggy since it wasn't giving us anything useful
* Moved a lot of the logging code to filecheck, since it didn't really seem
appropriate as an API. Left a Logging stub in kittengroomer to hold methods
that might be useful for implementing other loggers.
* For the new logger, had to change the way that we traverse the items in the
source file tree.
* Realized that the API consumer might want to write their own logging tool.
* FileBase and KittenGroomerBase will have no logging code.
* If the API consumer likes, they can import GroomerLogger and use it in their
implementation.
The unicode right to left override character can be used for various attacks.
This commit:
* Detects this character in the filename on the source key
* Strips it from the path before copying it to the dest key
* Marks the file as dangerous (this character doesn't belong in a filename)
* Each test folder now copies files into its own test directory
* Change gitignore due to dst dir changes
* Make sure logger.tree is called for every directory
* Changed safe_rmtree, safe_copy, safe_remove, and safe_mkdir to public methods
* If something is being used in a subclass it probably shouldn't be a private
method
* _file_props is a dict that will hold all information about the file
* Updated filecheck.py to reflect this
* Potentially will change contents of file_props to being attributes on the
file in the future. This change would be easy since all access to _file_props
is now via set_property and get_property methods.
* Add filename to _file_props
- It seems like filecheck will be easier to reason about if all of
the file processing stuff happens in the File object. The Groomer
object will now be responsible only for enumerating the files to
be processed.
- Tests won't pass for this commit, but wanted to make the diff
cleaner but committing this before making changes.
- Grouped all configuration options for filecheck into a Config object
- Makes the code easier to read since no longer many references to different
configuration globals
Tests for these scripts also removed from /tests and from .travis.yml
Two .zip archives accidentally deleted from /tests/src_invalid, re-added them
and changed .gitignore to prevent the problem
- The tests now automatically run depending on whether you have the dependencies
installed, instead of failing and throwing exceptions.
- CONTRIBUTING.md has more information on how to run the tests.
- When the tests run, they will save their logs to /test_logs instead
of printing them so you can read them later.
- Change names of source file directories to make them more descriptive