* Same problem we've had before - linux filenames can have non-unicode chars
in them
* We need to write the filename as raw bytes to the log
* os.fsencode lets us convert a utf-8 encoded string to bytes and ignore those
that can't be printed as unicode
* Still not clear if the log generated this way will be human-readable
* src_root_path and dst_root_path are now converted to abspaths when
initializing kittengroomer to avoid any weird issues with relative or misformed
paths
* Wrote a new text-based logger that displays all file information in the tree
instead of using two separate logs
* Stopped using twiggy since it wasn't giving us anything useful
* Moved a lot of the logging code to filecheck, since it didn't really seem
appropriate as an API. Left a Logging stub in kittengroomer to hold methods
that might be useful for implementing other loggers.
* For the new logger, had to change the way that we traverse the items in the
source file tree.
* Realized that the API consumer might want to write their own logging tool.
* FileBase and KittenGroomerBase will have no logging code.
* If the API consumer likes, they can import GroomerLogger and use it in their
implementation.
The unicode right to left override character can be used for various attacks.
This commit:
* Detects this character in the filename on the source key
* Strips it from the path before copying it to the dest key
* Marks the file as dangerous (this character doesn't belong in a filename)
* Each test folder now copies files into its own test directory
* Change gitignore due to dst dir changes
* Make sure logger.tree is called for every directory
* Changed safe_rmtree, safe_copy, safe_remove, and safe_mkdir to public methods
* If something is being used in a subclass it probably shouldn't be a private
method
* _file_props is a dict that will hold all information about the file
* Updated filecheck.py to reflect this
* Potentially will change contents of file_props to being attributes on the
file in the future. This change would be easy since all access to _file_props
is now via set_property and get_property methods.
* Add filename to _file_props
- Instead of one large function that mutates FileBase properties,
mimetype and main/subtype are determined by two separate methods
that return mimetypes.
- The API is not changed.
- Absence of mimetype is now None instead of an empty string.