When an emails contains headers that use Unicode without properly crafing
them to comform to RFC-6323 the email import module would crash.
(See issue #119 & issue #93)
To address this I have added additional layers of encoding/decoding to
any possibly internationalized email headers. This decodes properly
formed and malformed UTF-8, UTF-16, and UTF-32 headers appropriately.
When an unknown encoding is encountered it is returned as an 'encoded-word'
per RFC2047.
This commit also adds unit-tests that tests properly formed and malformed
UTF-8, UTF-16, UTF-32, and CJK encoded strings in all header fields; UTF-8,
UTF-16, and UTF-32 encoded message bodies; and emoji testing for headers
and attachment file names.
Let this commit serve as a warning about the perils of duck typing.
Word documents (docx,odt,etc) were being uncompressed when they were
attached to emails. The email importer now checks a list of well known
extensions and will not attempt to unzip them.
It is stuck using a list of extensions instead of using file magic because
many of these formats produce an application/zip mimetype when scanned.
Added additional attribute parsing and corresponding unit-tests.
E-mail attachment and url extraction added in this commit. This includes
unpacking zipfiles and simple password cracking of encrypted zipfiles.
This email meta-data import module collects basic meta-data from an e-mail
and populates an event with it. It populates the email subject, source
addresses, destination addresses, subject, and any attachment file names.
This commit also contains unit-tests for this module as well as updates to
the readme. Readme updates are additions aimed to make it easier for
outsiders to build modules.