Commit Graph

13 Commits (431c1511a37d700fd6cd1e70f23fb18493bd6b4e)

Author SHA1 Message Date
seamus tuohy 40c71af637 Added support for malformed internationalized email headers
When an email contains headers that use Unicode without being properly
crafted to conform to RFC-6323, the email import module would crash.
(See issue #119 & issue #93)

To address this, I have added additional layers of encoding/decoding to
any possibly internationalized email headers. This correctly decodes
properly formed and malformed UTF-8, UTF-16, and UTF-32 headers.
When an unknown encoding is encountered, the header is returned as an
'encoded-word' per RFC 2047.

This commit also adds unit tests covering properly formed and malformed
UTF-8, UTF-16, UTF-32, and CJK-encoded strings in all header fields; UTF-8,
UTF-16, and UTF-32 encoded message bodies; and emoji in headers and
attachment file names.
2017-07-02 18:03:14 -04:00
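The decoding fallback described in this commit can be sketched in Python. This is a minimal illustration, not the importer's actual code: `decode_header_value` is a hypothetical helper, and it only trusts UTF-16 and UTF-32 when a byte-order mark is present, because without one those codecs accept almost any even-length byte string. The `unknown-8bit` charset label in the fallback encoded-word is an assumption, borrowed from the label Python's `email` package uses for undecodable bytes.

```python
import base64
import codecs

# BOM-prefixed encodings, longest BOM first: the UTF-32-LE BOM
# (FF FE 00 00) begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: decode a raw header value that may not be a
    well-formed internationalized header."""
    # UTF-8 first: it is the most common and fails loudly on bad input.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Only trust UTF-16/UTF-32 when a byte-order mark identifies them;
    # the "utf-16"/"utf-32" codecs consume the BOM while decoding.
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            try:
                return raw.decode(codec)
            except UnicodeDecodeError:
                break
    # Unknown encoding: preserve the bytes as an RFC 2047 encoded-word
    # ("b" = base64) instead of crashing. Length limits are ignored here.
    return "=?unknown-8bit?b?" + base64.b64encode(raw).decode("ascii") + "?="
```

Returning an encoded-word keeps the original bytes recoverable, so a later stage can still attempt a better guess at the charset.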
seamus tuohy 83a9d695ea Email import no longer unzips major compressed text document formats.
Let this commit serve as a warning about the perils of duck typing.
Word-processing documents (docx, odt, etc.) were being uncompressed when
they were attached to emails. The email importer now checks a list of
well-known extensions and will not attempt to unzip them.

It has to rely on a list of extensions instead of file magic because
many of these formats produce an application/zip MIME type when scanned.
2017-01-10 09:55:33 -05:00
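A sketch of the extension check in Python. The extension set and the `should_unzip` name are hypothetical (the commit names only docx and odt), and this is not the module's code. It also illustrates why file magic alone fails: `zipfile.is_zipfile` reports True for a docx-style archive, since the format is a zip container.

```python
import io
import os
import zipfile

# Hypothetical list of zip-based document formats that should be kept
# as single attachments rather than unpacked.
ZIPPED_DOCUMENT_EXTENSIONS = {
    ".docx", ".xlsx", ".pptx",   # Office Open XML
    ".odt", ".ods", ".odp",      # OpenDocument
    ".epub",
}


def should_unzip(filename: str, data: bytes) -> bool:
    """Return True only for attachments that are zip archives and are
    not a known zip-based document format."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in ZIPPED_DOCUMENT_EXTENSIONS:
        return False
    # Content sniffing alone is not enough: a .docx passes this check too.
    return zipfile.is_zipfile(io.BytesIO(data))
```

Checking the extension first means a genuine archive renamed to `.docx` would also be left packed; that is the trade-off an extension list accepts.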
Raphaël Vinot 1051e2210b Keep zip content as binary 2017-01-07 19:30:00 -05:00
Raphaël Vinot 9f84db3659 Fix tests, cleanup 2017-01-07 18:36:08 -05:00
Raphaël Vinot 2db845c45c Improve support of email attachments
Related to #90
2017-01-07 14:39:52 -05:00
Raphaël Vinot b51806ac9f Improve support of email importer if headers are missing
Fix #88
2017-01-07 10:25:38 -05:00
Raphaël Vinot 02f5e95a98 Fix Python 3.6 support 2017-01-06 20:36:09 -05:00
Raphaël Vinot 93a49c3c1d Make PEP8 happy 2017-01-06 19:01:19 -05:00
Raphaël Vinot 3f83357a2d Fix failing test (bug in the mail parser?) 2017-01-06 18:56:29 -05:00
seamus tuohy 1a7973bc06 Add additional email parsing and tests
Added further attribute parsing and corresponding unit tests.
E-mail attachment and URL extraction are added in this commit. This includes
unpacking zip files and simple password cracking of encrypted zip files.
2017-01-04 10:21:36 -08:00
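Zip unpacking with simple password guessing might look like the following sketch. The wordlist and function names are hypothetical, not taken from the module. Python's `zipfile` can decrypt only legacy ZipCrypto members, not AES-encrypted ones, so this approach cannot open every encrypted archive. Member contents are kept as raw bytes, in line with the later "Keep zip content as binary" commit.

```python
import io
import zipfile
import zlib
from typing import Dict, Iterable, Optional

# Hypothetical wordlist of passwords commonly used for zipped samples.
CANDIDATE_PASSWORDS = ("infected", "malware", "virus", "password")


def _read_all(zf: zipfile.ZipFile, pwd: Optional[bytes]) -> Dict[str, bytes]:
    # Read every member as raw bytes (no text decoding), skipping directories.
    return {
        info.filename: zf.read(info, pwd=pwd)
        for info in zf.infolist()
        if not info.is_dir()
    }


def unpack_zip(data: bytes,
               candidates: Iterable[str] = CANDIDATE_PASSWORDS,
               ) -> Optional[Dict[str, bytes]]:
    """Return {member name: bytes}, or None if no candidate password works."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Bit 0 of the general-purpose flag bits marks an encrypted member.
        if not any(info.flag_bits & 0x1 for info in zf.infolist()):
            return _read_all(zf, None)
        for password in candidates:
            try:
                return _read_all(zf, password.encode("utf-8"))
            except (RuntimeError, zipfile.BadZipFile, zlib.error):
                # Wrong password: usually rejected by the header check byte
                # (RuntimeError); rarely accepted and then failing on
                # decompression or the CRC check.
                continue
    return None
```

Because `zipfile` cannot create encrypted archives, exercising the guessing loop needs an archive produced by an external tool.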
seamus tuohy 0ff270a3be Fixed basic errors 2016-12-26 14:33:10 -08:00
seamus tuohy 86ae72c444 Added attachment and URL support 2016-12-26 13:55:54 -08:00
seamus tuohy 5033b1a9ca Added email meta-data import module.
This email meta-data import module collects basic meta-data from an e-mail
and populates an event with it. It populates the email subject, source
addresses, destination addresses, and any attachment file names.
This commit also contains unit tests for this module as well as updates to
the README. The README additions aim to make it easier for outsiders to
build modules.
2016-10-22 17:13:20 -04:00
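The metadata collection described in this commit can be sketched with the standard library's `email` package. `email_metadata` and its output keys are hypothetical, chosen to mirror the fields the commit lists (subject, source addresses, destination addresses, attachment file names); mapping them onto event attributes is not shown. Missing headers yield empty lists rather than errors, similar in spirit to the later fix for missing headers (#88).

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Dict, List


def _addresses(msg: EmailMessage, *names: str) -> List[str]:
    # get_all() returns the default ([]) when a header is absent,
    # so missing headers simply contribute nothing.
    found = []
    for name in names:
        for header in msg.get_all(name, []):
            found.extend(addr.addr_spec for addr in header.addresses)
    return found


def email_metadata(raw: bytes) -> Dict[str, List[str]]:
    """Collect subject, source/destination addresses and attachment names."""
    # policy.default returns structured header objects with .addresses.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    subject = msg.get("Subject")
    return {
        "subject": [str(subject)] if subject is not None else [],
        "source": _addresses(msg, "From"),
        "destination": _addresses(msg, "To", "Cc"),
        # walk() visits every MIME part, including nested multiparts.
        "attachments": [
            part.get_filename()
            for part in msg.walk()
            if part.get_filename() is not None
        ],
    }
```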