- Previously: if the header field is not an attribute type, then
it was added as an attribute field.
PyMISP then used to skip it if needed
- Now: Those fields are discarded before they are put in an attribute
- If more than 1 misp type is recognized, for each one an
attribute is created
- Needs to have header set by user as parameters of the module atm
- Review needed to see the feasibility with fields that can create
confusion and be interpreted both as misp type or attribute field
(for instance comment is a misp type and an attribute field)
When an emails contains headers that use Unicode without properly crafing
them to comform to RFC-6323 the email import module would crash.
(See issue #119 & issue #93)
To address this I have added additional layers of encoding/decoding to
any possibly internationalized email headers. This decodes properly
formed and malformed UTF-8, UTF-16, and UTF-32 headers appropriately.
When an unknown encoding is encountered it is returned as an 'encoded-word'
per RFC2047.
This commit also adds unit-tests that tests properly formed and malformed
UTF-8, UTF-16, UTF-32, and CJK encoded strings in all header fields; UTF-8,
UTF-16, and UTF-32 encoded message bodies; and emoji testing for headers
and attachment file names.
Let this commit serve as a warning about the perils of duck typing.
Word documents (docx,odt,etc) were being uncompressed when they were
attached to emails. The email importer now checks a list of well known
extensions and will not attempt to unzip them.
It is stuck using a list of extensions instead of using file magic because
many of these formats produce an application/zip mimetype when scanned.