MISP and taxonomies

MISP is an information sharing platform designed mainly for sharing cyber security indicators, fraud information, and threat details. A critical aspect of efficient sharing is the ability to tag, classify, or reference this information using common languages understood by the respective communities. To this end, MISP introduced a common set of machine-readable vocabularies called misp-taxonomies. The misp-taxonomies use a triple tag representation (also known as machine tags). After an evaluation of the various formats that could describe vocabularies (RDF, RDFS, Turtle), the triple tag representation was selected for the following advantages:

Simplicity: creating a taxonomy, or transposing an existing vocabulary into one, is quick (less than one hour for a small vocabulary).
The tag representation does not need a complex underlying data structure (a relational database can be used, but so can a simple key-value store).
The impact of introducing taxonomies in a user interface should be limited.
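
The triple tag representation mentioned above can be illustrated with a minimal Python sketch. It assumes the usual machine tag shape, namespace:predicate="value", with the value part optional. The names MachineTag, parse_machine_tag and format_machine_tag are hypothetical helpers introduced for this illustration and are not part of MISP.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: namespace:predicate="value", where ="value" is optional.
MACHINE_TAG = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)(?:="(?P<value>[^"]*)")?$'
)


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: Optional[str]


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a machine tag string into its namespace, predicate and value."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(match["namespace"], match["predicate"], match["value"])


def format_machine_tag(tag: MachineTag) -> str:
    """Render a MachineTag back into its string form."""
    base = f"{tag.namespace}:{tag.predicate}"
    return base if tag.value is None else f'{base}="{tag.value}"'


# Because a tag is a plain string, a simple key-value mapping is enough
# to attach tags to events (the event identifier here is made up).
tags_by_event = {"event-1": {"tlp:amber", 'admiralty-scale:source-reliability="a"'}}
```

Since each tag round-trips through a single string, no dedicated schema is required to store or exchange it, which is the point made in the second advantage above.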

The current library of misp-taxonomies includes more than 34 vocabularies, covering national document classification, incident classification, malware types, and estimative language. The diversity of vocabularies reflects the variety of use cases and communities relying on the MISP core software.

Research opportunities

Tagging Recommendation

MISP users tend to select tags following two main approaches:

Using the vocabulary defined internally in their organization,
or relying on shared examples (e.g. OSINT) to classify or tag events the way other organizations do.

These two approaches are often combined when an organization decides to share information outside the organization. This can be time-consuming for a user who classifies events in MISP only occasionally, and it can lead to errors such as selecting the wrong classification. The research topic would be to improve classification by recommending taxonomies and tags to the user. There are two potential approaches in this domain:

Proposing related tags to the first tag added by a user.
Proposing related tags once existing information has been added that can support the tagging recommendation.

This research could include multiple schemes to support the recommendations, such as collecting existing uses of tags within MISP communities and building recommendations based on the communities' practices, or creating clusters of related tags from the existing libraries.
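
As one possible starting point for the first approach (proposing tags related to the first tag a user adds), the sketch below counts how often tags appear together on past events and ranks candidates by that count. It is only an illustration under simple assumptions: the events are plain lists of tag strings, the sample tags are illustrative rather than taken from real community data, and build_cooccurrence and recommend are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(events):
    """Count, for every tag, how often each other tag appears on the same event."""
    cooccurrence = defaultdict(Counter)
    for tags in events:
        for first, second in permutations(set(tags), 2):
            cooccurrence[first][second] += 1
    return cooccurrence


def recommend(cooccurrence, tag, limit=3):
    """Return up to `limit` tags most often seen together with `tag`."""
    counts = cooccurrence.get(tag, Counter())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:limit]]


# Illustrative tag sets only, not real community data.
events = [
    ["tlp:white", 'example:source="blog-post"', 'example:category="ransomware"'],
    ["tlp:white", 'example:source="blog-post"'],
    ["tlp:amber", 'example:category="ransomware"'],
]
suggestions = recommend(build_cooccurrence(events), "tlp:white")
```

Raw co-occurrence counts favour tags that are popular everywhere, so a real study would likely compare this baseline with normalized measures or with clustering of related tags.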

Rule-based Language and Tagging

As the MISP taxonomies steadily grow, vocabularies or even individual tags can contradict each other or be incompatible. The current taxonomies are described in separate JSON files. The objective is to create something similar to the machine tags (with simplicity in mind) that allows rules to be written about the potential incompatibility or exclusivity of tags with respect to others.

Contradictions between taxonomies or tags can usually be described by humans, but these contradictions can be very subjective. A collaborative technique might therefore be a possible line of approach to reduce such subjectivity.
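
To make the idea of exclusivity and incompatibility rules more concrete, here is a small Python sketch. The rule format is entirely hypothetical and is not an existing MISP format: an "exclusive" rule allows at most one tag from a given namespace, and an "incompatible" rule flags a specific combination of tags. The function find_conflicts and the example-classification tag are likewise invented for this illustration.

```python
# Hypothetical rule format, kept close to plain JSON on purpose.
RULES = [
    # At most one tag from the "tlp" namespace may be set on an event.
    {"type": "exclusive", "scope": "tlp"},
    # These tags should never appear together (made-up example).
    {"type": "incompatible",
     "tags": ["tlp:white", 'example-classification:level="secret"']},
]


def find_conflicts(tags, rules):
    """Return a list of (rule type, offending tags) for every violated rule."""
    tags = set(tags)
    conflicts = []
    for rule in rules:
        if rule["type"] == "exclusive":
            prefix = rule["scope"] + ":"
            matching = sorted(tag for tag in tags if tag.startswith(prefix))
            if len(matching) > 1:
                conflicts.append(("exclusive", matching))
        elif rule["type"] == "incompatible":
            # The rule is violated only when all listed tags are present.
            if set(rule["tags"]) <= tags:
                conflicts.append(("incompatible", sorted(rule["tags"])))
    return conflicts


conflicts = find_conflicts(["tlp:white", "tlp:red"], RULES)
```

Keeping the rules as simple data would let a community propose, review and vote on them collaboratively, which is one way to address the subjectivity noted above.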

Mapping and Alignment of Taxonomies

The 34+ vocabularies in the MISP taxonomies show some alignment and overlap. Many ontology and vocabulary matching tools exist, some relying on user feedback and some not. The current cyber security taxonomies have not yet been evaluated with existing ontology matching tools, so the MISP taxonomies could be evaluated with them. Additional work would be needed for such tools to work with the JSON MISP taxonomy format.
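
As a rough baseline against which dedicated ontology matchers could be compared, overlapping entries in two vocabularies can be found by plain lexical similarity. The sketch below uses difflib from the Python standard library; it is not an ontology matcher, it works on plain lists of labels rather than on the JSON taxonomy files, and the labels, the align function and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and treat hyphens and underscores as spaces."""
    return label.lower().replace("-", " ").replace("_", " ").strip()


def align(left, right, threshold=0.8):
    """Return (left label, right label, score) for pairs above the threshold."""
    matches = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    # Best candidates first.
    return sorted(matches, key=lambda match: -match[2])


# Made-up labels standing in for entries of two different vocabularies.
candidates = align(["Ransomware", "phishing"], ["ransom-ware", "spam"])
```

Lexical matching misses synonyms and entries that differ in wording but not in meaning, which is where matchers using structure or user feedback would be expected to do better.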

Some alignment work has already been done with the new common name mapping in misp-taxonomy.
