{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Delete this cell to re-enable tracebacks\n", "import sys\n", "ipython = get_ipython()\n", "\n", "def hide_traceback(exc_tuple=None, filename=None, tb_offset=None,\n", "                   exception_only=False, running_compiled_code=False):\n", "    etype, value, tb = sys.exc_info()\n", "    value.__cause__ = None  # suppress chained exceptions\n", "    return ipython._showtraceback(etype, value, ipython.InteractiveTB.get_exception_only(etype, value))\n", "\n", "ipython.showtraceback = hide_traceback" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# JSON output syntax highlighting\n", "from __future__ import print_function\n", "from pygments import highlight\n", "from pygments.lexers import JsonLexer, TextLexer\n", "from pygments.formatters import HtmlFormatter\n", "from IPython.display import display, HTML\n", "from IPython.core.interactiveshell import InteractiveShell\n", "\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "def json_print(inpt):\n", "    string = str(inpt)\n", "    formatter = HtmlFormatter()\n", "    # Use the JSON lexer for serialized objects, plain text otherwise\n", "    if string[0] == '{':\n", "        lexer = JsonLexer()\n", "    else:\n", "        lexer = TextLexer()\n", "    return HTML('<style type=\"text/css\">{}</style>{}'.format(\n", "                formatter.get_style_defs('.highlight'),\n", "                highlight(string, lexer, formatter)))\n", "\n", "globals()['print'] = json_print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Object Similarity and Equivalence\n", "\n", "The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has functions for checking whether two STIX Objects are very similar or identical. The functions differentiate between equivalence, which is a binary concept (two things are either equivalent or they are not), and similarity, which is a continuum (an object can be more similar to one object than to another). 
The similarity function answers the question, “How similar are these two objects?” while the equivalence function uses the similarity function to answer the question, “Are these two objects equivalent?”\n", "\n", "For each supported object type, the [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) function checks whether the values of a specific set of properties match. Each matching property is then weighted, since not every property is equally important for semantic similarity. The result is the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent. Values in between indicate how similar the two objects are and can be used to decide whether they should be considered equivalent. The [object_equivalence()](../api/stix2.environment.rst#stix2.environment.Environment.object_equivalence) function calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) and compares the result to a threshold to determine whether the objects are equivalent. Different organizations or users may use different thresholds.\n", "\n", "TODO: Add a link to the committee note when it is released.\n", "\n", "There are a number of use cases for which calculating semantic equivalence may be helpful. It can be used for echo detection, in which a STIX producer who consumes content from other producers wants to make sure they are not creating content they have already seen or consuming content they have already created.\n", "\n", "Another use case for this functionality is to identify identical or near-identical content, such as a vulnerability shared under three different nicknames by three different STIX producers. A third use case involves a feed that aggregates data from multiple other sources. 
It will want to make sure that it is not publishing duplicate data.\n", "\n", "Below we will show examples of the semantic similarity results of various objects. Unless otherwise specified, the ID of each object will be generated by the library, so the two objects will not have the same ID. This demonstrates that the semantic similarity algorithm only looks at specific properties for each object type. Each example also shows the result of calling the equivalence function, with a threshold value of `90`.\n", "\n", "**Please note** that you will need to install a few extra dependencies in order to use the semantic equivalence functions. You can do this using:\n", "\n", "```pip install stix2[semantic]```\n", "\n", "### Attack Pattern Example\n", "\n", "For Attack Patterns, the only properties that contribute to semantic similarity are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, both attack patterns have the same external reference but the second has a slightly different yet still similar name." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
91.81818181818181\n",
"
True\n",
"
30.0\n",
"
False\n",
"
100.0\n",
"
True\n",
"
20.0\n",
"
False\n",
"
67.20663955882583\n",
"
False\n",
"
90.0\n",
"
True\n",
"
6.66666666666667\n",
"
False\n",
"
100.0\n",
"
True\n",
"
100.0\n",
"
True\n",
"
0\n",
"
100.0\n",
"
6.66666666666667\n",
"
Semantic equivalence score using standard weights: 16.666666666666668\n",
"
{'name': {'weight': 60, 'contributing_score': 6.666666666666669}, 'threat_actor_types': {'weight': 20, 'contributing_score': 10.0}, 'aliases': {'weight': 20, 'contributing_score': 0.0}, 'matching_score': 16.666666666666668, 'sum_weights': 100.0}\n",
"
Prop: name | weight: 60 | contributing_score: 6.666666666666669\n",
"
Prop: threat_actor_types | weight: 20 | contributing_score: 10.0\n",
"
Prop: aliases | weight: 20 | contributing_score: 0.0\n",
"
matching_score: 16.666666666666668\n",
"
sum_weights: 100.0\n",
"
Using standard weights: 16.666666666666668\n",
"
Using custom weights: 28.33333333333334\n",
"
{'name': {'weight': 45, 'contributing_score': 5.000000000000002}, 'threat_actor_types': {'weight': 10, 'contributing_score': 5.0}, 'aliases': {'weight': 45, 'contributing_score': 0.0}, 'matching_score': 10.000000000000002, 'sum_weights': 100.0}\n",
"
Using custom string comparison: 5.0\n",
"
Using standard weights: 16.666666666666668\n",
"
Using a custom method: 6.66666666666667\n",
"
71.42857142857143\n",
"
{'name': (60, 60.0), 'color': (40, 11.428571428571427), 'matching_score': 71.42857142857143, 'sum_weights': 100.0}\n",
"
59.68831168831168\n",
"
False\n",
"
{\n",
" "matching_score": 835.6363636363635,\n",
" "len_pairs": 14,\n",
" "summary": {\n",
" "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f": {\n",
" "lhs": "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",\n",
" "rhs": "threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 60,\n",
" "contributing_score": 6.666666666666669\n",
" },\n",
" "threat_actor_types": {\n",
" "weight": 20,\n",
" "contributing_score": 0.0\n",
" },\n",
" "aliases": {\n",
" "weight": 20,\n",
" "contributing_score": 0.0\n",
" },\n",
" "matching_score": 6.666666666666669,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 6.66666666666667\n",
" },\n",
" "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe": {\n",
" "lhs": "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe",\n",
" "rhs": "campaign--d7fecca0-d020-43ae-977d-8d226df84c36",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 60,\n",
" "contributing_score": 18.0\n",
" },\n",
" "matching_score": 18.0,\n",
" "sum_weights": 60.0\n",
" },\n",
" "value": 30.0\n",
" },\n",
" "campaign--d7fecca0-d020-43ae-977d-8d226df84c36": {\n",
" "lhs": "campaign--d7fecca0-d020-43ae-977d-8d226df84c36",\n",
" "rhs": "campaign--02eb6d99-15d3-4534-99ce-d5f946ca52fe",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 60,\n",
" "contributing_score": 18.0\n",
" },\n",
" "matching_score": 18.0,\n",
" "sum_weights": 60.0\n",
" },\n",
" "value": 30.0\n",
" },\n",
" "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af": {\n",
" "lhs": "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af",\n",
" "rhs": "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b",\n",
" "prop_score": {\n",
" "indicator_types": {\n",
" "weight": 15,\n",
" "contributing_score": 15.0\n",
" },\n",
" "pattern": {\n",
" "weight": 80,\n",
" "contributing_score": 0\n",
" },\n",
" "valid_from": {\n",
" "weight": 5,\n",
" "contributing_score": 5.0\n",
" },\n",
" "matching_score": 20.0,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 20.0\n",
" },\n",
" "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b": {\n",
" "lhs": "indicator--d2e7d0b6-4229-447d-9c44-2b0f7d93797b",\n",
" "rhs": "indicator--d17a1296-d6c9-4119-9fbf-433c7f1f11af",\n",
" "prop_score": {\n",
" "indicator_types": {\n",
" "weight": 15,\n",
" "contributing_score": 15.0\n",
" },\n",
" "pattern": {\n",
" "weight": 80,\n",
" "contributing_score": 0\n",
" },\n",
" "valid_from": {\n",
" "weight": 5,\n",
" "contributing_score": 5.0\n",
" },\n",
" "matching_score": 20.0,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 20.0\n",
" },\n",
" "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8": {\n",
" "lhs": "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8",\n",
" "rhs": "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab",\n",
" "prop_score": {\n",
" "relationship_type": {\n",
" "weight": 20,\n",
" "contributing_score": 20.0\n",
" },\n",
" "source_ref": {\n",
" "weight": 40,\n",
" "contributing_score": 2.666666666666668\n",
" },\n",
" "target_ref": {\n",
" "weight": 40,\n",
" "contributing_score": 36.0\n",
" },\n",
" "matching_score": 58.66666666666667,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 58.666666666666664\n",
" },\n",
" "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab": {\n",
" "lhs": "relationship--b97e59e9-5e0d-47ef-a3f9-6a6e4fcefaab",\n",
" "rhs": "relationship--b399060e-0cdb-4e41-a30e-5894ae3627e8",\n",
" "prop_score": {\n",
" "relationship_type": {\n",
" "weight": 20,\n",
" "contributing_score": 20.0\n",
" },\n",
" "source_ref": {\n",
" "weight": 40,\n",
" "contributing_score": 2.666666666666668\n",
" },\n",
" "target_ref": {\n",
" "weight": 40,\n",
" "contributing_score": 36.0\n",
" },\n",
" "matching_score": 58.66666666666667,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 58.666666666666664\n",
" },\n",
" "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5": {\n",
" "lhs": "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5",\n",
" "rhs": "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 30,\n",
" "contributing_score": 30.0\n",
" },\n",
" "published": {\n",
" "weight": 10,\n",
" "contributing_score": 10.0\n",
" },\n",
" "object_refs": {\n",
" "weight": 60,\n",
" "contributing_score": 29.0\n",
" },\n",
" "matching_score": 69.0,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 69.0\n",
" },\n",
" "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524": {\n",
" "lhs": "report--a71101c7-6064-4b8f-a9b4-ff49ff65e524",\n",
" "rhs": "report--87a26bd6-2870-44de-980f-e4cc6b63e1d5",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 30,\n",
" "contributing_score": 30.0\n",
" },\n",
" "published": {\n",
" "weight": 10,\n",
" "contributing_score": 10.0\n",
" },\n",
" "object_refs": {\n",
" "weight": 60,\n",
" "contributing_score": 29.0\n",
" },\n",
" "matching_score": 69.0,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 69.0\n",
" },\n",
" "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32": {\n",
" "lhs": "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32",\n",
" "rhs": "identity--4d8b54e3-d584-47c6-858f-673fffa45e96",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 60,\n",
" "contributing_score": 60.0\n",
" },\n",
" "identity_class": {\n",
" "weight": 20,\n",
" "contributing_score": 20.0\n",
" },\n",
" "matching_score": 80.0,\n",
" "sum_weights": 80.0\n",
" },\n",
" "value": 100.0\n",
" },\n",
" "identity--4d8b54e3-d584-47c6-858f-673fffa45e96": {\n",
" "lhs": "identity--4d8b54e3-d584-47c6-858f-673fffa45e96",\n",
" "rhs": "identity--2b40ba3f-aa22-4e11-bd9d-e4843927ad32",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 60,\n",
" "contributing_score": 60.0\n",
" },\n",
" "identity_class": {\n",
" "weight": 20,\n",
" "contributing_score": 20.0\n",
" },\n",
" "matching_score": 80.0,\n",
" "sum_weights": 80.0\n",
" },\n",
" "value": 100.0\n",
" },\n",
" "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c": {\n",
" "lhs": "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c",\n",
" "rhs": "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 30,\n",
" "contributing_score": 21.818181818181817\n",
" },\n",
" "external_references": {\n",
" "weight": 70,\n",
" "contributing_score": 70.0\n",
" },\n",
" "matching_score": 91.81818181818181,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 91.81818181818181\n",
" },\n",
" "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8": {\n",
" "lhs": "attack-pattern--d9de40c6-a9a0-4e6f-ae59-d90a91e4f0e8",\n",
" "rhs": "attack-pattern--57bc38b5-feda-4710-b613-441717c0062c",\n",
" "prop_score": {\n",
" "name": {\n",
" "weight": 30,\n",
" "contributing_score": 21.818181818181817\n",
" },\n",
" "external_references": {\n",
" "weight": 70,\n",
" "contributing_score": 70.0\n",
" },\n",
" "matching_score": 91.81818181818181,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 91.81818181818181\n",
" },\n",
" "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e": {\n",
" "lhs": "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e",\n",
" "rhs": "malware--9c4638ec-f1de-4ddb-abf4-1b760417654e",\n",
" "prop_score": {\n",
" "malware_types": {\n",
" "weight": 20,\n",
" "contributing_score": 10.0\n",
" },\n",
" "name": {\n",
" "weight": 80,\n",
" "contributing_score": 80.0\n",
" },\n",
" "matching_score": 90.0,\n",
" "sum_weights": 100.0\n",
" },\n",
" "value": 90.0\n",
" }\n",
" }\n",
"}\n",
"