{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Delete this cell to re-enable tracebacks\n", "import sys\n", "ipython = get_ipython()\n", "\n", "def hide_traceback(exc_tuple=None, filename=None, tb_offset=None,\n", "                   exception_only=False, running_compiled_code=False):\n", "    etype, value, tb = sys.exc_info()\n", "    value.__cause__ = None  # suppress chained exceptions\n", "    return ipython._showtraceback(etype, value, ipython.InteractiveTB.get_exception_only(etype, value))\n", "\n", "ipython.showtraceback = hide_traceback" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# JSON output syntax highlighting\n", "from __future__ import print_function\n", "from pygments import highlight\n", "from pygments.lexers import JsonLexer, TextLexer\n", "from pygments.formatters import HtmlFormatter\n", "from IPython.display import display, HTML\n", "from IPython.core.interactiveshell import InteractiveShell\n", "\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "def json_print(inpt):\n", "    string = str(inpt)\n", "    formatter = HtmlFormatter()\n", "    if string[0] == '{':\n", "        lexer = JsonLexer()\n", "    else:\n", "        lexer = TextLexer()\n", "    return HTML('<style type=\"text/css\">{}</style>{}'.format(\n", "        formatter.get_style_defs('.highlight'),\n", "        highlight(string, lexer, formatter)))\n", "\n", "globals()['print'] = json_print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Semantic Equivalence\n", "\n", "The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has a function for checking if two STIX Objects are semantically equivalent. For each supported object type, the algorithm checks if the values for a specific set of properties match. Each matching property is then weighted, since not every property carries the same importance for semantic equivalence.
The result will be the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent.\n", "\n", "TODO: Add a link to the committee note when it is released.\n", "\n", "There are a number of use cases for which calculating semantic equivalence may be helpful. It can be used for echo detection, in which a STIX producer who consumes content from other producers wants to make sure they are not creating content they have already seen or consuming content they have already created.\n", "\n", "Another use case for this functionality is to identify identical or near-identical content, such as a vulnerability shared under three different nicknames by three different STIX producers. A third use case involves a feed that aggregates data from multiple other sources. Such a feed needs to make sure that it is not publishing duplicate data.\n", "\n", "Below we will show examples of the semantic equivalence results of various objects. Unless otherwise specified, the ID of each object will be generated by the library, so the two objects will not have the same ID. This demonstrates that the semantic equivalence algorithm only looks at specific properties for each object type.\n", "\n", "**Please note** that you will need to install a few extra dependencies in order to use the semantic equivalence functions. You can do this using:\n", "\n", "```pip install stix2[semantic]```\n", "\n", "### Attack Pattern Example\n", "\n", "For Attack Patterns, the only properties that contribute to semantic equivalence are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, both attack patterns have the same external reference, but the second has a slightly different yet still similar name." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
91.9\n",
"30.0\n",
"100.0\n",
"20.0\n",
"67.20663955882583\n",
"90.0\n",
"6.6000000000000005\n",
"100.0\n",
"100.0\n",
"0\n",
"100.0\n",
"6.6000000000000005\n",
"Semantic equivalence score using standard weights: 16.6\n",
"{'name': {'weight': 60, 'contributing_score': 6.6}, 'threat_actor_types': {'weight': 20, 'contributing_score': 10.0}, 'aliases': {'weight': 20, 'contributing_score': 0.0}, 'matching_score': 16.6, 'sum_weights': 100.0}\n",
"Prop: name | weight: 60 | contributing_score: 6.6\n",
"Prop: threat_actor_types | weight: 20 | contributing_score: 10.0\n",
"Prop: aliases | weight: 20 | contributing_score: 0.0\n",
"matching_score: 16.6\n",
"sum_weights: 100.0\n",
"Using standard weights: 16.6\n",
"Using custom weights: 28.300000000000004\n",
"{'name': {'weight': 45, 'contributing_score': 4.95}, 'threat_actor_types': {'weight': 10, 'contributing_score': 5.0}, 'aliases': {'weight': 45, 'contributing_score': 0.0}, 'matching_score': 9.95, 'sum_weights': 100.0}\n",
"Using custom string comparison: 5.0\n",
"Using standard weights: 16.6\n",
"Using a custom method: 6.6000000000000005\n",
"71.6\n",
"{'name': (60, 60.0), 'color': (40, 11.6), 'matching_score': 71.6, 'sum_weights': 100.0}\n",
"