{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# Delete this cell to re-enable tracebacks\n", "import sys\n", "ipython = get_ipython()\n", "\n", "def hide_traceback(exc_tuple=None, filename=None, tb_offset=None,\n", "                   exception_only=False, running_compiled_code=False):\n", "    etype, value, tb = sys.exc_info()\n", "    return ipython._showtraceback(etype, value, ipython.InteractiveTB.get_exception_only(etype, value))\n", "\n", "ipython.showtraceback = hide_traceback" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# JSON output syntax highlighting\n", "from __future__ import print_function\n", "from pygments import highlight\n", "from pygments.lexers import JsonLexer, TextLexer\n", "from pygments.formatters import HtmlFormatter\n", "from IPython.display import display, HTML\n", "from IPython.core.interactiveshell import InteractiveShell\n", "\n", "InteractiveShell.ast_node_interactivity = \"all\"\n", "\n", "def json_print(inpt):\n", "    string = str(inpt)\n", "    formatter = HtmlFormatter()\n", "    if string[0] == '{':\n", "        lexer = JsonLexer()\n", "    else:\n", "        lexer = TextLexer()\n", "    return HTML('<style type=\"text/css\">{}</style>{}'.format(\n", "                formatter.get_style_defs('.highlight'),\n", "                highlight(string, lexer, formatter)))\n", "\n", "globals()['print'] = json_print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Semantic Equivalence\n", "\n", "The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has a function for checking if two STIX Objects are semantically equivalent. For each supported object type, the algorithm checks if the values for a specific set of properties match. Each matching property is then weighted, since not every property is equally important for semantic equivalence. The result will be the sum of these weighted values, in the range of 0 to 100. 
A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent.\n", "\n", "TODO: Add a link to the committee note when it is released.\n", "\n", "Below we will show examples of the semantic equivalence results of various objects. Unless otherwise specified, the ID of each object will be generated by the library, so the two objects will not have the same ID. This demonstrates that the semantic equivalence algorithm only looks at specific properties for each object type.\n", "\n", "### Attack Pattern Example\n", "\n", "For Attack Patterns, the only properties that contribute to semantic equivalence are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, both attack patterns have the same external reference but the second has a slightly different yet still similar name." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
85.3\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import stix2\n", "from stix2 import Environment, MemoryStore\n", "from stix2.v21 import AttackPattern\n", "\n", "env = Environment(store=MemoryStore())\n", "\n", "ap1 = AttackPattern(\n", " name=\"Phishing\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example2\",\n", " \"source_name\": \"some-source2\",\n", " },\n", " ],\n", ")\n", "ap2 = AttackPattern(\n", " name=\"Spear phishing\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example2\",\n", " \"source_name\": \"some-source2\",\n", " },\n", " ],\n", ")\n", "print(env.semantically_equivalent(ap1, ap2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Campaign Example\n", "\n", "For Campaigns, the only properties that contribute to semantic equivalence are `name` and `aliases`, with weights of 60 and 40, respectively. In this example, the two campaigns have completely different names and neither sets `aliases`, so only `name` is compared. The result may be higher than expected because the Jaro-Winkler algorithm used to compare string properties looks at the edit distance of the two strings rather than just the words in them." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
50.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Campaign\n", "\n", "c1 = Campaign(\n", " name=\"Someone Attacks Somebody\",)\n", "\n", "c2 = Campaign(\n", " name=\"Another Campaign\",)\n", "print(env.semantically_equivalent(c1, c2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Identity Example\n", "\n", "For Identities, the only properties that contribute to semantic equivalence are `name`, `identity_class`, and `sectors`, with weights of 60, 20, and 20, respectively. In this example, the two identities are identical, but are missing one of the contributing properties. The algorithm only compares properties that are actually present on the objects. Also note that they have completely different description properties, but because description is not one of the properties considered for semantic equivalence, this difference has no effect on the result." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Identity\n", "\n", "id1 = Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"Just some guy\",\n", ")\n", "id2 = Identity(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", " description=\"A person\",\n", ")\n", "print(env.semantically_equivalent(id1, id2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indicator Example\n", "\n", "For Indicators, the only properties that contribute to semantic equivalence are `indicator_types`, `pattern`, and `valid_from`, with weights of 15, 80, and 5, respectively. In this example, the two indicators have patterns with different hashes but the same indicator_type and valid_from. For patterns, the algorithm currently only checks if they are identical." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Indicator pattern equivalence is not fully defined; will default to zero if not completely identical\n" ] }, { "data": { "text/html": [ "
20.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Indicator\n", "\n", "ind1 = Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", ")\n", "ind2 = Indicator(\n", " indicator_types=['malicious-activity'],\n", " pattern_type=\"stix\",\n", " pattern=\"[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4']\",\n", " valid_from=\"2017-01-01T12:34:56Z\",\n", ")\n", "print(env.semantically_equivalent(ind1, ind2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the patterns were identical the result would have been 100." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Location Example\n", "\n", "For Locations, the only properties that contribute to semantic equivalence are `longitude`/`latitude`, `region`, and `country`, with weights of 34, 33, and 33, respectively. In this example, the two locations are Washington, D.C. and New York City. The algorithm computes the distance between two locations using the haversine formula and uses that to influence equivalence." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
67.20663955882583\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Location\n", "\n", "loc1 = Location(\n", " latitude=38.889,\n", " longitude=-77.023,\n", ")\n", "loc2 = Location(\n", " latitude=40.713,\n", " longitude=-74.006,\n", ")\n", "print(env.semantically_equivalent(loc1, loc2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Malware Example\n", "\n", "For Malware, the only properties that contribute to semantic equivalence are `malware_types` and `name`, with weights of 20 and 80, respectively. In this example, the two malware objects only differ in the strings in their malware_types lists. For lists, the algorithm bases its calculations on the intersection of the two lists. An empty intersection will result in a 0, and a complete intersection will result in a 1 for that property." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
90.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Malware\n", "\n", "MALWARE_ID = \"malware--9c4638ec-f1de-4ddb-abf4-1b760417654e\"\n", "\n", "mal1 = Malware(id=MALWARE_ID,\n", " malware_types=['ransomware'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " )\n", "mal2 = Malware(id=MALWARE_ID,\n", " malware_types=['ransomware', 'dropper'],\n", " name=\"Cryptolocker\",\n", " is_family=False,\n", " )\n", "print(env.semantically_equivalent(mal1, mal2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Threat Actor Example\n", "\n", "For Threat Actors, the only properties that contribute to semantic equivalence are `threat_actor_types`, `name`, and `aliases`, with weights of 20, 60, and 20, respectively. In this example, the two threat actors have the same `id` property but everything else is different. Since the `id` property does not factor into semantic equivalence, the result is not very high. The result is not zero because the algorithm uses the Jaro-Winkler distance between the strings in the `name` property; the `threat_actor_types` and `aliases` lists share no values and contribute nothing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
33.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import ThreatActor\n", "\n", "THREAT_ACTOR_ID = \"threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f\"\n", "\n", "ta1 = ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"crime-syndicate\"],\n", " name=\"Evil Org\",\n", " aliases=[\"super-evil\"],\n", ")\n", "ta2 = ThreatActor(id=THREAT_ACTOR_ID,\n", " threat_actor_types=[\"spy\"],\n", " name=\"James Bond\",\n", " aliases=[\"007\"],\n", ")\n", "print(env.semantically_equivalent(ta1, ta2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tool Example\n", "\n", "For Tools, the only properties that contribute to semantic equivalence are `tool_types` and `name`, with weights of 20 and 80, respectively. In this example, the two tools have the same values for properties that contribute to semantic equivalence but one has an additional, non-contributing property." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Tool\n", "\n", "t1 = Tool(\n", " tool_types=[\"remote-access\"],\n", " name=\"VNC\",\n", ")\n", "t2 = Tool(\n", " tool_types=[\"remote-access\"],\n", " name=\"VNC\",\n", " description=\"This is a tool\"\n", ")\n", "print(env.semantically_equivalent(t1, t2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vulnerability Example\n", "\n", "For Vulnerabilities, the only properties that contribute to semantic equivalence are `name` and `external_references`, with weights of 30 and 70, respectively. In this example, the two vulnerabilities have the same name but one also has an external reference. The algorithm doesn't take into account any semantic equivalence contributing properties that are not present on both objects." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Vulnerability\n", "\n", "vuln1 = Vulnerability(\n", " name=\"Heartbleed\",\n", " external_references=[\n", " {\n", " \"url\": \"https://example\",\n", " \"source_name\": \"some-source\",\n", " },\n", " ],\n", ")\n", "vuln2 = Vulnerability(\n", " name=\"Heartbleed\",\n", ")\n", "print(env.semantically_equivalent(vuln1, vuln2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other Examples\n", "\n", "Comparing objects of different types will result in an error." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The objects to compare must be of the same type!", "output_type": "error", "traceback": [ "\u001b[0;31mValueError\u001b[0m\u001b[0;31m:\u001b[0m The objects to compare must be of the same type!\n" ] } ], "source": [ "print(env.semantically_equivalent(ind1, vuln1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some object types do not have a defined method for calculating semantic equivalence and by default will give a warning and a result of zero." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "'report' type has no 'weights' dict specified & thus no semantic equivalence method to call!\n" ] }, { "data": { "text/html": [ "
0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v21 import Report\n", "\n", "r1 = Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[\"indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7\"],\n", ")\n", "r2 = Report(\n", " report_types=[\"campaign\"],\n", " name=\"Bad Cybercrime\",\n", " published=\"2016-04-06T20:03:00.000Z\",\n", " object_refs=[\"indicator--a740531e-63ff-4e49-a9e1-a0a3eed0e3e7\"],\n", ")\n", "print(env.semantically_equivalent(r1, r2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, comparing objects of different spec versions will result in a `ValueError`." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The objects to compare must be of the same spec version!", "output_type": "error", "traceback": [ "\u001b[0;31mValueError\u001b[0m\u001b[0;31m:\u001b[0m The objects to compare must be of the same spec version!\n" ] } ], "source": [ "from stix2.v20 import Identity as Identity20\n", "\n", "id20 = Identity20(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", ")\n", "print(env.semantically_equivalent(id2, id20))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can optionally allow comparing across spec versions by providing a configuration dictionary like in the next example:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from stix2.v20 import Identity as Identity20\n", "\n", "id20 = Identity20(\n", " name=\"John Smith\",\n", " identity_class=\"individual\",\n", ")\n", "print(env.semantically_equivalent(id2, id20, **{\"_internal\": {\"ignore_spec_version\": True}}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can modify the weights or provide your own functions for comparing objects of a certain type by putting them in a dictionary and unpacking it as keyword arguments (`**weights`) in the call to the semantic equivalence function. You can find functions (like `partial_string_based`) to help with this in the [Environment API docs](../api/stix2.environment.rst#stix2.environment.Environment). In this example we define semantic equivalence for our new `x-foobar` object type:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
60.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def _x_foobar_checks(obj1, obj2, **weights):\n", "    matching_score = 0.0\n", "    sum_weights = 0.0\n", "    if stix2.environment.check_property_present(\"name\", obj1, obj2):\n", "        w = weights[\"name\"]\n", "        sum_weights += w\n", "        matching_score += w * stix2.environment.partial_string_based(obj1[\"name\"], obj2[\"name\"])\n", "    if stix2.environment.check_property_present(\"color\", obj1, obj2):\n", "        w = weights[\"color\"]\n", "        sum_weights += w\n", "        matching_score += w * stix2.environment.partial_string_based(obj1[\"color\"], obj2[\"color\"])\n", "    return matching_score, sum_weights\n", "\n", "weights = {\n", "    \"x-foobar\": {\n", "        \"name\": 60,\n", "        \"color\": 40,\n", "        \"method\": _x_foobar_checks,\n", "    },\n", "    \"_internal\": {\n", "        \"ignore_spec_version\": False,\n", "    },\n", "}\n", "foo1 = {\n", "    \"type\":\"x-foobar\",\n", "    \"id\":\"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061\",\n", "    \"name\": \"Zot\",\n", "    \"color\": \"red\",\n", "}\n", "foo2 = {\n", "    \"type\":\"x-foobar\",\n", "    \"id\":\"x-foobar--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061\",\n", "    \"name\": \"Zot\",\n", "    \"color\": \"blue\",\n", "}\n", "print(env.semantically_equivalent(foo1, foo2, **weights))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Detailed Results\n", "\n", "If your logging level is set to `DEBUG` or lower (that is, at least as verbose as `DEBUG`), the function will log more detailed results. These show the semantic equivalence and weighting for each property that is checked, to show how the final result was arrived at." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Starting semantic equivalence process between: 'threat-actor--54dc2aac-6fde-4a68-ae2a-0c0bc575ed70' and 'threat-actor--c51bce3b-a067-4692-ab77-fcdefdd3f157'\n", "--\t\tpartial_string_based 'Evil Org' 'James Bond'\tresult: '0.56'\n", "'name' check -- weight: 60, contributing score: 33.6\n", "--\t\tpartial_list_based '['crime-syndicate']' '['spy']'\tresult: '0.0'\n", "'threat_actor_types' check -- weight: 20, contributing score: 0.0\n", "--\t\tpartial_list_based '['super-evil']' '['007']'\tresult: '0.0'\n", "'aliases' check -- weight: 20, contributing score: 0.0\n", "Matching Score: 33.6, Sum of Weights: 100.0\n" ] }, { "data": { "text/html": [ "
33.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import logging\n", "logging.basicConfig(format='%(message)s')\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.DEBUG)\n", "\n", "ta3 = ThreatActor(\n", " threat_actor_types=[\"crime-syndicate\"],\n", " name=\"Evil Org\",\n", " aliases=[\"super-evil\"],\n", ")\n", "ta4 = ThreatActor(\n", " threat_actor_types=[\"spy\"],\n", " name=\"James Bond\",\n", " aliases=[\"007\"],\n", ")\n", "print(env.semantically_equivalent(ta3, ta4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Comparisons\n", "If you wish, you can run your own custom semantic equivalence comparisons. Specifically, you can do any of three things:\n", " - Provide custom weights for each semantic equivalence contributing property\n", " - Provide custom comparison functions for individual semantic equivalence contributing properties\n", " - Provide a custom semantic equivalence method\n", "\n", "*Some of this has already been explained above, but we will go into more detail here.*\n", "\n", "#### The `weights` dictionary\n", "In order to do any of the aforementioned (*optional*) custom comparisons, you will need to provide a `weights` dictionary to the `semantically_equivalent()` method call. At a minimum, you must provide the custom weight and custom comparison function for each property. Now, you may use the default weights, or provide your own. 
You may also use any of the existing comparison functions, or provide your own.\n", "\n", "##### Existing comparison functions\n", "For reference, here is a list of comparison functions already in the codebase (found in stix2/environment.py):\n", " - `partial_timestamp_based`\n", " - `partial_list_based`\n", " - `exact_match`\n", " - `partial_string_based`\n", " - `custom_pattern_based`\n", " - `partial_external_reference_based`\n", " - `partial_location_distance`\n", "\n", "For instance, if we wanted to compare two `ThreatActor`s, but use our own weights, then we could do the following:\n", "\n", "(**Please note that if you provide a custom weights dictionary but not a custom semantic equivalence method [shown later], then you must follow the general format shown in the `weights` dict below**)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Using standard weights: 43.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Using custom weights: 41.8\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = {\n", " \"threat-actor\": { # You must specify for which object type this dict is\n", " \"name\": (30, stix2.environment.partial_string_based), # Each property's value must be a tuple\n", " \"threat_actor_types\": (50, stix2.environment.partial_list_based), # The 1st component must be the weight\n", " \"aliases\": (20, stix2.environment.partial_list_based) # The 2nd component must be the comparison function\n", " }\n", "}\n", "\n", "ta5 = ThreatActor(\n", " threat_actor_types=[\"crime-syndicate\", \"spy\"],\n", " name=\"Evil Org\",\n", " aliases=[\"super-evil\"],\n", ")\n", "ta6 = ThreatActor(\n", " threat_actor_types=[\"spy\"],\n", " name=\"James Bond\",\n", " aliases=[\"007\"],\n", ")\n", "\n", "print(\"Using standard weights: %s\" % (env.semantically_equivalent(ta5, ta6)))\n", "print(\"Using custom weights: %s\" % (env.semantically_equivalent(ta5, ta6, **weights)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how there is a difference in the semantic equivalence scores, simply due to the fact that custom weights were used.\n", "\n", "#### Custom Semantic Equivalence Function\n", "As said before, you can also write and use your own semantic equivalence method. To do this, you must provide a `weights` dictionary to `semantically_equivalent()`. In this dict, you will provide a key of \"method\" whose value will be your custom semantic equivalence function.\n", "\n", "If you provide your own custom semantic equivalence method, you **must also provide the weights for each of the properties** (unless, for some reason, your custom method is weights-agnostic). However, since you are writing the custom method, your weights need not necessarily follow the tuple format specified in the above code box.\n", "\n", "Here we use our own custom semantic equivalence function to compare two `ThreatActor`s. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Using a custom method: 21.263333333333335\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def custom_semantic_equivalence_method(obj1, obj2, **weights):\n", "    sum_weights = 200.0\n", "    matching_score = 20.19\n", "    for prop in weights:\n", "        if prop != \"method\":\n", "            w = weights[prop][0]\n", "            comp_funct = weights[prop][1]\n", "            contributing_score = w * comp_funct(obj1[prop], obj2[prop])\n", "            sum_weights += w\n", "            matching_score += contributing_score\n", "    return matching_score, sum_weights\n", "\n", "\n", "weights = {\n", "    \"threat-actor\": {\n", "        \"name\": (60, stix2.environment.partial_string_based), # We left each property's value as a tuple\n", "        \"threat_actor_types\": (20, stix2.environment.partial_list_based), # However, weights could be simply numeric\n", "        \"aliases\": (20, stix2.environment.partial_list_based), # They may also be anything else you want\n", "        \"method\": custom_semantic_equivalence_method # As long as your func is written accordingly\n", "    }\n", "}\n", "\n", "print(\"Using a custom method: %s\" % (env.semantically_equivalent(ta5, ta6, **weights)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice the semantic equivalence score of ~21.26 when using a custom semantic equivalence method to compare `ta5` & `ta6`. Compare this to the semantic equivalence score of 43.6 when using the default semantic equivalence method for comparing `ta5` & `ta6`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `prop_scores`\n", "The `semantically_equivalent()` function now takes an optional third argument, called `prop_scores`. As explained previously, the semantic equivalence functionality includes detailed debugging messages. 
This new argument is meant to be a dictionary that stores the information from those detailed debugging messages so that it can be accessed and used programmatically.\n", "\n", "Using `prop_scores` is simple: pass a dictionary to `semantically_equivalent()`, and when the function returns, the dict will contain the various scores. Specifically, it will have the overall `matching_score` and `sum_weights`, along with the weight and contributing score for each of the semantic equivalence contributing properties.\n", "\n", "For instance:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Semantic equivalence score using standard weights: 43.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: name | weight: 60 | contributing_score: 33.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: threat_actor_types | weight: 20 | contributing_score: 10.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
Prop: aliases | weight: 20 | contributing_score: 0.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
matching_score: 43.6\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
sum_weights: 100.0\n",
       "
\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prop_scores = {}\n", "print(\"Semantic equivalence score using standard weights: %s\" % (env.semantically_equivalent(ta5, ta6, prop_scores)))\n", "for prop in prop_scores:\n", "    if prop not in [\"matching_score\", \"sum_weights\"]:\n", "        print(\"Prop: %s | weight: %s | contributing_score: %s\" % (prop, prop_scores[prop][0], prop_scores[prop][1]))\n", "    else:\n", "        print(\"%s: %s\" % (prop, prop_scores[prop]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we wanted, we could have also passed in a custom `weights` dict to the above `semantically_equivalent()` call. If we want to use both `prop_scores` and `weights`, then they would be the third and fourth arguments, respectively, to `semantically_equivalent()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }