add text and example for equivalence.ipynb

pull/1/head
Emmanuelle Vargas-Gonzalez 2021-02-19 14:48:23 -05:00
parent 35f4bb0443
commit 2308528957
1 changed files with 428 additions and 2 deletions

@@ -57,11 +57,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking Semantic Equivalence\n",
"## Checking Object Similarity and Equivalence\n",
"\n",
"The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has functions for checking if two STIX Objects are very similar or identical. The functions differentiate between equivalence, which is a binary concept (two things are either equivalent or they are not), and similarity, which is a continuum (an object can be more similar to one object than to another). The similarity function answers the question, “How similar are these two objects?” while the equivalence function uses the similarity function to answer the question, “Are these two objects equivalent?”\n",
"\n",
"For each supported object type, the [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) function checks if the values for a specific set of properties match. Then each matching property is weighted since every property doesn't represent the same level of importance for semantic similarity. The result will be the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the the two objects are not equivalent, and a result of 100 means that they are equivalent. Values in between mean the two objects are more or less similar, and can be used to determine if they should be considered equivalent or not. The [object_equivalence()](../api/stix2.environment.rst#stix2.environment.Environment.object_equivalence) calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) and compares the result to a threshold to determine if the objects are equivalent. Different organizations or users may use different thresholds.\n",
"For each supported object type, the [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) function checks whether the values for a specific set of properties match. Each matching property is then weighted, since not every property is equally important for semantic similarity. The result is the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the two objects are not equivalent, and a result of 100 means that they are equivalent. Values in between mean the two objects are more or less similar and can be used to determine whether they should be considered equivalent. The [object_equivalence()](../api/stix2.environment.rst#stix2.environment.Environment.object_equivalence) function calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) and compares the result to a threshold to determine whether the objects are equivalent. Different organizations or users may use different thresholds.\n",
"\n",
"TODO: Add a link to the committee note when it is released.\n",
"\n",
@@ -3956,6 +3956,432 @@
"print(env.object_similarity(foo1, foo2, prop_scores, **weights))\n",
"print(prop_scores)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking Graph Similarity and Equivalence\n",
"\n",
"The next logical step after checking whether two individual objects are similar or equivalent is to check all relevant neighbors and objects of the same type for the best match. This can help you determine whether you have seen similar intelligence in the past, and it builds upon the same foundation as the local object similarity comparisons. The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has two functions with similar requirements for graph-based checks.\n",
"\n",
"For each supported object type, the [graph_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.graph_similarity) function compares each object against all objects of the same type in the other graph and keeps the pairing that maximizes the score obtained from the property matches. It requires two DataStore instances, which serve as the graph representation and allow the algorithm to make additional checks such as de-referencing objects. Internally it calls [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity).\n",
"\n",
"The implementation has some limitations that anyone analyzing the results of this algorithm should understand.\n",
"- Only STIX types with weights defined will be checked. This could result in a maximal sub-graph and score that are smaller than expected. We recommend looking at the `prop_scores` or logging output for details and to understand how the result was calculated.\n",
"- Failure to de-reference an object for checks will result in a score of 0 for that property. This applies to `*_ref` or `*_refs` properties.\n",
"- Keep reasonable expectations about running time, especially with DataStores that require network communication or when the number of items in the graphs is high. You can also tune how deep the algorithm should go in de-reference calls; this can affect your running time.\n",
"\n",
"**Please note** that you will need to install the TAXII dependencies in addition to the semantic requirements if you plan on using the TAXII DataStore classes. You can do this using:\n",
"\n",
"```pip install stix2[taxii]```\n",
"\n",
"#### Graph Similarity and Equivalence Example\n",
"\n",
"By default, the example uses the default weights defined in [object_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.object_similarity) in combination with [graph_similarity()](../api/stix2.environment.rst#stix2.environment.Environment.graph_similarity)."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"59.68831168831168\n",
"False\n",
"{\n",
" \"matching_score\": 835.6363636363635,\n",
" \"len_pairs\": 14,\n",
" \"summary\": {\n",
" \"campaign--a8c85d5d-bdc6-4613-8e0b-b836ff450c28\": {\n",
" \"lhs\": \"campaign--a8c85d5d-bdc6-4613-8e0b-b836ff450c28\",\n",
" \"rhs\": \"campaign--caf3f196-1d91-4b87-9f3b-855967af6782\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 18.000000000000004\n",
" },\n",
" \"matching_score\": 18.000000000000004,\n",
" \"sum_weights\": 60.0\n",
" },\n",
" \"value\": 30.000000000000004\n",
" },\n",
" \"campaign--caf3f196-1d91-4b87-9f3b-855967af6782\": {\n",
" \"lhs\": \"campaign--caf3f196-1d91-4b87-9f3b-855967af6782\",\n",
" \"rhs\": \"campaign--a8c85d5d-bdc6-4613-8e0b-b836ff450c28\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 18.000000000000004\n",
" },\n",
" \"matching_score\": 18.000000000000004,\n",
" \"sum_weights\": 60.0\n",
" },\n",
" \"value\": 30.000000000000004\n",
" },\n",
" \"attack-pattern--eb837f70-9798-4907-8c8a-bf883f7f4ec3\": {\n",
" \"lhs\": \"attack-pattern--eb837f70-9798-4907-8c8a-bf883f7f4ec3\",\n",
" \"rhs\": \"attack-pattern--94caa050-50d1-4c20-9891-a1b9f47d2448\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 30,\n",
" \"contributing_score\": 21.81818181818182\n",
" },\n",
" \"external_references\": {\n",
" \"weight\": 70,\n",
" \"contributing_score\": 70.0\n",
" },\n",
" \"matching_score\": 91.81818181818181,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 91.81818181818181\n",
" },\n",
" \"attack-pattern--94caa050-50d1-4c20-9891-a1b9f47d2448\": {\n",
" \"lhs\": \"attack-pattern--94caa050-50d1-4c20-9891-a1b9f47d2448\",\n",
" \"rhs\": \"attack-pattern--eb837f70-9798-4907-8c8a-bf883f7f4ec3\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 30,\n",
" \"contributing_score\": 21.81818181818182\n",
" },\n",
" \"external_references\": {\n",
" \"weight\": 70,\n",
" \"contributing_score\": 70.0\n",
" },\n",
" \"matching_score\": 91.81818181818181,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 91.81818181818181\n",
" },\n",
" \"identity--8d29b554-9904-430c-bc78-82c97750350a\": {\n",
" \"lhs\": \"identity--8d29b554-9904-430c-bc78-82c97750350a\",\n",
" \"rhs\": \"identity--4a4daf92-7c94-407c-a303-3a51924c32a0\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 60.0\n",
" },\n",
" \"identity_class\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 20.0\n",
" },\n",
" \"matching_score\": 80.0,\n",
" \"sum_weights\": 80.0\n",
" },\n",
" \"value\": 100.0\n",
" },\n",
" \"identity--4a4daf92-7c94-407c-a303-3a51924c32a0\": {\n",
" \"lhs\": \"identity--4a4daf92-7c94-407c-a303-3a51924c32a0\",\n",
" \"rhs\": \"identity--8d29b554-9904-430c-bc78-82c97750350a\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 60.0\n",
" },\n",
" \"identity_class\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 20.0\n",
" },\n",
" \"matching_score\": 80.0,\n",
" \"sum_weights\": 80.0\n",
" },\n",
" \"value\": 100.0\n",
" },\n",
" \"threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f\": {\n",
" \"lhs\": \"threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f\",\n",
" \"rhs\": \"threat-actor--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 6.66666666666667\n",
" },\n",
" \"threat_actor_types\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 0.0\n",
" },\n",
" \"aliases\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 0.0\n",
" },\n",
" \"matching_score\": 6.66666666666667,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 6.66666666666667\n",
" },\n",
" \"indicator--9884b67b-5a83-4377-a941-1821f705a6aa\": {\n",
" \"lhs\": \"indicator--9884b67b-5a83-4377-a941-1821f705a6aa\",\n",
" \"rhs\": \"indicator--94467f25-2857-4d86-9c4d-f7a5a2cd20f4\",\n",
" \"prop_score\": {\n",
" \"indicator_types\": {\n",
" \"weight\": 15,\n",
" \"contributing_score\": 15.0\n",
" },\n",
" \"pattern\": {\n",
" \"weight\": 80,\n",
" \"contributing_score\": 0\n",
" },\n",
" \"valid_from\": {\n",
" \"weight\": 5,\n",
" \"contributing_score\": 5.0\n",
" },\n",
" \"matching_score\": 20.0,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 20.0\n",
" },\n",
" \"indicator--94467f25-2857-4d86-9c4d-f7a5a2cd20f4\": {\n",
" \"lhs\": \"indicator--94467f25-2857-4d86-9c4d-f7a5a2cd20f4\",\n",
" \"rhs\": \"indicator--9884b67b-5a83-4377-a941-1821f705a6aa\",\n",
" \"prop_score\": {\n",
" \"indicator_types\": {\n",
" \"weight\": 15,\n",
" \"contributing_score\": 15.0\n",
" },\n",
" \"pattern\": {\n",
" \"weight\": 80,\n",
" \"contributing_score\": 0\n",
" },\n",
" \"valid_from\": {\n",
" \"weight\": 5,\n",
" \"contributing_score\": 5.0\n",
" },\n",
" \"matching_score\": 20.0,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 20.0\n",
" },\n",
" \"report--5205b115-eb30-4ec3-89df-bd0b7ab3da7d\": {\n",
" \"lhs\": \"report--5205b115-eb30-4ec3-89df-bd0b7ab3da7d\",\n",
" \"rhs\": \"report--230450a3-f484-4555-ab1a-9bd67665d359\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 30,\n",
" \"contributing_score\": 30.0\n",
" },\n",
" \"published\": {\n",
" \"weight\": 10,\n",
" \"contributing_score\": 10.0\n",
" },\n",
" \"object_refs\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 29.0\n",
" },\n",
" \"matching_score\": 69.0,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 69.0\n",
" },\n",
" \"report--230450a3-f484-4555-ab1a-9bd67665d359\": {\n",
" \"lhs\": \"report--230450a3-f484-4555-ab1a-9bd67665d359\",\n",
" \"rhs\": \"report--5205b115-eb30-4ec3-89df-bd0b7ab3da7d\",\n",
" \"prop_score\": {\n",
" \"name\": {\n",
" \"weight\": 30,\n",
" \"contributing_score\": 30.0\n",
" },\n",
" \"published\": {\n",
" \"weight\": 10,\n",
" \"contributing_score\": 10.0\n",
" },\n",
" \"object_refs\": {\n",
" \"weight\": 60,\n",
" \"contributing_score\": 29.0\n",
" },\n",
" \"matching_score\": 69.0,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 69.0\n",
" },\n",
" \"malware--9c4638ec-f1de-4ddb-abf4-1b760417654e\": {\n",
" \"lhs\": \"malware--9c4638ec-f1de-4ddb-abf4-1b760417654e\",\n",
" \"rhs\": \"malware--9c4638ec-f1de-4ddb-abf4-1b760417654e\",\n",
" \"prop_score\": {\n",
" \"malware_types\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 10.0\n",
" },\n",
" \"name\": {\n",
" \"weight\": 80,\n",
" \"contributing_score\": 80.0\n",
" },\n",
" \"matching_score\": 90.0,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 90.0\n",
" },\n",
" \"relationship--c9dea34f-fe7c-43a1-a496-766a4290d63d\": {\n",
" \"lhs\": \"relationship--c9dea34f-fe7c-43a1-a496-766a4290d63d\",\n",
" \"rhs\": \"relationship--2e8ec6c1-7934-416c-a471-d572ec84e1e7\",\n",
" \"prop_score\": {\n",
" \"relationship_type\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 20.0\n",
" },\n",
" \"source_ref\": {\n",
" \"weight\": 40,\n",
" \"contributing_score\": 2.666666666666668\n",
" },\n",
" \"target_ref\": {\n",
" \"weight\": 40,\n",
" \"contributing_score\": 36.0\n",
" },\n",
" \"matching_score\": 58.66666666666667,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 58.666666666666664\n",
" },\n",
" \"relationship--2e8ec6c1-7934-416c-a471-d572ec84e1e7\": {\n",
" \"lhs\": \"relationship--2e8ec6c1-7934-416c-a471-d572ec84e1e7\",\n",
" \"rhs\": \"relationship--c9dea34f-fe7c-43a1-a496-766a4290d63d\",\n",
" \"prop_score\": {\n",
" \"relationship_type\": {\n",
" \"weight\": 20,\n",
" \"contributing_score\": 20.0\n",
" },\n",
" \"source_ref\": {\n",
" \"weight\": 40,\n",
" \"contributing_score\": 2.666666666666668\n",
" },\n",
" \"target_ref\": {\n",
" \"weight\": 40,\n",
" \"contributing_score\": 36.0\n",
" },\n",
" \"matching_score\": 58.66666666666667,\n",
" \"sum_weights\": 100.0\n",
" },\n",
" \"value\": 58.666666666666664\n",
" }\n",
" }\n",
"}\n"
]
}
],
"source": [
"import json\n",
"\n",
"from stix2 import Relationship\n",
"\n",
"\n",
"g1 = [\n",
"    AttackPattern(\n",
"        name=\"Phishing\",\n",
"        external_references=[\n",
"            {\n",
"                \"url\": \"https://example2\",\n",
"                \"source_name\": \"some-source2\",\n",
"            },\n",
"        ],\n",
"    ),\n",
"    Campaign(name=\"Someone Attacks Somebody\"),\n",
"    Identity(\n",
"        name=\"John Smith\",\n",
"        identity_class=\"individual\",\n",
"        description=\"Just some guy\",\n",
"    ),\n",
"    Indicator(\n",
"        indicator_types=['malicious-activity'],\n",
"        pattern_type=\"stix\",\n",
"        pattern=\"[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427e']\",\n",
"        valid_from=\"2017-01-01T12:34:56Z\",\n",
"    ),\n",
"    Malware(\n",
"        id=MALWARE_ID,\n",
"        malware_types=['ransomware'],\n",
"        name=\"Cryptolocker\",\n",
"        is_family=False,\n",
"    ),\n",
"    ThreatActor(\n",
"        id=THREAT_ACTOR_ID,\n",
"        threat_actor_types=[\"crime-syndicate\"],\n",
"        name=\"Evil Org\",\n",
"        aliases=[\"super-evil\"],\n",
"    ),\n",
"    Relationship(\n",
"        source_ref=THREAT_ACTOR_ID,\n",
"        target_ref=MALWARE_ID,\n",
"        relationship_type=\"uses\",\n",
"    ),\n",
"    Report(\n",
"        report_types=[\"campaign\"],\n",
"        name=\"Bad Cybercrime\",\n",
"        published=\"2016-04-06T20:03:00.000Z\",\n",
"        object_refs=[THREAT_ACTOR_ID, MALWARE_ID],\n",
"    ),\n",
"]\n",
"\n",
"g2 = [\n",
"    AttackPattern(\n",
"        name=\"Spear phishing\",\n",
"        external_references=[\n",
"            {\n",
"                \"url\": \"https://example2\",\n",
"                \"source_name\": \"some-source2\",\n",
"            },\n",
"        ],\n",
"    ),\n",
"    Campaign(name=\"Another Campaign\"),\n",
"    Identity(\n",
"        name=\"John Smith\",\n",
"        identity_class=\"individual\",\n",
"        description=\"A person\",\n",
"    ),\n",
"    Indicator(\n",
"        indicator_types=['malicious-activity'],\n",
"        pattern_type=\"stix\",\n",
"        pattern=\"[file:hashes.MD5 = '79054025255fb1a26e4bc422aef54eb4']\",\n",
"        valid_from=\"2017-01-01T12:34:56Z\",\n",
"    ),\n",
"    Malware(\n",
"        id=MALWARE_ID,\n",
"        malware_types=['ransomware', 'dropper'],\n",
"        name=\"Cryptolocker\",\n",
"        is_family=False,\n",
"    ),\n",
"    ThreatActor(\n",
"        id=THREAT_ACTOR_ID,\n",
"        threat_actor_types=[\"spy\"],\n",
"        name=\"James Bond\",\n",
"        aliases=[\"007\"],\n",
"    ),\n",
"    Relationship(\n",
"        source_ref=THREAT_ACTOR_ID,\n",
"        target_ref=MALWARE_ID,\n",
"        relationship_type=\"uses\",\n",
"    ),\n",
"    Report(\n",
"        report_types=[\"campaign\"],\n",
"        name=\"Bad Cybercrime\",\n",
"        published=\"2016-04-06T20:03:00.000Z\",\n",
"        object_refs=[THREAT_ACTOR_ID, MALWARE_ID],\n",
"    ),\n",
"]\n",
"\n",
"\n",
"weights = {\n",
"    \"_internal\": {\n",
"        \"ignore_spec_version\": False,\n",
"        \"versioning_checks\": False,\n",
"        \"max_depth\": 1,\n",
"    },\n",
"}\n",
"\n",
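"# Each MemoryStore serves as the graph representation for one side of the comparison\n",
"memstore1 = MemoryStore(g1)\n",
"memstore2 = MemoryStore(g2)\n",
"# graph_similarity() fills this dictionary with the per-object scoring details\n",
"prop_scores = {}\n",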
"\n",
"similarity_result = env.graph_similarity(memstore1, memstore2, prop_scores, **weights)\n",
"equivalence_result = env.graph_equivalence(memstore1, memstore2, threshold=60)\n",
"\n",
"print(similarity_result)\n",
"print(equivalence_result)\n",
"print(json.dumps(prop_scores, indent=4, sort_keys=False))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example above uses the same objects found in this guide to demonstrate the use of graph similarity and equivalence. Under this approach, Grouping, Relationship, Report and Sighting have default weights defined that allow objects to be de-referenced. The Relationship and Report objects show their `*_ref` and `*_refs` properties, respectively, checked in the summary output. Analyzing the similarity output, we can observe that some objects rated high when checked individually, but once the rest of the graph is taken into account the discrepancies add up and produce a lower overall score: the `matching_score` of about 835.64 divided by the 14 pairs gives about 59.69, which falls just short of the threshold of 60 used for the equivalence check."
]
}
],
"metadata": {