Update semantic equivalence docs

master
Chris Lenk 2019-10-14 14:31:44 -04:00
parent c42f42e983
commit 39e1ddbbf6
1 changed files with 74 additions and 36 deletions

@ -2,7 +2,7 @@
"cells": [ "cells": [
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 14, "execution_count": 1,
"metadata": { "metadata": {
"nbsphinx": "hidden" "nbsphinx": "hidden"
}, },
@ -22,7 +22,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 15, "execution_count": 2,
"metadata": { "metadata": {
"nbsphinx": "hidden" "nbsphinx": "hidden"
}, },
@ -58,7 +58,7 @@
"source": [ "source": [
"## Checking Semantic Equivalence\n", "## Checking Semantic Equivalence\n",
"\n", "\n",
"The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has a function for checking if two STIX Objects are semantically equivalent. For each supported objct type, the algorithm checks if the values for a specific set of properties match. Then each matching property is weighted since every property doesn't represent the same level of importance for semantic equivalence. The result will be the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the the two objects are not equivalent, and a result of 100 means that they are equivalent.\n", "The [Environment](../api/stix2.environment.rst#stix2.environment.Environment) has a function for checking if two STIX Objects are semantically equivalent. For each supported object type, the algorithm checks if the values for a specific set of properties match. Then each matching property is weighted since every property doesn't represent the same level of importance for semantic equivalence. The result will be the sum of these weighted values, in the range of 0 to 100. A result of 0 means that the the two objects are not equivalent, and a result of 100 means that they are equivalent.\n",
"\n", "\n",
"TODO: Add a link to the committee note when it is released.\n", "TODO: Add a link to the committee note when it is released.\n",
"\n", "\n",
@ -71,7 +71,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 16, "execution_count": 3,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -152,7 +152,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 16, "execution_count": 3,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -191,12 +191,12 @@
"source": [ "source": [
"### Campaign Example\n", "### Campaign Example\n",
"\n", "\n",
"For Campaigns, the only properties that contribute to semantic equivalence are `name` and `aliases`, with weights of 60 and 40, respectively. In this example, the two campaigns have completely different names, but slightly similar descriptions." "For Campaigns, the only properties that contribute to semantic equivalence are `name` and `aliases`, with weights of 60 and 40, respectively. In this example, the two campaigns have completely different names, but slightly similar descriptions. The result may be higher than expected because the Jaro-Winkler algorithm used to compare string properties looks at the edit distance of the two strings rather than just the words in them."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 17, "execution_count": 4,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -270,14 +270,14 @@
".highlight .vg { color: #19177C } /* Name.Variable.Global */\n", ".highlight .vg { color: #19177C } /* Name.Variable.Global */\n",
".highlight .vi { color: #19177C } /* Name.Variable.Instance */\n", ".highlight .vi { color: #19177C } /* Name.Variable.Instance */\n",
".highlight .vm { color: #19177C } /* Name.Variable.Magic */\n", ".highlight .vm { color: #19177C } /* Name.Variable.Magic */\n",
".highlight .il { color: #666666 } /* Literal.Number.Integer.Long */</style><div class=\"highlight\"><pre><span></span>50.0\n", ".highlight .il { color: #666666 } /* Literal.Number.Integer.Long */</style><div class=\"highlight\"><pre><span></span>44.0\n",
"</pre></div>\n" "</pre></div>\n"
], ],
"text/plain": [ "text/plain": [
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 17, "execution_count": 4,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -286,12 +286,10 @@
"from stix2.v21 import Campaign\n", "from stix2.v21 import Campaign\n",
"\n", "\n",
"c1 = Campaign(\n", "c1 = Campaign(\n",
" name=\"Someone Attacks Somebody\",\n", " name=\"there\",)\n",
" description=\"A campaign targeting....\",)\n",
"\n", "\n",
"c2 = Campaign(\n", "c2 = Campaign(\n",
" name=\"Another Campaign\",\n", " name=\"something\",)\n",
" description=\"A campaign that targets....\",)\n",
"print(env.semantically_equivalent(c1, c2))" "print(env.semantically_equivalent(c1, c2))"
] ]
}, },
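Note on the Campaign hunk above: the new sentence says the score can be higher than expected because Jaro-Winkler compares characters rather than whole words. The following is a minimal sketch of the textbook Jaro-Winkler definition, written only to illustrate that point. It is not the library's implementation. The function names `jaro` and `jaro_winkler` and the `prefix_scale` parameter are assumptions made for this example, and the library's own matching window, rounding, and weighting may differ.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters only count as a match if their positions
    # are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up in a different order are
    # transpositions; each swapped pair counts once.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


# The two campaign names from the updated example share no words,
# yet a single matching character ("e") already gives a non-zero score.
print(round(jaro_winkler("there", "something"), 3))  # 0.437
```

Scaled to the 0 to 100 range, 0.437 is about 44, which is consistent with the updated output of `44.0` in this hunk. How the library rounds the string score and combines it with the `name` and `aliases` weights is not shown in this diff.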
@ -306,7 +304,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 18, "execution_count": 5,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -387,7 +385,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 18, "execution_count": 5,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -419,8 +417,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 19, "execution_count": 6,
"metadata": {}, "metadata": {
"scrolled": true
},
"outputs": [ "outputs": [
{ {
"name": "stderr", "name": "stderr",
@ -507,7 +507,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 19, "execution_count": 6,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -530,6 +530,13 @@
"print(env.semantically_equivalent(ind1, ind2))" "print(env.semantically_equivalent(ind1, ind2))"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the patterns were identical the result would have been 100."
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@ -541,7 +548,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 20, "execution_count": 7,
"metadata": { "metadata": {
"scrolled": true "scrolled": true
}, },
@ -624,7 +631,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 20, "execution_count": 7,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -654,7 +661,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 21, "execution_count": 8,
"metadata": { "metadata": {
"scrolled": true "scrolled": true
}, },
@ -737,7 +744,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 21, "execution_count": 8,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -771,7 +778,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 22, "execution_count": 9,
"metadata": { "metadata": {
"scrolled": true "scrolled": true
}, },
@ -854,7 +861,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 22, "execution_count": 9,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -888,7 +895,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 23, "execution_count": 10,
"metadata": { "metadata": {
"scrolled": true "scrolled": true
}, },
@ -971,7 +978,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 23, "execution_count": 10,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -1002,7 +1009,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 24, "execution_count": 11,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -1083,7 +1090,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 24, "execution_count": 11,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -1117,7 +1124,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 25, "execution_count": 12,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -1137,12 +1144,12 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Some object types do not have a defined method for calculating semantic equivalence and by default will raise an error." "Some object types do not have a defined method for calculating semantic equivalence and by default will raise an [error](../api/stix2.exceptions.rst#stix2.exceptions.SemanticEquivalenceUnsupportedTypeError)."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 26, "execution_count": 13,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -1176,12 +1183,43 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"By default, comparing objects of different spec versions will result in an error. You can optionally allow this by providing a configuration dictionary like in the next example:" "By default, comparing objects of different spec versions will result in a `ValueError`."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 27, "execution_count": 14,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "The objects to compare must be of the same spec version!",
"output_type": "error",
"traceback": [
"\u001b[0;31mValueError\u001b[0m\u001b[0;31m:\u001b[0m The objects to compare must be of the same spec version!\n"
]
}
],
"source": [
"from stix2.v20 import Identity as Identity20\n",
"\n",
"id20 = Identity20(\n",
" name=\"John Smith\",\n",
" identity_class=\"individual\",\n",
")\n",
"print(env.semantically_equivalent(id2, id20))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can optionally allow comparing across spec versions by providing a configuration dictionary like in the next example:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -1262,7 +1300,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 27, "execution_count": 15,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
@ -1286,7 +1324,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 28, "execution_count": 16,
"metadata": {}, "metadata": {},
"outputs": [ "outputs": [
{ {
@ -1367,7 +1405,7 @@
"<IPython.core.display.HTML object>" "<IPython.core.display.HTML object>"
] ]
}, },
"execution_count": 28, "execution_count": 16,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }