Recombinant Data: Thoughts on the improved design and development of software and intelligent Internet systems.
Interpretable AI: Reasoning about Why (...and why a solution is the right solution), posted 2019-06-30<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJW8tNnykQGRKQdp44d9DBEPj-er8KxJDjDK2_kYrbYEtaEZ2I8SZS94eA94sFOoK6JMr3BSwFeUD7BZhEGabcN8T3Ohq1bB4Ad-kXROjXDPN_XSNQQcMlaCxFkAZW3dg7xbj8TSlGEow/s1600/pipe3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="775" data-original-width="573" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJW8tNnykQGRKQdp44d9DBEPj-er8KxJDjDK2_kYrbYEtaEZ2I8SZS94eA94sFOoK6JMr3BSwFeUD7BZhEGabcN8T3Ohq1bB4Ad-kXROjXDPN_XSNQQcMlaCxFkAZW3dg7xbj8TSlGEow/s320/pipe3.png" width="236" /></a></div>
<h2 dir="ltr" style="line-height: 1.38; margin-bottom: 6pt; margin-top: 18pt;">
<span style="font-family: "arial"; font-size: 12pt; font-weight: 400; white-space: pre-wrap;">In the Sherlock Holmes stories, Conan Doyle’s hero is said to use his powers of deduction to infer who committed a crime and how. He gathers the facts and then deduces their logical conclusion. Ideally, given the rules [A→B, B→C, …Y→Z] and the fact (antecedent) A, Z can be deduced by applying the rules transitively. But each of his cases has gaps, not just in the facts but in the available explanations. Since much of each crime was committed without witnesses, he has to propose new explanations. He is therefore applying abductive rather than deductive reasoning: he infers, or abduces, the cause and explanation for a given set of resulting facts.</span></h2>
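<div>The ideal deductive case described above, where the rules A→B, B→C, …, Y→Z plus the fact A yield Z, can be sketched as naive forward chaining. This is a minimal illustration, assuming each rule is a single-antecedent implication stored as an (antecedent, consequent) pair; the function name <code>forward_chain</code> is hypothetical.</div>

```python
from string import ascii_uppercase


def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new facts can be derived.

    facts: iterable of atoms known to be true.
    rules: list of (antecedent, consequent) pairs, each read as antecedent -> consequent.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            # Modus ponens: if the antecedent holds, the consequent must hold too.
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


# Rules A->B, B->C, ..., Y->Z, built by pairing each letter with the next one.
rules = list(zip(ascii_uppercase, ascii_uppercase[1:]))

print("Z" in forward_chain({"A"}, rules))  # True: Z follows from A through 25 rules
print(forward_chain({"Z"}, rules))         # {'Z'}: nothing is deduced backwards from Z
```

<div>Starting from Z alone derives nothing new, because the rules only run forward; reasoning from an observed result back to its cause is exactly the gap that abduction has to fill.</div>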
<b id="docs-internal-guid-ac520dba-7fff-888b-2192-c0f3cf008ad8" style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Sherlock’s favorite phrase is “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth”. This is about finding explanations for a result B, starting from an open set of antecedents {A1, A2, A3, …}; it is not simple deduction from A to B to C. If all but one of the explanations are found to be impossible (perhaps by deduction, perhaps by other means), then the single remaining one must be the answer. But what if the case isn’t so clear-cut? What if elimination leaves a set of several solutions, not just one (e.g., different but overlapping genotypes)? Then you must find the most likely explanation by some other means, such as posterior (Bayesian) probabilities. This is precisely what abductive reasoning is all about: finding the best explanation, or set of best explanations, rather than deducing the exact correct solution!</span></div>
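<div>When elimination leaves several candidates, they can be ranked by posterior probability using Bayes’ rule, P(H|E) ∝ P(E|H)·P(H). The sketch below is illustrative only: the hypotheses H1 to H3 and their priors and likelihoods are made up, and the helper <code>rank_explanations</code> is hypothetical. A hypothesis shown to be impossible gets a likelihood of zero, so it drops out just as in Holmes’s elimination.</div>

```python
def rank_explanations(priors, likelihoods):
    """Return (hypothesis, posterior) pairs sorted from most to least probable.

    priors: dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E | H) for the observed evidence E
    """
    # Unnormalized posteriors: P(E | H) * P(H)
    joint = {h: likelihoods.get(h, 0.0) * p for h, p in priors.items()}
    evidence = sum(joint.values())  # P(E), summed over all candidate hypotheses
    if evidence == 0:
        raise ValueError("every hypothesis was eliminated")
    posteriors = {h: j / evidence for h, j in joint.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers: H3 is eliminated (likelihood 0); H1 and H2 both survive.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.2, "H2": 0.6, "H3": 0.0}

for h, p in rank_explanations(priors, likelihoods):
    print(f"{h}: {p:.2f}")  # prints H2: 0.64, then H1: 0.36, then H3: 0.00
```

<div>Note that H2 wins despite its lower prior, because it explains the evidence better; elimination alone would only have told us that H1 and H2 are both still possible.</div>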
<b style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;"><br /></b>
<br />
<div dir="ltr" style="margin-left: 141.75pt;">
<table style="border-collapse: collapse; border: none;"><colgroup><col width="51"></col><col width="72"></col><col width="70"></col><col width="70"></col></colgroup><tbody>
<tr style="height: 22pt;"><td colspan="2" rowspan="2" style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Truth Table:</span></div>
<div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A→B</span></div>
</td><td colspan="2" style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A</span></div>
</td></tr>
<tr style="height: 18pt;"><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">True</span></div>
</td><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">False</span></div>
</td></tr>
<tr style="height: 22pt;"><td rowspan="2" style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><br />
<div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">B</span></div>
</td><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">True</span></div>
</td><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">True</span></div>
</td><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: #9900ff; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">True</span></div>
</td></tr>
<tr style="height: 22pt;"><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">False</span></div>
</td><td style="background-color: #dd7e6b; border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">False</span></div>
</td><td style="border: 1pt solid rgb(0, 0, 0); padding: 5pt; vertical-align: top;"><div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">True</span></div>
</td></tr>
</tbody></table>
</div>
<b style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 10pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Let’s break this down further. Logically, deduction is sound if the implications (rules) used are all sound. For the implication A→B, </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: italic; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">modus ponens</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> exactly states:</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: italic; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> if (A→B) & A then B</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. That is, given the rule and knowing that the antecedent (A) is true, B must be true if everything is sound. However, the converse (</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: italic; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">if (A→B) & B then A</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">) does not necessarily hold: the purple entry in the truth table is a case where A→B and B are both true while A is false. 
On the other hand, inferring A from B is not necessarily wrong either; it just isn’t (always) sound, which is why it is called the </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: italic; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">fallacy of the converse</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. Yet when this reasoning is applied over the complete set of possible explanations, ranked by how probable each one is, it can be used to infer real, plausible explanations. This is at the heart of abductive reasoning, which many [] have cited as something scientists frequently apply. Researchers who use Bayesian inference to propose explanations or mechanisms are indeed applying abduction.</span></div>
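<div>The difference between modus ponens and the fallacy of the converse can be checked mechanically by enumerating every truth assignment of A and B, which reproduces the truth table above. In this minimal sketch, an argument form counts as valid only if no assignment makes all premises true and the conclusion false; the helper names are hypothetical.</div>

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b is False only when a is True and b is False."""
    return (not a) or b


def counterexamples(premises, conclusion):
    """Return every (A, B) assignment where all premises hold but the conclusion fails."""
    return [
        (a, b)
        for a, b in product([True, False], repeat=2)
        if all(p(a, b) for p in premises) and not conclusion(a, b)
    ]


# Modus ponens: (A -> B) and A, therefore B.
modus_ponens = counterexamples([lambda a, b: implies(a, b), lambda a, b: a],
                               lambda a, b: b)

# Fallacy of the converse: (A -> B) and B, therefore A.
converse = counterexamples([lambda a, b: implies(a, b), lambda a, b: b],
                           lambda a, b: a)

print(modus_ponens)  # []: no counterexample, so the form is valid
print(converse)      # [(False, True)]: A false, B true, the purple entry
```

<div>The single counterexample for the converse is what makes it unsound; abduction keeps using the converse anyway but attaches probabilities to it instead of certainty.</div>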
<div dir="ltr" style="line-height: 1.38; margin-bottom: 10pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Abductive reasoning sets a high bar for the tools that support it. It is not only a matter of being able to proffer a few different explanations for a phenomenon B; the tools must have sufficient coverage of all possible explanations so that the best ones can be ranked (using posterior probabilities), which often requires knowing the sum of (almost) all probabilities, i.e., the normalizing denominator in Bayes’ rule. I recommend this as the bar for what we have been calling </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: italic; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">knowledge bases</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. One can argue that practical knowledge should include verifiable explanations, or the ability to find such explanations, for evidence-based discovery. Guidance on this should be openly discussed and agreed on soon, given the large number of recent knowledge graph/base offerings, some of which may not meet this requirement. Knowledge systems should serve both human queries and machine-driven interrogation and inference. Currently, there are no well-defined objectives for their use, which makes recommendation and selection by enterprises and institutions very ambiguous.</span></div>
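<div>The coverage requirement follows from Bayes’ rule: posteriors are normalized by P(E), the sum of P(E|H)·P(H) over every explanation H, so a knowledge base that omits a plausible explanation silently inflates the posteriors of the explanations it does contain. A minimal sketch; the hypothesis names and all numbers are hypothetical and chosen only to show the effect.</div>

```python
def posteriors(priors, likelihoods):
    """Normalize P(E|H) * P(H) over only the hypotheses the knowledge base contains."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(E) as seen by this knowledge base
    return {h: j / total for h, j in joint.items()}


# A complete (hypothetical) hypothesis space for some observation E.
priors = {"known_1": 0.4, "known_2": 0.4, "missing": 0.2}
likelihoods = {"known_1": 0.1, "known_2": 0.05, "missing": 0.9}

complete = posteriors(priors, likelihoods)

# The same calculation when the knowledge base lacks the "missing" explanation.
partial = posteriors({h: priors[h] for h in ("known_1", "known_2")}, likelihoods)

print({h: round(p, 2) for h, p in complete.items()})
print({h: round(p, 2) for h, p in partial.items()})
```

<div>With the full hypothesis space, the "missing" explanation is the best one, with a posterior of about 0.75; without it, "known_1" appears to be the best explanation, at about 0.67. The ranking is only as good as the coverage.</div>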
<div dir="ltr" style="line-height: 1.38; margin-bottom: 10pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">In addition, the AI/ML community needs to address the relevance and benefits of using such knowledge resources, and whether further alignment (e.g., APIs) is needed. Specifically, knowledge systems could be used to address the interpretability of AI solutions, injecting context and giving non-technical audiences access to their overall benefits. So far, the AI community has often viewed knowledge as something an AI system finds but does not itself need to take advantage of, while those developing knowledge graphs view their semantic forms as their interpretation of AI, with learning systems simply fed data transformed from within the graph. Both views are unproductive; the real benefits will emerge from considering how the two technologies can be more fundamentally </span><span style="font-family: "arial"; font-size: 16px; white-space: pre-wrap;">integrated</span><span style="font-family: "arial"; font-size: 12pt; white-space: pre-wrap;">.</span></div>
<h3>
<b style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">Examples of logical inference</b></h3>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Deduction</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">All oncogenes have the potential of becoming mutated and driving oncogenesis.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Gene W is an oncogene.</span></div>
</li>
</ol>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 18pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">∴ Gene W can cause cancer if mutated.</span></div>
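<div>The deduction example can be encoded as a rule over any subject x, instantiated for the one known fact about Gene W. A minimal sketch: facts are (predicate, subject) pairs, and the predicate names and the <code>deduce</code> helper are hypothetical illustrations rather than any particular knowledge-base API.</div>

```python
# Facts are (predicate, subject) pairs; the predicate names are illustrative only.
facts = {("oncogene", "W")}

# Premise 1 as a rule over any subject x: oncogene(x) -> can_drive_oncogenesis_if_mutated(x)
rules = [("oncogene", "can_drive_oncogenesis_if_mutated")]


def deduce(facts, rules):
    """Single pass: instantiate each rule for every subject satisfying its antecedent."""
    derived = set(facts)
    for antecedent, consequent in rules:
        for predicate, subject in facts:
            if predicate == antecedent:
                derived.add((consequent, subject))
    return derived


print(("can_drive_oncogenesis_if_mutated", "W") in deduce(facts, rules))  # True

# An observation about Y matches no rule antecedent, so deduction adds nothing.
print(deduce({("mutated_in_tumor", "Y")}, rules))  # {('mutated_in_tumor', 'Y')}
```

<div>The second call shows why the abduction example that follows needs a different kind of inference: an observed mutation is a consequent-side fact, and deduction cannot run the rule backwards.</div>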
<b style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Abduction</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">All oncogenes have the potential of becoming mutated and driving oncogenesis.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Gene Y is observed to be mutated in a tumor.</span></div>
</li>
</ol>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 18pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">∴ Gene Y is an oncogene. </span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">→ NO! Counter-example: an altered Gene Y can also contribute to oncogenesis if it is a tumor suppressor. It could even be passive and simply incidental (a passenger mutation).</span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">→ However, if one continues to see Y mutations associated with similar classes of tumors </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">and</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> the mutations appear to confer a gain of function </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">and</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> these mutations are rarer in other cases, then the proposition may be verified.</span></div>
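<div>One way to put a number on the last condition (Y mutations are enriched in a tumor class and rarer in other cases) is a one-sided Fisher’s exact test on a 2×2 table of counts. This is a standard enrichment test swapped in for illustration, not a method from the text; the counts are hypothetical, the function name is made up, and the test speaks only to association, not to gain of function.</div>

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                     Y mutated   Y not mutated
        tumor class      a             b
        other samples    c             d

    Returns P(X >= a), where X is the number of mutated samples in the tumor
    class under the hypergeometric null of no association (margins fixed).
    """
    row1 = a + b          # samples in the tumor class
    col1 = a + c          # mutated samples overall
    n = a + b + c + d     # all samples
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts: Y is mutated in 12 of 60 tumors of one class,
# but in only 3 of 140 other samples.
p = fisher_enrichment_p(12, 48, 3, 137)
print(f"one-sided p = {p:.2e}")
```

<div>A small p-value supports the association but, in keeping with abduction, it ranks the oncogene explanation higher rather than proving it; the gain-of-function evidence still has to come from elsewhere.</div>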
<b style="-webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; caret-color: rgb(0, 0, 0); color: black; font-family: -webkit-standard; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Induction</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Genes W, X, Y, and Z are RTKs (receptor tyrosine kinases).</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 12pt; font-style: normal; font-variant-caps: normal; font-variant-east-asian: normal; font-variant-ligatures: normal; font-variant-position: normal; font-weight: 400; list-style-type: decimal; text-decoration: none; vertical-align: baseline; white-space: pre;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Gene Y is observed to be mutated in 20% of lung carcinomas.</span></div>
</li>
</ol>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 18pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 12pt; font-style: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">∴ RTKs can drive oncogenesis in lung when mutated </span></div>
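<div>The induction example generalizes from one member of a class to the whole class. A simple way to make the strength of such a generalization explicit is to count how many class members actually carry supporting evidence. This sketch uses only the facts in the example above (W, X, Y, and Z are RTKs; only Y has an observed lung-carcinoma mutation rate); the <code>support</code> helper is hypothetical.</div>

```python
# Class membership and observations taken from the induction example above.
rtks = {"W", "X", "Y", "Z"}
lung_mutation_rate = {"Y": 0.20}  # fraction of lung carcinomas in which the gene is mutated


def support(members, observations):
    """Return the fraction of class members with supporting evidence, and which ones."""
    observed = [m for m in members if observations.get(m, 0) > 0]
    return len(observed) / len(members), sorted(observed)


fraction, evidence = support(rtks, lung_mutation_rate)
print(f"{fraction:.0%} of RTKs have supporting evidence: {evidence}")
# prints: 25% of RTKs have supporting evidence: ['Y']
```

<div>Making the support explicit shows that the inductive conclusion currently rests on one of four class members, and points to W, X, and Z as the cases to examine next.</div>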
<br />
Wrecks at the Bottom of Data Lakes, posted 2017-10-24
<h2 style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrdyr_QfyTFxtSYxt4Lll3fqIHchf2Ns8Iw1nXUHQ0Ro68s0mjZ27zlAKI7Wq2kA3vfs8jg8ZiHqd1PgjvE5zZ1z33ymxY20oIa9PA6sPZXVxZpr0lWFbl_yGyl-8lg7kBnbTl5Q6ZpRA/s1600/Edmind_Fitz.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="460" data-original-width="729" height="201" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrdyr_QfyTFxtSYxt4Lll3fqIHchf2Ns8Iw1nXUHQ0Ro68s0mjZ27zlAKI7Wq2kA3vfs8jg8ZiHqd1PgjvE5zZ1z33ymxY20oIa9PA6sPZXVxZpr0lWFbl_yGyl-8lg7kBnbTl5Q6ZpRA/s320/Edmind_Fitz.jpg" width="320" /></a></h2>
<h2 style="text-align: left;">
What are Data Lakes?</h2>
<div class="MsoNormal">
Along with all the activity and marketing hype around Big
Data, there are still troubling loose ends to contend with: how do we associate
disparate but overlapping data with each other if we are simply to “pour” data
together? Using the lake paradigm, how is one to fish out the specific data
that match some criterion, along with anything else associated with it?
Some proposals point to adding type information, but this limits how
data from different collections (related but not exactly equal types) can be
cross-linked when necessary. We can choose to link entities across types using constrained
rules or semantics. However, if we are to rely on some form of data semantics
to associate related things, how are these semantics to be established,
added to the lake, and then managed? The lake metaphor quickly begins
to get murky…</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
But what happened to semantic data, a.k.a. linked data, and to the
ability to link data from multiple sources across an organization or even the
Internet? What of all the promises of truly interlinked data, independent of
where the data arise? Is the data lake the replacement paradigm? One notable shift
has been toward localizing data within an organization’s auspices,
rather than relying on outbound links as championed by semantic web standards.
But is the <i style="mso-bidi-font-style: normal;">lake</i> terminology right for
this? In the sciences, there are always external resources that need to be
updated and merged with the internal sets. If linked data identifier (URI)
semantics are not used properly, what then? What is really offered here?</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To many, the lake analogy affords a serene image of lazy
afternoons spent sailing and fishing, but it is deceptive nonetheless. Are things
best discovered using simple tags, and are those tags controlled? Are unique
relations the key to identifying special objects? Is it a particular tangle of
linked things that helps fish out a prize catch? Do large assemblages of
multiple facts come out whole in a meaningful way, or as a jumble of stringy
facts? It is not a stretch to conjure up an Edmund
Fitzgerald<a href="applewebdata://77DBCF5D-F7B1-49B6-A4F7-E9159879AEBC#_ftn1" name="_ftnref1" style="mso-footnote-id: ftn1;" title=""><span class="MsoFootnoteReference"><span style="mso-special-character: footnote;"><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span style="font-family: "times new roman" , serif; font-size: 12.0pt; line-height: 130%;">[1]</span></span><!--[endif]--></span></span></a>-size
data wreck if one does not take the time to structure the inserted data. Some
things dumped into the lake may never see the light of day again. Has data depth
now become a good thing or a bad thing? In this article, we take a deeper
dive into the challenges facing data aggregation and structuring, and some new
ideas on how to better organize growing and evolving data resources.</div>
<div class="MsoNormal">
A concept introduced in a previous article is the
Yoneda lemma (from category theory), which formally ties all records of entities
(including keys) from any table to each other to create one large network of composite
relations. It makes it possible to define a query algebra (e.g., SQL, SPARQL)
that works with any schema for a dataset. In the case of data lakes, this
foundation is missing, or at least has not been formally introduced, so there is
great uncertainty about what formal basis will ensure data integrity
for insertions, updates, and queries. Currently, data lakes appear to be a convenient
option for handling a large influx of datasets arriving in varied, disjoint
structural forms. Sean Martin of Cambridge Semantics said of current efforts
[1]: “We see customers creating big data graveyards, dumping everything into
HDFS [Hadoop Distributed File System] and hoping to do something with it down
the road. But then they just lose track of what’s there”.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
An alternative generalized model is the concept of what I
call a <b style="mso-bidi-font-weight: normal;"><i style="mso-bidi-font-style: normal;">Datacomb<a href="applewebdata://77DBCF5D-F7B1-49B6-A4F7-E9159879AEBC#_ftn2" name="_ftnref2" style="mso-footnote-id: ftn2;" title=""><span class="MsoFootnoteReference"><span style="mso-special-character: footnote;"><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><b style="mso-bidi-font-weight: normal;"><span style="font-family: "times new roman" , serif; font-size: 12.0pt; line-height: 130%;">[2]</span></b></span><!--[endif]--></span></span></a></i></b>,
which relies on both efficiency and logic (à la geometric algebras) for storage,
structure, and discoverability. Here any typed real-world entity (RWE), or
conjunction of RWEs, can be mapped using single or multiple keys. Multiple keys are
usually associated with JOIN results (e.g., Patient + Primary Physician), but such results
can be automatically typed as a Cartesian product (CP) of existing atomic entities: </div>
<div class="MsoNormal">
PATIENT×PPHYSICIAN. </div>
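<div class="MsoNormal">A minimal sketch of the idea, assuming each atomic entity is identified by a (type, id) pair and a Cartesian-product key is simply the tuple of its atomic keys. The type names PATIENT and PPHYSICIAN come from the text; the record fields, identifiers, and the helpers <code>relate</code>, <code>compound_type</code>, and <code>project</code> are hypothetical. The compound object exists only when its atomic entities exist, carries its own data, and can be projected back onto its atomic rows.</div>

```python
from typing import Dict, Tuple

Key = Tuple[str, str]          # (entity type, identifier), e.g. ("PATIENT", "p1")
CompoundKey = Tuple[Key, ...]  # a Cartesian-product key built from atomic keys

# Atomic entity tables (hypothetical rows).
atoms: Dict[Key, dict] = {
    ("PATIENT", "p1"): {"name": "Ada"},
    ("PPHYSICIAN", "d7"): {"name": "Dr. Lee"},
}

# Compound objects keyed by PATIENT x PPHYSICIAN, carrying their own data.
compounds: Dict[CompoundKey, dict] = {}


def relate(*keys: Key, **data) -> CompoundKey:
    """Materialize a compound object only when every atomic entity exists."""
    missing = [k for k in keys if k not in atoms]
    if missing:
        raise KeyError(f"unknown atomic entities: {missing}")
    compounds[keys] = data
    return keys


def compound_type(key: CompoundKey) -> str:
    """The compound type is the product of its atomic types."""
    return "×".join(t for t, _ in key)


def project(key: CompoundKey) -> Dict[Key, dict]:
    """Decompose a compound key back into its atomic entities and their row data."""
    return {k: atoms[k] for k in key}


k = relate(("PATIENT", "p1"), ("PPHYSICIAN", "d7"), first_visit="2017-03-02")
print(compound_type(k))  # PATIENT×PPHYSICIAN
print(project(k))
```

<div class="MsoNormal">Using plain tuples keeps the compound type derivable from the key itself, so no separate JOIN table schema has to be declared in advance.</div>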
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Such a relation instance materializes if a fact exists
about a patient having a primary physician, as in any join, but now a compound typed-object
exists as well. This compound object may uniquely contain data on when the
patient first began going to this doctor, and what was the circumstance of the
first visit. The actual visits are also compositionally typed (and linked) as
VISIT <span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">≝
</span>PATIENT×PPHYSICIAN×DATE, which would include the location<a href="applewebdata://77DBCF5D-F7B1-49B6-A4F7-E9159879AEBC#_ftn3" name="_ftnref3" style="mso-footnote-id: ftn3;" title=""><span class="MsoFootnoteReference"><span style="mso-special-character: footnote;"><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span style="font-family: "times new roman" , serif; font-size: 12.0pt; line-height: 130%;">[3]</span></span><!--[endif]--></span></span></a>,
any tests performed, and the diagnosis. Any Cartesian product can be
decomposed (projected) into the set of its atomic entities
((PATIENT, PPHYSICIAN), DATE), together with their original associated (row) data. If we
wish to include prescribed drug therapies, we can extend
the previous objects as follows: PATIENT×PPHYSICIAN×DATE×THERAPY_START. For every a
<span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">∊ </span>PATIENT,
b <span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">∊
</span>PPHYSICIAN, c <span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">∊ </span>DATE, and d <span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">∊ </span>THERAPY_START, a 3-simplex (4
vertices) is created, in which each of the 15 (that is, 2<sup>4</sup> − 1) possible
combinations of one to four entities has compositional semantic meaning:<o:p></o:p></div>
<div align="center" class="MsoNormal" style="text-align: center;">
[Figure: a 3-simplex with 1 cell, 4 faces, 6 edges, and 4 vertices]</div>
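<div class="MsoNormal">
The face structure of this 3-simplex can be checked mechanically. The Python sketch below is only an illustration, not the Datacomb implementation: it assumes entity types are plain strings, uses <code>itertools.combinations</code> to enumerate every non-empty face, and models a composite record as a dictionary keyed by atomic type, so that projecting onto a face is a simple key selection. The record values and the helper names <code>faces</code> and <code>project</code> are hypothetical.</div>

```python
from itertools import combinations

# Vertex types from the PATIENT×PPHYSICIAN×DATE×THERAPY_START example.
VERTICES = ("PATIENT", "PPHYSICIAN", "DATE", "THERAPY_START")


def faces(vertices):
    """Return every non-empty face of the simplex spanned by `vertices`,
    named by its composite type (e.g. 'PATIENT×PPHYSICIAN')."""
    result = []
    for k in range(1, len(vertices) + 1):
        for combo in combinations(vertices, k):
            result.append("×".join(combo))
    return result


def project(record, onto):
    """Project a composite record (dict keyed by atomic type) onto a face."""
    return {t: record[t] for t in onto}


all_faces = faces(VERTICES)
# Group faces by how many atomic entities they conjoin.
by_size = {k: sum(1 for f in all_faces if f.count("×") == k - 1) for k in range(1, 5)}
print(len(all_faces))  # 15
print(by_size)         # {1: 4, 2: 6, 3: 4, 4: 1}

# Hypothetical composite record for one visit with a therapy start.
visit = {"PATIENT": "pt-001", "PPHYSICIAN": "dr-42",
         "DATE": "2019-06-30", "THERAPY_START": "2019-07-01"}
print(project(visit, ("PATIENT", "PPHYSICIAN")))
```

<div class="MsoNormal">
The counts by size (4 vertices, 6 edges, 4 triangular faces, 1 cell) add up to the 15 combinations described in the text.</div>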
<h2>
Simplicial Databases</h2>
<div class="MsoNormal">
The ability to compose and decompose objects is useful,
mathematically sound, and makes databases quite flexible. In fact,
any set of <i style="mso-bidi-font-style: normal;">k</i>-joined entities can, if needed,
be decomposed into <i style="mso-bidi-font-style: normal;">k</i>
CP entities of <i style="mso-bidi-font-style: normal;">k-1 </i>atomic entities each, which
can in turn be decomposed into <i style="mso-bidi-font-style: normal;">k(k-1)/2</i>
CP entities of <i style="mso-bidi-font-style: normal;">k-2</i> atomic entities, and so on,
until we arrive at the <i style="mso-bidi-font-style: normal;">k</i> atomic
entities. This structure is commonly known as a <i style="mso-bidi-font-style: normal;">Simplex</i>, and the data instance constructs are known as <i style="mso-bidi-font-style: normal;">Simplicial Sets</i>; their many uses in data storage
were described by David Spivak [2]. One application is in statistical
inference, when computing or analyzing joint and marginal frequencies
(or probabilities) of combinations of related events or attributes. For
example, if a patient has a tumor carrying the somatic mutations [EGFR amp, P53,
PTEN], these define a <i style="mso-bidi-font-style: normal;">mutation simplex</i>
that may be part of a larger mutation pattern [EGFR amp, CDK4, P53, PTEN] that
some patients have, while also subsuming smaller patterns seen in others, such as [EGFR amp,
PTEN] and [EGFR amp, P53]. The entities are the different subsets of co-occurring
mutations, and each may hold the incidence count for that
combination across patients, or an identified molecular interaction between the
co-occurring mutations. This is a largely numeric example, but it can be further
combined with other data.<o:p></o:p></div>
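<div class="MsoNormal">
To make the joint and marginal frequencies concrete, here is a minimal sketch that assumes each patient is represented by a set of mutation names. Every non-empty subset of a patient's set is one face of that patient's mutation simplex; counting faces across patients gives the incidence counts. Comparing a pair's joint frequency with the product of its marginal frequencies is one simple way (an assumption of this sketch, not necessarily the author's method) to spot co-occurrence beyond independence. The four patients are made up.</div>

```python
from collections import Counter
from itertools import combinations

# Toy, made-up patient mutation sets (not real data).
patients = {
    "pt1": {"EGFR amp", "P53", "PTEN"},
    "pt2": {"EGFR amp", "CDK4", "P53", "PTEN"},
    "pt3": {"EGFR amp", "PTEN"},
    "pt4": {"P53"},
}


def sub_patterns(mutations):
    """Yield every non-empty subset (face) of a patient's mutation simplex."""
    items = sorted(mutations)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Incidence count of each sub-pattern across all patients.
counts = Counter()
for muts in patients.values():
    counts.update(sub_patterns(muts))

n = len(patients)
pair = frozenset({"EGFR amp", "PTEN"})
joint = counts[pair] / n                                 # observed joint frequency
marginals = [counts[frozenset({m})] / n for m in pair]   # single-mutation frequencies
expected = marginals[0] * marginals[1]                   # joint frequency if independent
print(counts[pair], joint, expected)  # 3 0.75 0.5625
```

<div class="MsoNormal">
In this toy set the observed joint frequency of [EGFR amp, PTEN] exceeds the value expected under independence; with real data a proper significance test would be needed before reading anything into such a difference.</div>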
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
It is worth noting that the actual physical
storage implementation of a <i style="mso-bidi-font-style: normal;">Simplicial </i>database
[2] does not have to allocate every possible mutation combination, nor every
combination that exists within sets of patients. Because the logical constraints are
complete, the model may only need to allocate those combinations with which useful data
can be associated (e.g., therapies). This can be considered a form of storage caching
and compression, for faster look-ups and associations. In practice, a simple analysis of real
genomic data from ~1000 cancer patients required only a few million unique
simplicial entities to be allocated and linked, which is highly
tractable in today’s large-scale storage systems. Moreover, in some data spaces
where events are strongly mutually associated, the combinatorics are bounded,
and simplicial sets often become saturated at intermediate and lower levels,
remaining sparse relative to the full space of combinations. <o:p></o:p></div>
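<div class="MsoNormal">
One way to realize this selective allocation is sketched here under stated assumptions: only each patient's complete pattern (the top cell) is stored, a face receives a storage slot only when data such as a therapy is attached to it, and a face's support is derived on demand by subset checks. The class <code>SparseSimplicialStore</code>, its methods, and the therapy label are hypothetical.</div>

```python
from itertools import combinations

# Hypothetical toy data: each patient's full mutation pattern (top cell).
patterns = [
    frozenset({"EGFR amp", "P53", "PTEN"}),
    frozenset({"EGFR amp", "CDK4", "P53", "PTEN"}),
    frozenset({"EGFR amp", "PTEN"}),
]


class SparseSimplicialStore:
    """Materializes a face only when data is attached to it; whether a face
    exists, and how many patients share it, is derived from the top cells."""

    def __init__(self, top_cells):
        self.top_cells = list(top_cells)
        self.payload = {}  # face -> attached data, allocated lazily

    def support(self, face):
        # Number of stored patterns (patients) that contain this face.
        return sum(1 for cell in self.top_cells if face <= cell)

    def attach(self, face, data):
        if self.support(face) == 0:
            raise KeyError(f"{sorted(face)} is not a face of any stored pattern")
        self.payload.setdefault(face, []).append(data)


def full_allocation_size(top_cells):
    """Number of distinct faces a fully materialized store would hold."""
    faces = set()
    for cell in top_cells:
        items = sorted(cell)
        for k in range(1, len(items) + 1):
            faces.update(frozenset(c) for c in combinations(items, k))
    return len(faces)


store = SparseSimplicialStore(patterns)
store.attach(frozenset({"EGFR amp", "PTEN"}), "therapy: hypothetical-drug-A")
print(full_allocation_size(patterns), len(store.payload))  # 15 1
print(store.support(frozenset({"EGFR amp", "P53"})))       # 2
```

<div class="MsoNormal">
The trade-off in this sketch is that support queries scan the stored patterns; a real system would presumably cache frequently used faces, which is the caching role the text describes.</div>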
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Note that the hierarchy of entities, from large mutation
combinations down to smaller subsets, forms a “sieve”. Each patient’s pattern is
linked to the top (complete) entity, and then filters down to all the subsets
contained within that pattern, showing which patients share a
particular sub-pattern. If these mutations do not occur statistically
independently, that provides evidence of an underlying mechanism at work [see
Fichtenholtz, 2016]. The simplicial database makes it very efficient to find
all cases of shared patterns, compared with a query filter (one for each pattern) in a
relational DB or an edge traversal in a data graph. The mutation simplex is
formed directly by calculating and indexing the patterns from each patient’s list of
mutations, and becomes cost-efficient once most
patterns have been captured.<o:p></o:p></div>
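<div class="MsoNormal">
The sieve can be sketched as an inverted index from every sub-pattern to the patients whose complete pattern contains it. This is a minimal illustration assuming patients arrive as lists of mutation names (the data is made up, and <code>build_sieve</code> is a hypothetical name): building the index costs 2<sup>m</sup> − 1 entries for a patient with m mutations, after which finding every patient who shares a sub-pattern is a single dictionary lookup.</div>

```python
from collections import defaultdict
from itertools import combinations

# Made-up patient mutation lists (illustrative only).
patients = {
    "pt1": ["EGFR amp", "P53", "PTEN"],
    "pt2": ["EGFR amp", "CDK4", "P53", "PTEN"],
    "pt3": ["EGFR amp", "PTEN"],
}


def build_sieve(patient_mutations):
    """Link each patient's complete pattern down to every sub-pattern it
    contains, so one lookup returns all patients sharing that sub-pattern."""
    sieve = defaultdict(set)
    for pid, muts in patient_mutations.items():
        items = sorted(set(muts))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                sieve[frozenset(combo)].add(pid)
    return sieve


sieve = build_sieve(patients)
print(sorted(sieve[frozenset({"EGFR amp", "PTEN"})]))  # ['pt1', 'pt2', 'pt3']
print(sorted(sieve[frozenset({"P53", "PTEN"})]))       # ['pt1', 'pt2']
```
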
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Returning to our original PATIENT×PPHYSICIAN×DATE example,
one can build a simplicial model around the PATIENT×PPHYSICIAN pair (edge)
linked to a sequence of dates (vertices) to create an implicit series of visits
(=PATIENT×PPHYSICIAN×DATE), i.e., triangular faces. This structure includes a
PPHYSICIAN×DATE edge, which maps to all the patients that doctor has seen on the
same day. A clear advantage of this form of database is that all key
combinations are pre-computed (aka pre-joined), so a simple canonical n-way hash
of the values can find the full set of data in a single lookup; this is
well-suited to fast analytics, where repeated lookups behave like query
caching. Another advantage is that the CP entities have clear, automatic types
and can be handled exactly by type-dependent downstream processes, specifically
by descriptive algebras supporting CP entities (e.g., MUTATION_SIMPLEX<span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">⊗</span>DISEASE<span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">⊗</span>THERAPY
→
DISEASE<span style="font-family: "ms mincho" , serif; mso-bidi-font-family: "MS Mincho";">⊗</span>RESPONSE).
The combined simplicial set naturally lends itself to analytics for identifying effective
treatments based on genomics and disease types.<o:p></o:p></div>
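<div class="MsoNormal">
A minimal sketch of pre-joined, single-lookup access, under these assumptions: the canonical n-way key is built by sorting the (type, value) pairs and hashing them with SHA-256 (an arbitrary choice; the text does not name a hash), and each visit is registered under every non-empty face of its PATIENT×PPHYSICIAN×DATE simplex. The PPHYSICIAN×DATE edge then resolves to all same-day patients with one dictionary lookup. The visit data and the names <code>canonical_key</code> and <code>add_visit</code> are hypothetical.</div>

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def canonical_key(**typed_values):
    """Order-independent key for a face: sort 'type=value' strings, then hash."""
    parts = sorted(f"{t}={v}" for t, v in typed_values.items())
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


index = defaultdict(list)  # face key -> visit records containing that face


def add_visit(record):
    # Pre-join: register the visit under every non-empty face of its simplex.
    types = sorted(record)
    for k in range(1, len(types) + 1):
        for combo in combinations(types, k):
            index[canonical_key(**{t: record[t] for t in combo})].append(record)


# Hypothetical visits.
add_visit({"PATIENT": "pt1", "PPHYSICIAN": "dr-42", "DATE": "2019-06-30"})
add_visit({"PATIENT": "pt2", "PPHYSICIAN": "dr-42", "DATE": "2019-06-30"})
add_visit({"PATIENT": "pt1", "PPHYSICIAN": "dr-42", "DATE": "2019-07-14"})

# PPHYSICIAN×DATE edge: all patients this doctor saw on one day, in one lookup.
same_day = index[canonical_key(PPHYSICIAN="dr-42", DATE="2019-06-30")]
print(sorted(r["PATIENT"] for r in same_day))  # ['pt1', 'pt2']
```

<div class="MsoNormal">
The price of this design is write amplification: each visit is stored under all seven faces of its triangle, in exchange for joins that never have to be computed at query time.</div>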
<h2>
Datacomb<o:p></o:p></h2>
<div class="MsoNormal">
The ideas presented here are grounded in Category
Theory (CT), which ensures logical consistency within data model schemas. The
interconnected set of simplicial entities is described as a <i style="mso-bidi-font-style: normal;">simplicial complex</i> (partial overlaps of
different simplicial elements) and is a well-defined object in CT<a href="applewebdata://77DBCF5D-F7B1-49B6-A4F7-E9159879AEBC#_ftn4" name="_ftnref4" style="mso-footnote-id: ftn4;" title=""><span class="MsoFootnoteReference"><span style="mso-special-character: footnote;"><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span style="font-family: "times new roman" , serif; font-size: 12.0pt; line-height: 130%;">[4]</span></span><!--[endif]--></span></span></a>,
and lies at the heart of the formal definition of what we call a <b style="mso-bidi-font-weight: normal;">Datacomb</b>. The complex supports a
formal query algebra for any subset of simplicial entities, and can be used to
extract any geometric (connected) subset of data, including measurable quantities
such as frequencies. Note also that any graph data model is automatically a subset
of a datacomb, since it is just the 1-skeleton (vertices and edges) of the complex.
The datacomb model can be implemented on top of several storage
technologies, such as multi-array DBs, RDBs, key-value NoSQL DBs, graph DBs,
and (materialized) column-stores, although relational systems may be less practical,
since they require explicit types and type-specific keying. The simplicial logic
needed to interface with them can be layered on top of the existing
technologies, so that a common API can be installed on different storage
technologies. In fact, RDF could even be used as a universal description
of internal structures in any data system (not only triplestores). All in all,
the datacomb approach is a more rigorously defined solution for complex data
sets than the data lake meme offers, one with definable specifications
and multiple analytic and mining applications.<o:p></o:p></div>
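<div class="MsoNormal">
The claim that a graph is the 1-skeleton of the complex can be illustrated directly. This sketch assumes the complex is given by its maximal simplices as sets of vertex labels (the labels <code>a</code> to <code>d</code> and the function <code>skeleton</code> are hypothetical); keeping only faces with at most two vertices yields exactly the vertices and edges of a graph, and the triangle itself is dropped.</div>

```python
from itertools import combinations


def skeleton(maximal_simplices, dim):
    """Return every face of dimension at most `dim` (a face with d+1
    vertices has dimension d) from a complex given by its maximal simplices."""
    faces = set()
    for simplex in maximal_simplices:
        items = sorted(simplex)
        for size in range(1, min(dim + 1, len(items)) + 1):
            faces.update(frozenset(c) for c in combinations(items, size))
    return faces


# A triangle {a, b, c} glued to an edge {c, d} at vertex c.
complex_ = [{"a", "b", "c"}, {"c", "d"}]
one_skel = skeleton(complex_, 1)
vertices = sorted(next(iter(f)) for f in one_skel if len(f) == 1)
edges = sorted(tuple(sorted(f)) for f in one_skel if len(f) == 2)
print(vertices)  # ['a', 'b', 'c', 'd']
print(edges)     # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```
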
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The datacomb can be applied in several settings. Most
naturally, it can be mapped onto existing data-array storage systems already
in place, extending them to handle complex-typed objects more flexibly and automatically,
which is useful for precomputing data for downstream analytics. In
relational DB instances, frequently used materialized joins can be more formally and
efficiently captured and accessed using a datacomb framework, making it easier and
faster to query conjoined content, as well as to recall the atomic entities
on demand. Datacombs serve as a common superset of both data arrays and
relational data, and therefore form a powerful higher-order framework that covers
both data analytics and full sets of non-numeric data. As such, the datacomb
offers many advantages for organizing and defining datasets for machine-learning
tasks, by flexibly formatting raw data into the pre-processed structures required
by many ML platforms. <o:p></o:p></div>
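<div class="MsoNormal">
As one hypothetical example of such pre-processing, the faces of each patient's mutation simplex can be flattened into a binary feature matrix, one row per patient and one column per observed sub-pattern. The sketch below assumes a cap of two mutations per sub-pattern to keep the column count small; the data and the helper name <code>faces_up_to</code> are made up for illustration.</div>

```python
from itertools import combinations

# Illustrative toy mutation sets (made up).
patients = {
    "pt1": {"EGFR amp", "P53", "PTEN"},
    "pt2": {"EGFR amp", "PTEN"},
    "pt3": {"P53"},
}


def faces_up_to(mutations, max_size):
    """Yield faces of a mutation simplex with at most `max_size` vertices."""
    items = sorted(mutations)
    for k in range(1, min(max_size, len(items)) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Columns: every observed sub-pattern of size 1 or 2, in a stable order.
columns = sorted({f for m in patients.values() for f in faces_up_to(m, 2)},
                 key=lambda f: (len(f), sorted(f)))
ids = sorted(patients)
# 1 if the patient's pattern contains the sub-pattern, else 0.
matrix = [[1 if col <= patients[pid] else 0 for col in columns] for pid in ids]

print([" + ".join(sorted(c)) for c in columns])
for pid, row in zip(ids, matrix):
    print(pid, row)
```
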
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In addition, when dealing with closely related entities
(e.g., lists of genes and their coded proteins), instead of arbitrarily
choosing one identifier or another (e.g., P204392) to recall the whole
set of related data records, a simplex of the related entities would provide a
more even and efficient way to get all the matches. It could then be keyed
by any one entity (a vertex), or by the hashed sum of the full set (the k-cell). This
would go a long way toward solving the biomedical disambiguation problem. This is
the formal equivalent of earlier attempts like SRS<a href="applewebdata://77DBCF5D-F7B1-49B6-A4F7-E9159879AEBC#_ftn5" name="_ftnref5" style="mso-footnote-id: ftn5;" title=""><span class="MsoFootnoteReference"><span style="mso-special-character: footnote;"><!--[if !supportFootnotes]--><span class="MsoFootnoteReference"><span style="font-family: "times new roman" , serif; font-size: 12.0pt; line-height: 130%;">[5]</span></span><!--[endif]--></span></span></a>
to connect multiple related molecular entities. <o:p></o:p></div>
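<div class="MsoNormal">
A minimal sketch of an identifier simplex, assuming the related identifiers are plain strings. Only P204392 comes from the text; the other identifiers are invented placeholders. The k-cell key here is a SHA-256 hash of the sorted identifiers, which is one order-independent stand-in for the "hashed sum" mentioned above, not a prescribed scheme. Any single identifier (vertex) or the full set resolves to the same record.</div>

```python
import hashlib

# Hypothetical identifiers for one gene and its coded protein; only
# P204392 appears in the text, the others are invented placeholders.
related_ids = frozenset({"GENE:EXAMPLE1", "PROT:P204392", "MRNA:EXAMPLE1-T1"})


def cell_key(ids):
    """Key for the whole k-cell: a hash of the sorted member identifiers."""
    return hashlib.sha256("|".join(sorted(ids)).encode("utf-8")).hexdigest()


records = {cell_key(related_ids): {"ids": sorted(related_ids), "records": []}}
vertex_index = {i: cell_key(related_ids) for i in related_ids}  # vertex -> cell key


def lookup(identifier_or_ids):
    """Resolve either a single identifier (a vertex) or the full set (the k-cell)."""
    if isinstance(identifier_or_ids, str):
        return records[vertex_index[identifier_or_ids]]
    return records[cell_key(identifier_or_ids)]


print(lookup("PROT:P204392") is lookup("GENE:EXAMPLE1"))  # True
print(lookup(related_ids) is lookup("MRNA:EXAMPLE1-T1"))   # True
```
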
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Datacombs can also handle non-local data by serving as local
caches of all the intra- and inter-relations between data records (e.g.,
genomic data references), providing something much more substantial in function
and structure than existing data lake models, analogous to a universal data
switchboard. A cloud-based implementation could be very effective at managing
all the relations between simple and complex entities from thousands (or more)
of different sources. It would then address what the semantic web
initiative had always aimed to do but never achieved: explicit handling of
complex entity logic (indexing, typing, and filtering) for data that resides
in multiple sources, tasks usually thought to fall within the purview of
ontologies, yet not supported by them.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Many organizations intending to utilize their collections of
data more effectively are positioning themselves around big data. Yet most of
the data environments are a mixture of different classes of technologies,
developed/installed at different times, for different goals, and
accessed/managed by different groups. Trying to unify this heterogeneous mix
will have a broad range of costs depending on the type of technology used, the
urgency of completing it, and of course the thoroughness of the solution. This
can easily range from hundreds of thousands to millions of dollars, but the cost of doing it incorrectly
within a time limit may be orders of magnitude greater (over $100
million) due to the business impact of a non-optimal solution, plus the added
cost (and additional time) of doing it right the second time. The looming
challenge facing many organizations means they need to choose the best approach
properly and confidently, fully considering both the maturity of the
technologies and enhanced paradigms for reducing development and maintenance
costs. There is concern that no database product from any traditional company
is quite ready for the challenge. Consumers must therefore rely on their own
knowledge of their precise needs and decide what level of innovation
they are willing to invest in. A brave new world is emerging for
information technologies.<o:p></o:p></div>
<div class="MsoNormal" style="line-height: 13.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<br /></div>
<div class="MsoNormal" style="line-height: 13.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<u>References<o:p></o:p></u></div>
<div class="MsoNormal">
1 – Stein, Brian; Morrison, Alan (2014). <a href="http://www.pwc.com/en_US/us/technology-forecast/2014/cloud-computing/assets/pdf/pwc-technology-forecast-data-lakes.pdf"><i style="mso-bidi-font-style: normal;"><span style="color: windowtext; text-decoration: none; text-underline: none;">Data lakes and the promise of unsiloed data</span></i></a> (PDF report).
Technology Forecast: Rethinking integration. PricewaterhouseCoopers.<o:p></o:p></div>
<div class="MsoNormal" style="line-height: 13.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
2 – David I. Spivak.<span style="mso-spacerun: yes;">
</span><i style="mso-bidi-font-style: normal;">Simplicial Databases</i>, <a href="https://arxiv.org/abs/0904.2012">https://arxiv.org/abs/0904.2012</a>, 2009<span style="color: black; font-family: "times" , serif; font-size: 13.0pt;"> <o:p></o:p></span></div>
<div class="MsoNormal" style="line-height: 16.0pt; mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;">
<span style="color: black; font-family: "times" , serif; font-size: 13.0pt;">3 – </span><span style="color: black; mso-ascii-font-family: "Times New Roman"; mso-bidi-font-family: "Times New Roman"; mso-bidi-font-size: 13.0pt; mso-hansi-font-family: "Times New Roman";">Fichtenholtz AM, Camarda
ND, Neumann EK. <i style="mso-bidi-font-style: normal;">Knowledge-Based
Bioinformatics Predicting Significance of Unknown Variants In Glial Tumors
Through Sub-Class Enrichment</i>. Pacific Symposium on Biocomputing
2016, pp. 297-308.</span><span style="color: black; font-family: "times" , serif; font-size: 11.0pt;"><o:p></o:p></span></div>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="false"
DefSemiHidden="false" DefQFormat="false" DefPriority="99"
LatentStyleCount="382">
<w:LsdException Locked="false" Priority="0" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index 9"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" Name="toc 9"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footnote text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="header"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footer"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="index heading"/>
<w:LsdException Locked="false" Priority="35" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="table of figures"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="envelope address"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="envelope return"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="footnote reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="line number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="page number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="endnote reference"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="endnote text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="table of authorities"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="macro"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="toa heading"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Bullet 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Number 5"/>
<w:LsdException Locked="false" Priority="10" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Closing"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Signature"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="true"
UnhideWhenUsed="true" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="List Continue 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Message Header"/>
<w:LsdException Locked="false" Priority="11" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Salutation"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Date"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text First Indent"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text First Indent 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Heading"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Body Text Indent 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Block Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Hyperlink"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="FollowedHyperlink"/>
<w:LsdException Locked="false" Priority="22" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Document Map"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Plain Text"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="E-mail Signature"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Top of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Bottom of Form"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal (Web)"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Acronym"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Address"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Cite"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Code"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Definition"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Keyboard"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Preformatted"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Sample"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Typewriter"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="HTML Variable"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Normal Table"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="annotation subject"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="No List"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Outline List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Simple 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Classic 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Colorful 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Columns 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Grid 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table List 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table 3D effects 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Contemporary"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Elegant"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Professional"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Subtle 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Web 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Balloon Text"/>
<w:LsdException Locked="false" Priority="39" Name="Table Grid"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Table Theme"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 1"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 2"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 3"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 4"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 5"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 6"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 7"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 8"/>
<w:LsdException Locked="false" SemiHidden="true" UnhideWhenUsed="true"
Name="Note Level 9"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" SemiHidden="true" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" QFormat="true"
Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" QFormat="true"
Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" QFormat="true"
Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" QFormat="true"
Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" QFormat="true"
Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" QFormat="true"
Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" SemiHidden="true"
UnhideWhenUsed="true" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" SemiHidden="true"
UnhideWhenUsed="true" QFormat="true" Name="TOC Heading"/>
<w:LsdException Locked="false" Priority="41" Name="Plain Table 1"/>
<w:LsdException Locked="false" Priority="42" Name="Plain Table 2"/>
<w:LsdException Locked="false" Priority="43" Name="Plain Table 3"/>
<w:LsdException Locked="false" Priority="44" Name="Plain Table 4"/>
<w:LsdException Locked="false" Priority="45" Name="Plain Table 5"/>
<w:LsdException Locked="false" Priority="40" Name="Grid Table Light"/>
<w:LsdException Locked="false" Priority="46" Name="Grid Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="Grid Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="Grid Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 2"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 2"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 2"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 2"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 2"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 2"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 3"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 3"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 3"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 3"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 3"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 3"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 4"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 4"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 4"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 4"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 4"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 4"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 5"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 5"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 5"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 5"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 5"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 5"/>
<w:LsdException Locked="false" Priority="46"
Name="Grid Table 1 Light Accent 6"/>
<w:LsdException Locked="false" Priority="47" Name="Grid Table 2 Accent 6"/>
<w:LsdException Locked="false" Priority="48" Name="Grid Table 3 Accent 6"/>
<w:LsdException Locked="false" Priority="49" Name="Grid Table 4 Accent 6"/>
<w:LsdException Locked="false" Priority="50" Name="Grid Table 5 Dark Accent 6"/>
<w:LsdException Locked="false" Priority="51"
Name="Grid Table 6 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="52"
Name="Grid Table 7 Colorful Accent 6"/>
<w:LsdException Locked="false" Priority="46" Name="List Table 1 Light"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark"/>
<w:LsdException Locked="false" Priority="51" Name="List Table 6 Colorful"/>
<w:LsdException Locked="false" Priority="52" Name="List Table 7 Colorful"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 1"/>
<w:LsdException Locked="false" Priority="47" Name="List Table 2 Accent 1"/>
<w:LsdException Locked="false" Priority="48" Name="List Table 3 Accent 1"/>
<w:LsdException Locked="false" Priority="49" Name="List Table 4 Accent 1"/>
<w:LsdException Locked="false" Priority="50" Name="List Table 5 Dark Accent 1"/>
<w:LsdException Locked="false" Priority="51"
Name="List Table 6 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="52"
Name="List Table 7 Colorful Accent 1"/>
<w:LsdException Locked="false" Priority="46"
Name="List Table 1 Light Accent 2"/>
</w:LatentStyles>
</xml><![endif]-->
<br />
<div style="mso-element: footnote-list;">
<!--[if !supportFootnotes]--><br clear="all" />
<hr align="left" size="1" width="33%" />
<!--[endif]-->
<br />
<div id="ftn1" style="mso-element: footnote;">
<div class="MsoFootnoteText">
<a href="#_ftnref1" name="_ftn1" style="mso-footnote-id: ftn1;" title=""><span class="MsoFootnoteReference">[1]</span></a> See <a href="https://en.wikipedia.org/wiki/SS_Edmund_Fitzgerald">https://en.wikipedia.org/wiki/SS_Edmund_Fitzgerald</a>
<o:p></o:p></div>
</div>
<div id="ftn2" style="mso-element: footnote;">
<div class="MsoFootnoteText">
<a href="#_ftnref2" name="_ftn2" style="mso-footnote-id: ftn2;" title=""><span class="MsoFootnoteReference">[2]</span></a>
Regularized structures that are semantically flexible, as with honeycombs in
beehives. <o:p></o:p></div>
</div>
<div id="ftn3" style="mso-element: footnote;">
<div class="MsoFootnoteText">
<a href="#_ftnref3" name="_ftn3" style="mso-footnote-id: ftn3;" title=""><span class="MsoFootnoteReference">[3]</span></a> <span style="font-size: 10.5pt; mso-bidi-font-size: 12.0pt;">One could argue that
EVENT=DATE×LOCATION should be used rather than DATE, but it is often unnecessary,
since location does not change within a day.</span><o:p></o:p></div>
</div>
<div id="ftn4" style="mso-element: footnote;">
<div class="MsoFootnoteText">
<a href="#_ftnref4" name="_ftn4" style="mso-footnote-id: ftn4;" title=""><span class="MsoFootnoteReference">[4]</span></a> They
are at the heart of new methodologies, including topological data analysis (TDA).<o:p></o:p></div>
</div>
<div id="ftn5" style="mso-element: footnote;">
<div class="MsoFootnoteText">
<a href="#_ftnref5" name="_ftn5" style="mso-footnote-id: ftn5;" title=""><span class="MsoFootnoteReference">[5]</span></a> <a href="https://www.ncbi.nlm.nih.gov/pubmed/12176845">https://www.ncbi.nlm.nih.gov/pubmed/12176845</a>
<o:p></o:p></div>
</div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4038835287140851931.post-5536664999071533272010-06-02T16:15:00.000-07:002010-06-04T15:55:49.300-07:00Is Linked Data too brittle?"Once we've linked all public data together using RDF, the world will have unprecedented access to real usable data and then things will begin to happen." - OH<br />
<br />
Sounds great on the surface, but so far my experience suggests this solves less than 25% of the information problem. Here are my thoughts on why...<br />
<ul><li>powerful data access demands powerful data interfaces - we aren't there yet by a long shot!</li>
<li>non-standard URIs prevent commercial acceptance (e.g. in the life sciences) - a social issue!</li>
<li>and most importantly: semantic linking offers little improvement if one simply converts one data syntax (tabular) to another (RDF) - here's where I think we can improve things now!</li>
</ul>Most tabular semantics are terrible (and often devoid of it), and they were defined so as to quickly arrange and store information to operate within a row-column access protocol (for a more in-depth discussion see <a href="http://recombinantdata.blogspot.com/2010/06/from-tables-to-rdf.html">From Tables to RDF)</a>. But just as bad is the manifestation of a data table to work across the Web. As RDF, data now becomes a kind of global <b>Truth</b> when it really is most often just one <b>Facet</b> of contextual facts associated with some of the contained objects.<br />
<br />
Some may argue that NamedGraphs and Reification can come to the rescue here by providing appropriate <i>Fact Semantics</i>. Perhaps, but unfortunately they do not appear to be part of projects like <b>LOD</b>, and as far as I can tell these mechanisms are non-normative, which is the opposite of what public efforts need. Projecting data into RDF without considering <i>Context</i> or <i>Fact Semantics</i> creates mountains of <b><i>brittle data</i></b> that can only be used in limited cases, i.e., only within the context they were created in, such as a gene expression study. Researchers trying to build up knowledge about <i>genes</i> in general will have a tougher time separating universal truths from contextual ones (e.g., experimental results).
And since RDF conversions are now happening all over the web, unless we take care, we could all be contaminated with irregular facts stemming from brittle implied data semantics: a very real <i>Tower of Semantic Babel</i>!<br />
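One way to keep universal facts apart from contextual ones is to carry the context alongside each statement, as Named Graphs do. The sketch below is a minimal, stdlib-only illustration under stated assumptions: quads are plain Python tuples, and every identifier (`gene:TP53`, `graph:reference`, `graph:study42`, the `ex:` predicates) is hypothetical, not taken from LOD or any published vocabulary.

```python
from typing import NamedTuple, Optional, Set


class Quad(NamedTuple):
    """A triple plus the named graph (context) in which it was asserted."""
    subject: str
    predicate: str
    obj: object
    graph: str


# Hypothetical data: one reference fact and one study-specific fact about the same gene.
QUADS = [
    Quad("gene:TP53", "rdfs:label", "TP53", "graph:reference"),
    Quad("gene:TP53", "ex:expressionLevel", 8.2, "graph:study42"),
    Quad("graph:study42", "ex:tissue", "liver", "graph:provenance"),
]


def facts_about(subject: str, quads, graphs: Optional[Set[str]] = None):
    """Return (predicate, object, graph) for a subject, optionally limited to some graphs."""
    return [
        (q.predicate, q.obj, q.graph)
        for q in quads
        if q.subject == subject and (graphs is None or q.graph in graphs)
    ]


# A consumer that only wants general knowledge asks the reference graph...
general = facts_about("gene:TP53", QUADS, {"graph:reference"})
# ...while a consumer that wants everything still sees where each fact came from.
everything = facts_about("gene:TP53", QUADS)
```

The design choice here is that context travels with every statement, so a consumer never has to guess which facts were experimental.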
<br />
In the case of gene expression data, which contains genes and their tissue-specific expression measurements, the data must be viewed in the context of the experiment (i.e., the conditions, interventions, tissue sampling, background genetics, etc.). Simply turning an expression set into gene-expression-value RDF triples would be an inappropriate form for web publishing: it makes the gene information brittle and of limited use! Unfortunately, I have not seen any recorded discussion of how to address this, since many efforts focus on convincing as many people as possible "<i>to convert their data to RDF</i>". I think this is a dangerous prescription, and a data integrity bubble is growing that will eventually burst!<br />
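To make the contrast concrete, here is a small sketch comparing the brittle form with one that routes each value through a measurement node tied to its experiment. It is an illustration under assumptions: triples are Python tuples, `_:m1`-style strings stand in for blank nodes, and the experiment record, gene identifiers, and `ex:` predicates are all hypothetical.

```python
from itertools import count

# Hypothetical expression set from a single experiment: gene -> measured value.
EXPRESSION_SET = {"gene:TP53": 8.2, "gene:BRCA1": 3.1}
EXPERIMENT = {
    "id": "ex:experiment42",
    "tissue": "liver",
    "intervention": "drugX 10mg",
}


def naive_triples(expr):
    """Brittle form: the value hangs directly off the gene, with no context."""
    return [(gene, "ex:expressionValue", value) for gene, value in expr.items()]


def contextual_triples(expr, exp):
    """Contextual form: each value sits on a measurement node linked to gene and experiment."""
    ids = count(1)
    triples = [
        (exp["id"], "ex:tissue", exp["tissue"]),
        (exp["id"], "ex:intervention", exp["intervention"]),
    ]
    for gene, value in expr.items():
        m = f"_:m{next(ids)}"  # stands in for a blank node
        triples += [
            (m, "ex:gene", gene),
            (m, "ex:inExperiment", exp["id"]),
            (m, "ex:value", value),
        ]
    return triples
```

In the contextual form no statement has a gene as its subject, so nothing about a gene is asserted as a universal fact; the value is only reachable together with the experiment it came from.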
<br />
Let's step back a bit and review the history...<br />
<br />
The shift in describing the Semantic Web from a system of <i>information semantics</i> to <i>linking data across resources</i> was a technically subtle but strategically important move. For many years, the W3C's strong efforts to explain the need for information semantics were met with confusion and disagreement as to what semantics meant (the irony of poor semantics of semantics is not lost on me).<br />
<br />
At the end of the day, the message of reducing syntactical ambiguity of information (every data type needed a different parser, e.g., XML) was lost on most people (parsers keep people employed!). The notion of turning HTML links from formless web links into clear relation types was not obvious to many. Basically, people felt the web obviously "looked" as if it had semantics (the blue colored links were situated at meaningful locations in text), so why all this extra semantic work? Who really needs <i>machine-readable data</i>? It already goes through web servers and browsers so isn't it already <i>machine-readable</i>?<br />
<br />
By shifting focus to "linking data", those involved in data interchange and storage (the IT guys) were brought into the discussion, and they seemed better able to grasp the significance of using a standard like RDF. By explaining that linked data lets data from diverse locations on the web be connected and handled openly, many of the subtleties of the Semantic Web began to make more real-world sense to folks. Specifically, most IT developers have struggled for years in companies to provide standardized means of integrating their databases, with few practical results to show. This <b>Linked Data</b> idea actually looked like it might have promise... hurray!!<br />
<br />
Still, there was some confusion around "what exactly is a URI?": is it an identifier, is it a web location, and what do I find when I go there? IMHO this could have been handled better (another post eventually) by discussing the semantic theory of URIs <i>before</i> moving on to RDF (TBL's design discussions on URIs were not very intuitive to most data experts). I think the issues around URIs have begun to settle and most people are OK with them now; for the most part, the religious wars around LSIDs and other URN approaches seem to have subsided.<br />
<br />
However, all these discussions have focused primarily on mapping existing data structures (linked tables) to a web-based way of doing things. That is OK for some, but many in the life sciences need the newly converted information, and not just public sector data, to be in a form that is ready for day-to-day research (e.g., tab-delimited formats). Data semantics should clearly empower informaticists beyond what they can quickly do with tables and perl scripts: they need gene information to be readily applicable to SNP analysis, gene expression studies, or molecular structure analyses. If commercial groups are to get involved, the issues around <i>fact semantics</i> and <i>data brittleness</i> need to be addressed ASAP!<br />
<br />
My own efforts involving <i><b>Data Articulation</b></i> try to address this with a strategy that recognizes there is no single way of describing connected information; some forms may be more appropriate for public resource publishing, while others are better suited to deep computational analytics and mining. <i>Data articulation</i> provides a method of taking contextualized data forms (including Named Graphs) and generating internal forms (e.g., workspaces) optimized for computational objectives. In addition, while this approach can take advantage of <i>ontologies</i>, it cannot by itself be captured in any single ontology (it's actually meta-ontological). That's because <i>data articulation</i> is really about applying the right <i>rule transform</i> (SPARQL CONSTRUCT) for the right semantics and context. In fact, it may not even require any complex ontologies to be available to or part of the data sets; perhaps ontologies can be "<i>injected</i>" at the time they are actually required, rather than being non-modal and global.<br />
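The idea of a rule transform can be sketched without a triple store. The toy below matches a basic graph pattern and instantiates a template once per match, which is the shape of a SPARQL CONSTRUCT; it is not SPARQL, and the source data, the `ex:` and `ws:` predicate names, and the chosen workspace form are all hypothetical.

```python
def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for p, v in zip(first, triple):
            if is_var(p):
                if new.setdefault(p, v) != v:
                    break  # variable already bound to a different value
            elif p != v:
                break  # constant term does not match
        else:
            yield from match(rest, triples, new)


def construct(template, where, triples):
    """CONSTRUCT-style transform: instantiate the template once per match of 'where'."""
    out = set()
    for b in match(where, triples):
        for tpl in template:
            out.add(tuple(b[t] if is_var(t) else t for t in tpl))
    return out


# Hypothetical contextualized source: measurement nodes tied to experiments.
SOURCE = {
    ("_:m1", "ex:gene", "gene:TP53"),
    ("_:m1", "ex:inExperiment", "ex:exp42"),
    ("_:m1", "ex:value", 8.2),
    ("_:m2", "ex:gene", "gene:BRCA1"),
    ("_:m2", "ex:inExperiment", "ex:exp7"),
    ("_:m2", "ex:value", 3.1),
}

# Rule: for experiment ex:exp42 only, flatten to a gene -> value workspace form.
WHERE = [
    ("?m", "ex:gene", "?g"),
    ("?m", "ex:inExperiment", "ex:exp42"),
    ("?m", "ex:value", "?v"),
]
TEMPLATE = [("?g", "ws:exp42Value", "?v")]

workspace = construct(TEMPLATE, WHERE, SOURCE)
```

The published source keeps its context; the flattened workspace is derived on demand by a rule that fixes the context (here, one experiment) for a specific computation.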
<br />
A good example comes from mining and analyzing pathway data, which can be obtained in the <i>BioPax OWL</i> format from Reactome and other sources. BioPax supports many semantic structures, including recursive <i>protein complex structures</i>; data articulation allows us to create reaction steps from Reactome-BioPax that include the proteins as direct participants of a reaction. This allows much faster pathway queries and traversals, and improved pathway visualizations (topic for another blog). These efficient forms are not necessarily what you would wish to publish, but they could be included explicitly within the set (with the proper context). Indeed, I think there is a strong relation between <i>data articulation</i> and semantic data visualization, which I am in the midst of exploring with BBN (yeah, the original guys).<br />
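The complex-flattening step can be sketched as a recursive expansion. This is a simplified illustration, not BioPAX itself: real BioPAX OWL models complexes, reactions, and participants with a much richer class and property structure, and the reaction, complex, and protein identifiers and the `ws:hasProteinParticipant` predicate below are hypothetical.

```python
# Hypothetical, simplified pathway data: reactions list their participants,
# and complexes list their components (which may themselves be complexes).
REACTIONS = {
    "reaction:R1": ["complex:C1", "protein:P4"],
}
COMPLEXES = {
    "complex:C1": ["protein:P1", "complex:C2"],
    "complex:C2": ["protein:P2", "protein:P3"],
}


def proteins_in(entity, complexes, seen=None):
    """Recursively expand a participant into the proteins it contains."""
    seen = set() if seen is None else seen
    if entity in seen:  # guard against cyclic data
        return set()
    seen.add(entity)
    if entity not in complexes:
        return {entity}  # not a complex: treat it as a protein
    result = set()
    for part in complexes[entity]:
        result |= proteins_in(part, complexes, seen)
    return result


def articulate_reactions(reactions, complexes):
    """Emit direct reaction-to-protein triples for faster traversal."""
    triples = set()
    for reaction, participants in reactions.items():
        for participant in participants:
            for protein in proteins_in(participant, complexes):
                triples.add((reaction, "ws:hasProteinParticipant", protein))
    return triples
```

A traversal from a reaction to its proteins then takes one hop instead of walking the nested complex structure each time.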
<br />
I strongly believe <i>data articulation</i> is key to taking context-rich data from around the Web and flexibly transforming it into the proper scientific semantic forms for a specific task. For this to work, the initial source forms on the web must explicitly include all <i>contextual</i> and <i>fact semantics</i>, and we'll need to develop proper semantic standards that work correctly with the different data domains coming from their corresponding communities (life sciences, financial, news media, etc.). For now, <i>data articulation</i> is a de facto part of the solutions my company provides, but I hope it becomes commonplace. There is strong demand for it from my clients when they are presented with the issues of data utilization and life cycles.<br />
<br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">As more new <i>linked data apprentices</i> convert their tables into RDF, piles of <b><i>brittle data</i></b> will continue to grow, and may actually impede the uptake and use of <i>linked data</i>. For some of us who have advocated semantic approaches for over 10 years, this is a serious concern. We need to make realistic plans about what kind of semantics need to accompany public and proprietary data sets when they are converted. Perhaps we should propose a new semantic linked data challenge?</div><div><br />
</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4038835287140851931.post-81858110025839270282010-06-02T15:09:00.000-07:002010-06-02T15:09:44.364-07:00From Tables to RDFA lot of us have converted basic tabular data into RDF in our local projects, but beyond these simple examples, discussion of how best to transform table data seems limited. For instance, when should column-based values be treated as direct predicates of a row subject, and when should cells be treated as objects linked by double-key predicates (e.g., one from a gene-probe object and one from a sample object)? In the case of gene expression data, the latter is clearly preferred. But where on the web are these useful rules and patterns written down for interested SW newbies? I hope the following discussion helps promote the formation of better RDF data pattern resources...<br />
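The two mappings can be contrasted in a few lines. This sketch assumes a tiny expression matrix whose rows are gene probes and whose columns are samples; the probe and sample identifiers and the `ex:` predicates are hypothetical, and `_:`-prefixed strings stand in for blank nodes.

```python
# Hypothetical expression matrix: rows are gene probes, columns are samples.
HEADER = ["probe", "sample:S1", "sample:S2"]
ROWS = [
    ["probe:1007_s_at", 7.9, 8.4],
    ["probe:1053_at", 5.2, 4.8],
]


def row_subject_triples(header, rows):
    """Pattern 1: each column becomes a predicate of the row subject.
    Suitable when the columns are genuine attributes of the row's object."""
    return [
        (row[0], column, value)
        for row in rows
        for column, value in zip(header[1:], row[1:])
    ]


def cell_object_triples(header, rows):
    """Pattern 2: each cell becomes its own node, keyed by both probe and sample.
    Suitable when a value only means something for a (probe, sample) pair."""
    triples = []
    for row in rows:
        probe = row[0]
        for column, value in zip(header[1:], row[1:]):
            cell = "_:" + probe.split(":", 1)[1] + "_" + column.split(":", 1)[1]
            triples += [
                (cell, "ex:probe", probe),
                (cell, "ex:sample", column),
                (cell, "ex:value", value),
            ]
    return triples
```

In the first form a sample identifier ends up used as a predicate; in the second, samples remain first-class objects that can carry their own context.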
<br />
Most tabular data semantics are quite poor or even non-existent. Tables are defined to work within particular established information technologies like RDBMS, and give little attention to content meaning (i.e., technologies before content). This is understandable from an economic point of view: selling a DB technology scales better than building the superior "intelligent" solution for each data set (maybe that's why SW took so long to catch on?). In any case, existing data tables probably lack the semantics most of us in the SW community are used to expecting. In some cases, better semantics can be added, since it's a class-general object-attribute adjustment; in other cases, it may require metadata and context that were never properly captured and are now lost for good. Nonetheless, we need to be aware of these issues going forward with RDF-izing both legacy and new data systems.<br />
<br />
As a useful example, a table containing rows of patients with certain symptoms or adverse events (AEs) from drugs SHOULD NOT be RDF-encoded with the symptom as a direct attribute of the patient! Why? Because the symptom or AE occurred and was observed at a specific time, so the patient should really be linked to an observation with metadata on time, place, test, and physician, and the observed symptoms should then be linked to the observation object. CDISC's SDTM was designed to handle this context of clinic visits and clinical findings; much of this comes from the BRIDG model that SDTM, HL7 RCRIM, and NCI follow. In this case the semantics are available, but it also means that converting SDTM data as row-column-cell triples will not work, since implicit (anonymous) finding and observation nodes need to be properly inserted between attributes (described in the draft DSE Note at W3C HCLSIG).<br />
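A minimal sketch of inserting the observation node, assuming flat rows that already carry the visit metadata. The field names, identifiers, dates, and `ex:` predicates are hypothetical; they are not SDTM or BRIDG terms, and a real SDTM conversion would follow those models instead.

```python
from itertools import count

# Hypothetical flat rows: one observed symptom per row, with visit metadata.
ROWS = [
    {"patient": "pt:001", "symptom": "nausea", "date": "2010-03-02",
     "site": "site:A", "test": "physical exam", "physician": "md:17"},
    {"patient": "pt:001", "symptom": "headache", "date": "2010-04-11",
     "site": "site:A", "test": "interview", "physician": "md:17"},
]


def observation_triples(rows):
    """Insert an anonymous observation node between each patient and symptom."""
    ids = count(1)
    triples = []
    for r in rows:
        obs = f"_:obs{next(ids)}"  # stands in for a blank node
        triples += [
            (r["patient"], "ex:hasObservation", obs),
            (obs, "ex:date", r["date"]),
            (obs, "ex:site", r["site"]),
            (obs, "ex:test", r["test"]),
            (obs, "ex:physician", r["physician"]),
            (obs, "ex:observedSymptom", r["symptom"]),
        ]
    return triples


def symptoms_on(triples, patient, date):
    """Symptoms observed for a patient on a given date, reached via observation nodes."""
    observations = {o for s, p, o in triples if s == patient and p == "ex:hasObservation"}
    on_date = {s for s, p, o in triples if s in observations and p == "ex:date" and o == date}
    return {o for s, p, o in triples if s in on_date and p == "ex:observedSymptom"}
```

Because the symptom hangs off the observation, queries can ask "when, where, and by whom" without the patient ever being asserted to simply "have" the symptom.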
<br />
But other cases exist where the semantics have yet to be defined. For example, DrugBank is a data model for critical information about a drug at the time of its approval. It would therefore make sense to "date" the individual records with this approval date, but that means associating the "creation date" with the <i>DrugCard record</i>, not the approved <i>Drug</i> itself. Any new facts gained about a drug over time (new indications, label changes, adverse events, etc.) should be associated with the drug in the proper context (possibly a versioned DrugCard). Therefore DrugBank has at least 2 classes of primary subjects, DrugCard records and Drugs, both of which require URIs. In addition, DrugCard records will need to be versioned and linked to previous forms. This often cannot be inferred by non-domain informaticists looking just at the data tables; it requires working side-by-side with drug experts in the domain. To date, this has been woefully unsupported, even in groups like HCLS, where input from pharmaceutical experts is occasional.<br />
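A minimal sketch of the two-subject model, assuming a linear version history. The `drug:` and `card:` identifiers, the placeholder dates, and the `ex:` predicates are hypothetical and are not DrugBank's actual schema.

```python
# Hypothetical identifiers: one Drug, two versions of its DrugCard record.
TRIPLES = {
    ("drug:D001", "rdf:type", "ex:Drug"),
    ("card:D001-v1", "rdf:type", "ex:DrugCard"),
    ("card:D001-v1", "ex:describes", "drug:D001"),
    ("card:D001-v1", "ex:created", "2001-01-01"),  # date belongs to the record, not the drug
    ("card:D001-v2", "rdf:type", "ex:DrugCard"),
    ("card:D001-v2", "ex:describes", "drug:D001"),
    ("card:D001-v2", "ex:created", "2009-01-01"),
    ("card:D001-v2", "ex:previousVersion", "card:D001-v1"),
    ("card:D001-v2", "ex:newIndication", "ex:IndicationX"),
}


def card_history(triples, drug):
    """DrugCard versions for a drug, newest first, following ex:previousVersion links.
    Assumes a linear history; if several heads exist, the alphabetically first is used."""
    cards = {s for s, p, o in triples if p == "ex:describes" and o == drug}
    previous = {s: o for s, p, o in triples if p == "ex:previousVersion"}
    superseded = set(previous.values())
    heads = sorted(c for c in cards if c not in superseded)
    history = []
    current = heads[0] if heads else None
    while current is not None and current not in history:  # guard against cycles
        history.append(current)
        current = previous.get(current)
    return history
```

Keeping the creation date on the record means later facts about the drug never contradict the approval-time snapshot; they simply live on a newer card.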
<br />
Serious LOD efforts should work more closely with the key domain experts to properly preserve and correct the source data semantics. It is also clear we need to create URIs under the proper authorities rather than the current quickie approach, since it doesn't look convincing for a company to see "www.fu-berlin.de" as the data domain. I have had to convert many of the LODD source data sets into proper semantic forms, mainly because the available LODD sets contain flaws and poor semantics that prevent their commercial use. These are necessary principles to follow if we intend to offer more data value by using <i>semantic linked data</i> standards.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4038835287140851931.post-1973440191145662452009-02-23T09:04:00.000-08:002009-02-24T05:16:49.848-08:00The Graveyards of KnowledgeWeb content has its blessings: it is easy to publish and style-edit. The rise of <span class="blsp-spelling-error" id="SPELLING_ERROR_0">wikis</span> and blogs indicates the Web has come of age... <div><br /></div><div>But there is also a dark side to this. There are lessons to learn from approaches that have not been so successful. Content Management is an essential part of any company's existence. Tools that easily enable users to create spaces for uploading thematic content have been gratefully embraced. Yet too often it is easy to upload a document, notify everyone that you have done so, and then lose track of it. We think we're putting it in a safe and accessible place, but humans by themselves can't keep track of thousands of digital assets. </div><div><br /></div><div>One colleague of mine at Aventis called a commonly used content management system "a Graveyard of Knowledge". Technical folks also refer to this as "a technology mouse trap": information goes in, but it rarely comes out. Of course, many of us have been told "that's what search engines are for". 
But what do you 'search on' to find precisely that one doc you only remember in bits and pieces? Once your content management system holds a modest 10,000 items, the word pairings you search with won't always work. You find some docs, not quite the right ones, and miss the important ones; what's worse, you can't even estimate how much is not recovered! And if it's about the metadata and links, who is responsible for that? IT can't do it, since it requires knowing the content.</div><div><br /><span style="font-weight:bold;">Governance, stewardship, ownership</span><div>There is no substitute for taking responsibility for content you have either created or requested. As its owner, you know what it contains and what it is relevant to. Every digital creation should have a strong link back to its author (yes, I do mean RDF triples). This puts the 'human value' back into the digital equation. Not only does it allow a reader to go back to the source, but it can also provide information on the circumstances and resolutions of the issues discussed.</div><div><br /></div><div>Data Stewardship has a special meaning in these days of content management and linked data: data, metadata, and annotations should be the responsibility of each contributor. For some internal databases, this translates into knowing a lot about the content, how it is updated, what domain QA principles are in place (rather than simply checking for completed data fields), and, most importantly, how well data consumers are able to utilize it.</div><div><br /></div><div>The support provided by RDF and data linking could be applied alongside specific policies to address these issues. By themselves they won't solve them, since there needs to be an accompanying change in culture, and not only in IT. 
The scientific producers and consumers should be taking up the stewardship role more often, since it is their content, and so technologies must become usable enough to make their tasks possible. </div><div><br /></div><div>From now on, all scientists need to become <i>Data Stewards.</i> Consequently, all support systems need to be designed to work easily within their domain, i.e., with no need for additional complicated applications or configuration tasks. And there are great examples of this already happening: internal <b>Knowledge Wikis</b>. One example is Pfizerpedia, a MediaWiki-based system heavily used by Pfizer's researchers. Scientists already use such wikis and, in many cases, demand access to them.</div><div><br /></div><div>This wave is promising and should be allowed to grow, but a major missing element is the ability to link directly from these wikis to data records and metadata descriptors. These links will serve not only to improve human-requested searches, but also machine-driven discovery, which is enormously scalable. Once these are integrated into existing content systems, real <b>Knowledge Environments</b> will begin to take shape in companies, and their usage should have a pronounced benefit on company innovation. Perhaps one research group will be able to find results already obtained by another group 8 months earlier, and successfully find a new therapy in half the time? Who can afford not to improve these days?</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4038835287140851931.post-27447423002563854872008-12-30T11:37:00.000-08:002010-09-15T11:07:24.015-07:00What is Recombinant Data?I'm kicking off this blog with a discussion of a general theme, but one that will come up again in subsequent topics. 
In fact, it's the name of this blog site: Recombinant Data. I named the site this way because the idea behind "Recombinant Data" is very powerful, yet it runs counter to the practices of software developers over the last several years. It therefore deserves its own site for clarification, building on examples, and ongoing community discussion. The first mention of Recombinant Data was by Eric Miller, while he was the W3C liaison for the Healthcare and Life Sciences Interest Group. Since then, I've used it countless times in presentations to various groups, because it is an essential cornerstone of the Semantic Web initiative [the topic of many future posts]. <br />
<br />
First, a little background: the established way of thinking about software and data has been that an application is the primary point of user experience, and the data it creates (and reads) is a persistent artifact whose (user) value depends very much on the application "to read it and to know what to do with it". In other words, data semantics are interpreted by a specific application, and therefore only within the context of that app. Consequently, efficient re-use of data (data interoperability) is impeded and left at the mercy of specialized contracts or "standards" that must be created between application sets (e.g., Adobe-PDF or Office Suite). <br />
<br />
Perhaps this model is good enough for apps that millions of consumers always use the same way, for things like word processing or presentations. But if there is to be any hope for improved interoperability in emerging and complex areas such as healthcare, scientific research, or other knowledge-managing fields, waiting for the "right standards" to emerge is like waiting for bacteria to grow wings...[more on standards in another blog]. Standards aren't wrong; they should (from now on) be about practice and semantics, rather than data formats and APIs!<br />
<br />
Recombinant Data (RD) takes a very different starting point: it is about structuring data with minimum syntactic rules (MSR), yet with enough semantics that the data output from one app can be easily read and handled by another app, even though neither app has any specific contract apart from the MSR. And though semantics are necessary for understanding what the data is, an external app (myMail) only needs enough semantic knowledge (patients are a kind of person) to use the necessary part of the data (patient identifiers about me). Being able to use the right subset of semantics for additional operations by various apps allows for the semantic-invariant mixing and separation of data: no matter what gets pulled together from different sources or apps, the collective set (merged graph) is consistent and logically meaningful. And here is where RD gets its name, borrowing heavily from the biological concept of Recombinant DNA: "two sets of genomes can recombine with one another, without losing or destroying any of their genetic code". In Recombinant Data's case, the logic within the data content is preserved.<br />
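Recombination can be sketched with graphs as sets of triples: merging is set union, so nothing in either input is lost, and an app that only knows that patients are a kind of person can still find what it needs. This is a toy under assumptions: the identifiers and predicates are hypothetical, and real RDF merging must also rename blank nodes apart, which this sketch sidesteps by not using any.

```python
# Output of two hypothetical apps, each a set of (subject, predicate, object) triples.
CLINIC_APP = {
    ("ex:me", "rdf:type", "ex:Patient"),
    ("ex:me", "ex:patientId", "P-1234"),
}
LAB_APP = {
    ("ex:me", "ex:bloodType", "O+"),
    ("ex:Patient", "rdfs:subClassOf", "ex:Person"),
}


def merge(*graphs):
    """Recombine graphs: set union keeps every statement and is order-independent."""
    return set().union(*graphs)


def instances_of(graph, cls):
    """Members of cls, including members of its (transitive) subclasses.
    This is the only semantics a 'person-aware' app needs to understand."""
    classes, frontier = {cls}, {cls}
    while frontier:
        frontier = {s for s, p, o in graph if p == "rdfs:subClassOf" and o in frontier} - classes
        classes |= frontier
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}
```

Neither app's output alone answers "who is a person?", but their merge does, and each original graph can still be recovered as a subset of the merged one.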
<br />
Implicit here is free and open access to semantic definitions, so that an app (or the developer) "can learn more about a given data's semantics" when necessary. This translates into the open publishing of semantic schemas and ontologies, to be used from anywhere on the web. Another requirement is the open-world assumption: not stating something does not make it false (e.g., just because a data set does not state "my nickname is Phaedrus" doesn't mean it isn't). Recombinant Data does alter some of the basics of trusting the completeness of data, but this can be re-established through other mechanisms (provenance tracking, verification, proofs, NamedGraphs)... but that's for another day. As each issue is sufficiently addressed, we will see data become "application independent", epitomizing true and sustainable interoperability. Applications that can work with RD will also become much more powerful and beneficial to users, and could spawn a new generation of cool, incrementally extensible apps (hint to you vendors!). I plan to discuss some of these possibilities in the future as well...<br />
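The difference between closed-world and open-world answers can be shown in a few lines. This is a sketch under assumptions: RDF itself has no built-in negation, so the `known_false` set is a hypothetical stand-in for whatever mechanism (provenance, verification, curated negative assertions) an application chooses to trust, and the identifiers are illustrative.

```python
from typing import Optional

# Hypothetical data set: it says nothing about a nickname.
GRAPH = {("ex:me", "rdf:type", "ex:Person")}
NICKNAME = ("ex:me", "ex:nickname", "Phaedrus")


def closed_world_ask(graph, triple) -> bool:
    """Closed world: anything not stated is taken to be false."""
    return triple in graph


def open_world_ask(graph, triple, known_false=frozenset()) -> Optional[bool]:
    """Open world: True if stated, False only if explicitly known false, otherwise unknown (None)."""
    if triple in graph:
        return True
    if triple in known_false:
        return False
    return None
```

The three-valued answer is what lets independently produced data be merged later without earlier "no" answers becoming wrong.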
<br />
To close this inaugural post: I see the emergence of the Semantic Web as strongly requiring a rethinking of the relations between applications and data. This applies equally to commercial and open source software and resources. In fact, it has some fascinating implications for apps running on personal laptops and hand-helds (to be addressed in another blog). I will also point out that there are forces trying to prevent this from happening. Since commercial vendors currently associate income with licensing apps, and app-independent data will free users from format-imposed lock-in, they will view Recombinant Data as antithetical to their objectives. However, this is completely wrong, since improved app functionality is what people really want, and Recombinant Data should always trump other approaches for improving apps. We just need to get the ecosystem positioned properly so that basic market forces can take over...Unknownnoreply@blogger.com2