enhancer: EnhancementStructureOverview.png stanbolenhancementstructure.html

buildbot Fri, 23 Sep 2011 04:08:50 -0700

Author: buildbot
Date: Fri Sep 23 11:08:25 2011
New Revision: 796163

Log:
Staging update by buildbot


Added:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
   (with props)
Modified:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html

Added: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
 Fri Sep 23 11:08:25 2011
@@ -51,32 +51,35 @@
 <h2 id="overview">Overview</h2>
 <p>The Stanbol Enhancement Structure is built around the following main 
concepts. Each of these concepts covers a specific aspect related to the 
enhancement process of content.</p>
 <p>The following list gives an overview of the concepts used by the Stanbol 
Enhancement Structure:</p>
+<p><img alt="Overview about the Stanbol Enhancement Structure" 
src="/EnhancementStructureOverview.png" title="Overview of the Stanbol 
Enhancement Structure" /></p>
 <ul>
 <li>
 <p><strong>ContentItem:</strong> This is the resource representing the parsed 
content. The URI of this resource depends on how the content was parsed to the 
Stanbol Enhancer. If the request provides an absolute URI, then this 
URI is used. In all other cases the Stanbol Enhancer creates a URI based on 
the configured prefix or the URL of the service. The documentation of the 
RESTful service should provide more information about that.</p>
 </li>
 <li>
-<p><strong>Content:</strong> Several content model distinguish between Content 
(data) and the ContentItem (Interpretation of the Data). The Enhancement 
Structure currently only defines ContentItem, because there is no need to 
describe the data for the purpose of the enhancement process. Other components 
(such as the /store endpoint) might need to formally describe the data. For 
such use cases the sic:content property will be used to refer from the 
ContentItem to the Content. The URI representing the Content will be the same 
to be used to retrieve its data via a RESTful service. </p>
+<p><strong>sb:Content:</strong> Several content models distinguish between 
Content (data) and the ContentItem (interpretation of the data). The 
Enhancement Structure currently only defines ContentItem, because there is no 
need to describe the data for the purpose of the enhancement process. Other 
components (such as the /store endpoint) might need to formally describe the 
data. For such use cases the sic:content property will be used to refer from 
the ContentItem to the Content. The URI representing the Content will be the 
same URI that is used to retrieve its data via a RESTful service. </p>
 </li>
 <li>
-<p><strong>Enhancement:</strong> This provides metadata about extractions 
created by EnhancementEngines or present within the content. This includes the 
creator (usually a EnhancementEngine), the creation time, as well as relations 
to other enhancements. Users of the Stanbol Enhancer will typically not care 
about such data because out of the their perspective they represent 
Meta-Meta-Data (meta data about the metadata).</p>
+<p><strong>sb:Enhancement:</strong> This provides metadata about extractions 
created by EnhancementEngines or present within the content. This includes the 
creator (usually an EnhancementEngine), the creation time, as well as relations 
to other enhancements. Users of the Stanbol Enhancer will typically not care 
about such data because from their perspective it represents 
meta-metadata (metadata about the metadata). Every feature, suggestion or 
other piece of information extracted by any EnhancementEngine needs to attach 
the metadata defined for this concept.</p>
 </li>
 <li>
-<p><strong>Annotation:</strong> An annotation describe a feature present 
within the parsed content. Such feature can have three sources. (1) the can 
originate form metadata present in the parsed content, (2) the can be extracted 
by analyzing the content itself and (3) they can be based on further processing 
Annotations of type (1) and (2). The Annotation provides the label, the type 
(e.g. Person, Organization, Location ) the role (e.g. Tag, Category, Keyword), 
the confidence and (if available) the link to the entity representing the 
extracted feature. It is the central concept for users that need to present all 
the things extracted from the parsed content.</p>
+<p><strong>sb:Annotation:</strong> An Annotation describes some piece of 
knowledge extracted from the parsed content and/or the metadata of the content. 
Information provided by Annotations includes the label, the type and the confidence. 
In addition, Annotations need to link to at least one Occurrence and may 
have one or more Suggestions. Annotations can also be related to or dependent on 
other Annotations. The EnhancementStructure defines only a small set of 
different Annotation types. Implementors of EnhancementEngines that extract 
specific kinds of things (e.g. coreferences, events, …) may need to define 
their own Annotation types. Such extensions should be called "**Annotation" and 
be defined as rdfs:subClassOf any Annotation type defined by this Enhancement 
Structure.</p>
 </li>
 <li>
-<p><strong>Occurrence:</strong> An Occurrence describes the actual location of 
the feature within the content or the metadata. Based on the type of the 
content there will be different types of Occurrences. A "text occurrence" will 
contain information such as the selected-text, the start/end position of the 
selection and the surrounding text to provide some context. An "image 
accurrence" will provide the top left and the bottom right position of the 
selected rectangle. A "metadata occurrence" will describe the property used for 
the annotation (e.g. dc:creator) the used standard (e.g. DCterms) and the 
value.</p>
+<p><strong>sb:Suggestion:</strong> A Suggestion describes a Resource (Entity, 
Topic, Category …) that an EnhancementEngine suggests as a possible match for 
an Annotation. Suggestions are typically created by Engines that further 
process Annotations (semantic lifting). However, an EnhancementEngine might 
also create both the Annotation and its Suggestions. Suggestions are always 
linked to a single Annotation (functional property). They define the label, 
the ID (typically the URI of the Resource), the type(s) of the suggested 
Resource and the confidence of the suggestion.</p>
+</li>
+<li>
+<p><strong>sb:Occurrence:</strong> An Occurrence describes the actual location 
of an extracted feature within the content. This location may be within the 
content or within parsed metadata. Occurrences are always linked to a single 
Annotation (functional property). Based on the type of the content there will 
be different types of Occurrences. This EnhancementStructure currently focuses on 
two types of Occurrences: (1) TextOccurrence and (2) MetadataOccurrence. For 
details on the model of these Occurrence types see the corresponding sections. 
EnhancementEngines that support the extraction of Features from content types 
that are not covered by this Specification (e.g. Pictures, Sound, Video) need 
to define their own Occurrence types. Such types should use the name 
"***Occurrence" and be defined as rdfs:subClassOf any of the Occurrence types 
defined in this specification.</p>
 </li>
 </ul>
-<p>When using the Enhancement Structure one need usually need to combine 
several of the above concepts to create meaningful statement.
-As an example take a natural language processing engine that needs to express 
the the word "Paris" found within an sentence like "I will travel to Prais next 
week" portably refers to a location.
-To express that it will need to combine the concepts </p>
-<ul>
-<li>Enhancement: to express that this feature was extracted by the Natural 
Language Processing Engine at a given time ...</li>
-<li>Annotation: to express that "Paris" represents a "Location" and has the 
role "Tag"</li>
-<li>Occurrence: to express where the selected text "Paris" is located within 
the analyzed content</li>
+<p>Enhancements encoded based on this specification need to conform to the 
following rules:</p>
+<ul>
+<li>sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and 
include the required metadata defined by sb:Enhancement.</li>
+<li>sb:Occurrences, sb:Annotations and sb:Suggestions MUST include rdf:type 
information for all parent types, e.g. when adding an sb:TextOccurrence the 
rdf:type MUST include sb:TextOccurrence AND sb:Occurrence. Consumers are 
expected NOT to use any kind of reasoner; therefore adding such additional 
information is the only way to ensure that queries for occurrences, annotations 
or suggestions provide the expected results.</li>
 </ul>
-<p>The same is true for consuming Enhancements. A client interested in 
presenting Tags, Categories and Keywords needs only information provided by the 
Annotation concept. To be able to highlight the actual location of detected 
features within the content on needs to also process information provided by 
the Occurrence concept.</p>
+<hr />
+<p>The parts below are currently under work</p>
+<hr />
 <h2 id="specification">Specification</h2>
 <h3 id="namespaces_and_used_notations">Namespaces and used Notations</h3>
 <p>While the Stanbol Enhancement Structure does define some Concepts and 
Properties it also uses a lot of existing things from other ontologies. To 
improve the readability of this specification namespace prefixes + local names 
are used instead of the full URLs by this specification.</p>
@@ -187,6 +190,7 @@ The following code segment shows the kno
 <li><strong>sb:entity</strong>: In case an annotation describes an Entity, 
this property provides the URI for the entity</li>
 <li><strong>sb:entity-type</strong>: In case an annotation describes an 
Entity, this property provides the rdf:types of the linked entity</li>
 <li><strong>sb:suggestion</strong>: Links to an other annotation that provides 
a suggestion for this one. This indicates that the Stanbol Enhancer requests 
the client to decide between the provided options - e.g. by some user 
interaction.</li>
+<li><strong>sb:occurrence</strong>: Optionally links to one or more 
sb:Occurrences of this annotation within the parsed Content. Note that there are 
several types of Occurrences (TextOccurrence, ImageOccurrence, 
MetadataOccurrence …) defined. If this property is missing, the 
Annotation is assumed to be about the whole content (as referred to by the 
sb:extracted-from property).</li>
 </ul>
 <p><strong>Annotations Type</strong> describe the type of the annotated 
feature based on a terminology standardized by Stanbol. Current types 
include</p>
 <ul>
@@ -202,29 +206,116 @@ The following code segment shows the kno
 <li>sb:Tag: The feature can be suggested as tag for the parsed content.</li>
 <li>sb:Category: The feature provides a categorization for the parsed 
content.</li>
 <li>sb:Keyword: The feature describes a keyword within the parsed content 
TODO: describe the difference between keywords and tags</li>
-<li>sb:Suggestion: The feature is a suggestion for an other Annotations. </li>
 </ul>
 <p><em>NOTE</em>: Such roles should make it more easy to support additional 
Annotations roles as suggested by <a 
href="https://issues.apache.org/jira/browse/STANBOL-48">STANBOL-48</a> and <a 
href="https://issues.apache.org/jira/browse/STANBOL-12">STANBOL-12</a> that 
includes <a 
href="https://issues.apache.org/jira/browse/STANBOL-28">STANBOL-28</a> and <a 
href="https://issues.apache.org/jira/browse/STANBOL-29">STANBOL-29</a>.</p>
-<p>For <strong>Suggestions</strong> there are some additional constraints as 
defined by the following code block</p>
-<div class="codehilite"><pre><span class="sr">&lt;a&gt;</span> <span 
class="n">rdf:type</span> <span class="n">sb:Annotation</span>
-<span class="sr">&lt;a&gt;</span> <span class="n">dc:role</span> <span 
class="o">!</span><span class="n">sb:Suggestion</span>
-<span class="sr">&lt;a&gt;</span> <span class="n">sb:suggestion</span> <span 
class="sr">&lt;a1&gt;</span>
-    <span class="sr">&lt;a1&gt;</span> <span class="n">rdf:type</span> <span 
class="n">sb:Annotation</span>
-    <span class="sr">&lt;a1&gt;</span> <span class="n">dc:role</span> <span 
class="n">sb:Suggestion</span>
-    <span class="sr">&lt;a1&gt;</span> <span class="n">sb:confidence</span> 
<span class="n">ordering</span><span class="o">^^</span><span 
class="n">xsd:float</span>
-</pre></div>
-
-
-<p>This means:</p>
-<ul>
-<li>an Annotation may only define suggestion if it does not have the dc:role 
sb:Suggestion. This prohibits nested suggestions</li>
-<li>an Annotation lined by sb:suggestion con considered to be of the dc:role 
sb:Suggestion - even that it does not define this role explicitly.</li>
-<li>Annotations used as suggestions MUST define some way to allow clients to 
show them in the right order (</li>
-<li>the confidence value of annotations used as suggestions should be used to 
order suggestions when presented to the user. However Applications need to 
consider that such values are on an ordinal scale meaning that a value of "4" 
does NOT mean that it is twice as likely than a suggestion with an confidence 
of "2"!</li>
-</ul>
+<h3 id="sbsuggestion">sb:Suggestion</h3>
+<p>Suggestions are used by the Stanbol Enhancer to suggest possible values for 
the resolution of features extracted from the parsed content. 
+Currently there are two different use cases defined for Suggestions:</p>
+<ul>
+<li>(1) Entity Resolution: Suggests entities for a Feature extracted from 
the content. Typically such suggestions are calculated based on the name of the 
feature found within the content (e.g. the selected text of a 
sb:TextOccurrence).</li>
+<li>(2) Field Value Suggestion: Suggests a value for a specific property. This 
kind of suggestion is useful if a relation between two extracted features is 
detected. A typical example would be a person "Steve Jobs" with the role "CEO" 
of the company "Apple Inc". Such relations can be detected by NLP tools. 
However, suggestions like this are also central to the semantic lifting of RDFa 
annotations as shown in the example below.</li>
+</ul>
+<p>sb:Suggestion uses the following properties</p>
+<ul>
+<li><strong>sb:entity</strong>: The id of the suggested Entity</li>
+<li><strong>sb:entity-type</strong>: The type(s) of the suggested Entity</li>
+<li><strong>sb:confidence</strong>: Needed to sort in case of multiple 
suggestions</li>
+<li><strong>sb:field</strong>: Defines the property for which this suggestion 
should become the value if accepted by the user</li>
+</ul>
+<p>In addition all sb:Suggestions are also of type sb:Enhancement to allow 
EnhancementEngine to provide enhancement metadata for them.</p>
+<p>For details on how they are used, please see the following example.</p>
+<p>==== Example ====</p>
+<p>As an example, let's assume that the following RDFa annotated content is parsed 
to the Stanbol Enhancer</p>
+<p><span typeof="cal:Vevent">
+       <h3 property="dc:title"> Stanbol Teleconference </h3>
+       <span property="cal:summary">
+           <p> Agenda: </p>
+           <ul>
+               <li> ... </li>
+           </ul>
+           <p> Participants: </p>
+           <ul>
+               <li typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+               <li typeof="foaf:Person" property="foaf:name">Olivier 
Grisel</li>
+               <li> ... </li>
+           </ul>
+       </span>
+   </span></p>
+<p>(1) Suggest the Entities for Rupert and Olivier
+(2) Suggest to link Rupert and Olivier as values for "cal:attendee"</p>
+<p>Both for Rupert Westenthaler and Olivier Grisel an EntityAnnotation would 
be present - in that case created by the RDFa extractor, but in principle this 
could also work if the RDFa markup is missing. In such cases the 
EntityAnnotations could be created by an NLPEnhancementEngine.</p>
+<p><a1> rdf:type sb:EntityAnnotation
+   <a1> dc:title Rupert Westenthaler
+   <a1> sb:entity-type foaf:Person
+   <a1> sb:hasOccurrence <o1>
+   <a1> sb:hasSuggestion <s1></p>
+<p><a2> rdf:type sb:EntityAnnotation
+   <a2> dc:title Olivier Grisel
+   <a2> sb:entity-type foaf:Person
+   <a2> sb:hasOccurrence <o2>
+   <a2> sb:hasSuggestion <s2></p>
+<p>Lets ignore the occurrences - because how to create Occurrences for RDFa 
markup is a whole different story that needs to be specified - and concentrate 
on the suggestions.</p>
+<p><s1> rdf:type sb:Suggestion
+   <s1> sb:entity <a 
href="http://www.example.com/person/Rupert_Westenthaler">http://www.example.com/person/Rupert_Westenthaler</a>
+   <s1> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+   <s1> sb:confidence 123,456</p>
+<p><s2> rdf:type sb:Suggestion
+   <s2> sb:entity <a 
href="http://www.example.com/person/Olivier_Grisel">http://www.example.com/person/Olivier_Grisel</a>
+   <s2> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+   <s2> sb:confidence 234,567</p>
+<p>If the suggestion is accepted by the client the RDFa markup could be 
updated like this</p>
+<p><li about="http://www.example.com/person/Rupert_Westenthaler"
+       typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
+   <li about="http://www.example.com/person/Olivier_Grisel"
+       typeof="foaf:Person" property="foaf:name">Olivier Grisel</li></p>
+<p>Now lets have a detailed look at the suggestions to add Rupert and Olivier 
as a "cal:attendee" to the meeting.
+First we need to create an EntityAnnotation for the Meeting that would be 
created by the RDFa extractor</p>
+<p><a> rdf:type sb:EntityAnnotation
+   <a> dc:title "Stanbol Teleconference"
+   <a> sb:entity-type cal:Vevent
+   <a> sb:hasOccurrence <o>
+   <a> sb:hasSuggestion <s3>
+   <a> sb:hasSuggestion <s4></p>
+<p>Again let's skip the occurrence and look at the two suggestions. What I want 
to do here is to suggest using the Annotations for Rupert (<a1>) and Olivier 
(<a2>) as values for the property "cal:attendee".</p>
+<p>It is important to suggest here the annotations <a1> and <a2> as values and 
NOT the suggested entities (e.g. <a 
href="http://www.example.com/person/Rupert_Westenthaler">http://www.example.com/person/Rupert_Westenthaler</a>
 in case of <a1>) because the Stanbol Enhancer cannot assume that the user 
will accept the suggestions <s1> for <a1> and <s2> for <a2>.</p>
+<p>The following suggestions also use the sb:field property to tell the user 
that the suggestions are about values for the "cal:attendee" property.</p>
+<p><s3> rdf:type sb:Suggestion
+   <s3> sb:field cal:attendee
+   <s3> sb:entity <a1>
+   <s3> sb:entity-type sb:EntityAnnotation
+   <s3> sb:confidence 12,34</p>
+<p><s4> rdf:type sb:Suggestion
+   <s4> sb:field cal:attendee
+   <s4> sb:entity <a2>
+   <s4> sb:entity-type sb:EntityAnnotation
+   <s4> sb:confidence 12,34</p>
+<p>NOTE:</p>
+<ul>
+<li>I am not sure if it is a good idea to use "sb:entity" to link to an 
annotation created by the Stanbol Enhancer because it might confuse users if 
the same property is used to link external and internal resources. However, 
introducing an additional property such as "sb:value" does not seem better either.</li>
+</ul>
+<p>Here is the RDFa markup if the user accepts <s3> and <s4> but not <s1> and 
<s2></p>
+<p><span typeof="cal:Vevent">
+       [...]
+       <p> Participants: </p>
+       <ul property="cal:attendee">
+           <li typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+           <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+           <li> ... </li>
+       </ul>
+   </span></p>
+<p>and finally the RDFa markup if all the suggestions are accepted by the 
client side</p>
+<p><span typeof="cal:Vevent">
+       [...]
+       <p> Participants: </p>
+       <ul property="cal:attendee">
+           <li about="http://www.example.com/person/Rupert_Westenthaler"
+               typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+           <li about="http://www.example.com/person/Olivier_Grisel"
+               typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+       </ul>
+   </span></p>
 <h3 id="occurrences">Occurrences</h3>
-<p>By default detected Features are considered to be extracted from the whole 
content. While this assumption is appropriate for things like Categorizations 
and keywords for a lot of cases it is possible to specify the exact occurrence 
of features within the content and/or the metadata of the content.</p>
-<p>Typically Occurrences are used together with sb:Annotations and 
sb:Enhancement in cases an EnhancementEngine whats to describe the position of 
the extracted Feature within the analyzed content. So propertied defined by 
this two context should be considered when reading this section.</p>
+<p>By default detected Features are considered to be extracted from the whole 
content. While this assumption is appropriate for things like Categorizations 
and keywords, in a lot of cases it is possible to specify the exact occurrence 
of features within the content and/or the metadata of the content. In such 
cases the sb:Annotation will define one or more values for the sb:occurrence 
property.</p>
 <p>Different Occurrence descriptions are needed to describe the position of a 
feature within different types of content or within the parsed metadata.</p>
 <p><strong>TextOccurrence:</strong> </p>
 <p>Describe the occurrence of a feature within an textual content.</p>

