Author: buildbot
Date: Fri Sep 23 11:08:25 2011
New Revision: 796163
Log:
Staging update by buildbot
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
(with props)
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
==============================================================================
Binary file - no diff available.
Propchange:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
(original)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
Fri Sep 23 11:08:25 2011
@@ -51,32 +51,35 @@
<h2 id="overview">Overview</h2>
<p>The Stanbol Enhancement Structure is built around the following main
concepts. Each of these concepts covers a specific aspect related to the
enhancement process of content.</p>
<p>The following list gives an overview of the concepts used by the Stanbol
Enhancement Structure:</p>
+<p><img alt="Overview about the Stanbol Enhancement Structure"
src="/EnhancementStructureOverview.png" title="Overview of the Stanbol
Enhancement Structure" /></p>
<ul>
<li>
<p><strong>ContentItem:</strong> This is the resource representing the parsed
content. The URI of this resource depends on how the content was parsed to the
Stanbol Enhancer. If an absolute URI is provided by the request, then this
URI is used. In all other cases the Stanbol Enhancer creates a URI based on
the configured prefix or the URL of the service. The documentation of the
RESTful service should provide more information about that.</p>
</li>
<li>
-<p><strong>Content:</strong> Several content model distinguish between Content
(data) and the ContentItem (Interpretation of the Data). The Enhancement
Structure currently only defines ContentItem, because there is no need to
describe the data for the purpose of the enhancement process. Other components
(such as the /store endpoint) might need to formally describe the data. For
such use cases the sic:content property will be used to refer from the
ContentItem to the Content. The URI representing the Content will be the same
to be used to retrieve its data via a RESTful service. </p>
+<p><strong>sb:Content:</strong> Several content models distinguish between
Content (data) and the ContentItem (interpretation of the data). The
Enhancement Structure currently only defines ContentItem, because there is no
need to describe the data for the purpose of the enhancement process. Other
components (such as the /store endpoint) might need to formally describe the
data. For such use cases the sic:content property will be used to refer from
the ContentItem to the Content. The URI representing the Content will be the
same one that is used to retrieve its data via a RESTful service.</p>
</li>
<li>
-<p><strong>Enhancement:</strong> This provides metadata about extractions
created by EnhancementEngines or present within the content. This includes the
creator (usually a EnhancementEngine), the creation time, as well as relations
to other enhancements. Users of the Stanbol Enhancer will typically not care
about such data because out of the their perspective they represent
Meta-Meta-Data (meta data about the metadata).</p>
+<p><strong>sb:Enhancement:</strong> This provides metadata about extractions
created by EnhancementEngines or present within the content. This includes the
creator (usually an EnhancementEngine), the creation time, as well as relations
to other enhancements. Users of the Stanbol Enhancer will typically not care
about such data because from their perspective it represents
meta-metadata (metadata about the metadata). Every feature, suggestion or
other piece of information extracted by any EnhancementEngine needs to attach
the metadata defined for this concept.</p>
</li>
<li>
-<p><strong>Annotation:</strong> An annotation describe a feature present
within the parsed content. Such feature can have three sources. (1) the can
originate form metadata present in the parsed content, (2) the can be extracted
by analyzing the content itself and (3) they can be based on further processing
Annotations of type (1) and (2). The Annotation provides the label, the type
(e.g. Person, Organization, Location ) the role (e.g. Tag, Category, Keyword),
the confidence and (if available) the link to the entity representing the
extracted feature. It is the central concept for users that need to present all
the things extracted from the parsed content.</p>
+<p><strong>sb:Annotation:</strong> An Annotation describes some piece of
knowledge extracted from the parsed content and/or the metadata of the content.
Information provided by Annotations includes the label, the type and the
confidence. In addition, Annotations need to link to at least one Occurrence and
may have one or more Suggestions. Annotations can also be related to or
dependent on other Annotations. The EnhancementStructure defines only a small
set of different Annotation types. Implementors of EnhancementEngines that
extract specific kinds of things (e.g. coreferences, events, …) may need to
define their own Annotation types. Such extensions should be called
"**Annotation" and be defined as rdfs:subClassOf any Annotation type defined by
this Enhancement Structure.</p>
</li>
<li>
-<p><strong>Occurrence:</strong> An Occurrence describes the actual location of
the feature within the content or the metadata. Based on the type of the
content there will be different types of Occurrences. A "text occurrence" will
contain information such as the selected-text, the start/end position of the
selection and the surrounding text to provide some context. An "image
accurrence" will provide the top left and the bottom right position of the
selected rectangle. A "metadata occurrence" will describe the property used for
the annotation (e.g. dc:creator) the used standard (e.g. DCterms) and the
value.</p>
+<p><strong>sb:Suggestion:</strong> A Suggestion describes a Resource (Entity,
Topic, Category …) that an EnhancementEngine suggests as a possible match for
an Annotation. Suggestions are typically created by Engines that further
process (semantically lift) Annotations. However, EnhancementEngines might
also create both the Annotation and the Suggestions. Suggestions are always
linked to a single Annotation (functional property). They define the label,
the ID (typically the URI of the Resource), the type(s) of the suggested
Resource and the confidence of the suggestion.</p>
+</li>
+<li>
+<p><strong>sb:Occurrence:</strong> An Occurrence describes the actual location
of an extracted feature within the content. This location may be within the
content or within parsed metadata. Occurrences are always linked to a single
Annotation (functional property). Based on the type of the content there will
be different types of Occurrences. This EnhancementStructure currently focuses
on two types of Occurrences: (1) TextOccurrence and (2) MetadataOccurrence. For
details on the model of these Occurrence types see the corresponding sections.
EnhancementEngines that support the extraction of features from content types
that are not covered by this specification (e.g. pictures, sound, video) need
to define their own Occurrence types. Such types should use the name
"***Occurrence" and be defined as rdfs:subClassOf any of the Occurrence types
defined in this specification.</p>
</li>
</ul>
-<p>When using the Enhancement Structure one need usually need to combine
several of the above concepts to create meaningful statement.
-As an example take a natural language processing engine that needs to express
the the word "Paris" found within an sentence like "I will travel to Prais next
week" portably refers to a location.
-To express that it will need to combine the concepts </p>
-<ul>
-<li>Enhancement: to express that this feature was extracted by the Natural
Language Processing Engine at a given time ...</li>
-<li>Annotation: to express that "Paris" represents a "Location" and has the
role "Tag"</li>
-<li>Occurrence: to express where the selected text "Paris" is located within
the analyzed content</li>
+<p>Enhancements encoded based on this specification need to conform to the
following rules:</p>
+<ul>
+<li>sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and
include the required metadata defined by sb:Enhancement.</li>
+<li>sb:Occurrences, sb:Annotations and sb:Suggestions MUST include rdf:type
information for all parent types, e.g. when adding an sb:TextOccurrence the
rdf:type MUST include sb:TextOccurrence AND sb:Occurrence. Consumers are
expected NOT to use any kind of reasoner; therefore adding such additional
information is the only way to ensure that queries for occurrences, annotations
or suggestions provide the expected results.</li>
</ul>
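<p>The second rule can be illustrated with a minimal Python sketch. It is not part of the Stanbol code base: the PARENT_TYPES table and the expand_types helper are hypothetical names, and only the sb:* type names are taken from the text above. The sketch shows how a producer could write out every parent rdf:type explicitly so that consumers without a reasoner still match queries on the parent types.</p>

```python
# Hypothetical parent-type table (an assumption for illustration); only the
# sb:* names themselves come from the rules above.
PARENT_TYPES = {
    "sb:TextOccurrence": ["sb:Occurrence"],
    "sb:MetadataOccurrence": ["sb:Occurrence"],
    "sb:Annotation": ["sb:Enhancement"],
    "sb:Suggestion": ["sb:Enhancement"],
}


def expand_types(rdf_type):
    """Return rdf_type followed by all of its parent types, without duplicates."""
    result = []
    pending = [rdf_type]
    while pending:
        current = pending.pop(0)
        if current not in result:
            result.append(current)
            # Unknown types simply have no parents in this table.
            pending.extend(PARENT_TYPES.get(current, []))
    return result


# A producer would emit one rdf:type statement per returned value.
text_occurrence_types = expand_types("sb:TextOccurrence")
```

<p>With such a helper an engine adds all rdf:type values at creation time, which moves the work from every consumer to the single producer.</p>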
-<p>The same is true for consuming Enhancements. A client interested in
presenting Tags, Categories and Keywords needs only information provided by the
Annotation concept. To be able to highlight the actual location of detected
features within the content on needs to also process information provided by
the Occurrence concept.</p>
+<hr />
+<p>The parts below are currently work in progress.</p>
+<hr />
<h2 id="specification">Specification</h2>
<h3 id="namespaces_and_used_notations">Namespaces and used Notations</h3>
<p>While the Stanbol Enhancement Structure does define some Concepts and
Properties it also uses a lot of existing things from other ontologies. To
improve the readability of this specification namespace prefixes + local names
are used instead of the full URLs by this specification.</p>
@@ -187,6 +190,7 @@ The following code segment shows the kno
<li><strong>sb:entity</strong>: In case an annotation describes an Entity,
this property provides the URI for the entity</li>
<li><strong>sb:entity-type</strong>: In case an annotation describes an
Entity, this property provides the rdf:types of the linked entity</li>
<li><strong>sb:suggestion</strong>: Links to another annotation that provides
a suggestion for this one. This indicates that the Stanbol Enhancer requests
the client to decide between the provided options - e.g. by some user
interaction.</li>
+<li><strong>sb:occurrence</strong>: Optionally links to one or more
sb:Occurrences of this annotation within the parsed Content. Note that there
are several types of Occurrences (TextOccurrence, ImageOccurrence,
MetadataOccurrence …) defined. If this property is missing, then the
Annotation is assumed to be about the whole content (as referred to by the
sb:extracted-from property).</li>
</ul>
<p><strong>Annotations Type</strong> describe the type of the annotated
feature based on a terminology standardized by Stanbol. Current types
include</p>
<ul>
@@ -202,29 +206,116 @@ The following code segment shows the kno
<li>sb:Tag: The feature can be suggested as tag for the parsed content.</li>
<li>sb:Category: The feature provides a categorization for the parsed
content.</li>
<li>sb:Keyword: The feature describes a keyword within the parsed content
TODO: describe the difference between keywords and tags</li>
-<li>sb:Suggestion: The feature is a suggestion for an other Annotations. </li>
</ul>
<p><em>NOTE</em>: Such roles should make it easier to support additional
Annotation roles as suggested by <a
href="https://issues.apache.org/jira/browse/STANBOL-48">STANBOL-48</a> and <a
href="https://issues.apache.org/jira/browse/STANBOL-12">STANBOL-12</a> that
includes <a
href="https://issues.apache.org/jira/browse/STANBOL-28">STANBOL-28</a> and <a
href="https://issues.apache.org/jira/browse/STANBOL-29">STANBOL-29</a>.</p>
-<p>For <strong>Suggestions</strong> there are some additional constraints as
defined by the following code block</p>
-<div class="codehilite"><pre><span class="sr"><a></span> <span
class="n">rdf:type</span> <span class="n">sb:Annotation</span>
-<span class="sr"><a></span> <span class="n">dc:role</span> <span
class="o">!</span><span class="n">sb:Suggestion</span>
-<span class="sr"><a></span> <span class="n">sb:suggestion</span> <span
class="sr"><a1></span>
- <span class="sr"><a1></span> <span class="n">rdf:type</span> <span
class="n">sb:Annotation</span>
- <span class="sr"><a1></span> <span class="n">dc:role</span> <span
class="n">sb:Suggestion</span>
- <span class="sr"><a1></span> <span class="n">sb:confidence</span>
<span class="n">ordering</span><span class="o">^^</span><span
class="n">xsd:float</span>
-</pre></div>
-
-
-<p>This means:</p>
-<ul>
-<li>an Annotation may only define suggestion if it does not have the dc:role
sb:Suggestion. This prohibits nested suggestions</li>
-<li>an Annotation lined by sb:suggestion con considered to be of the dc:role
sb:Suggestion - even that it does not define this role explicitly.</li>
-<li>Annotations used as suggestions MUST define some way to allow clients to
show them in the right order (</li>
-<li>the confidence value of annotations used as suggestions should be used to
order suggestions when presented to the user. However Applications need to
consider that such values are on an ordinal scale meaning that a value of "4"
does NOT mean that it is twice as likely than a suggestion with an confidence
of "2"!</li>
-</ul>
+<h3 id="sbsuggestion">sb:Suggestion</h3>
+<p>Suggestions are used by the Stanbol Enhancer to suggest possible values for
the resolution of features extracted from the parsed content.
+Currently two different use cases are defined for Suggestions:</p>
+<ul>
+<li>(1) Entity Resolution: Suggests entities for a Feature extracted from
the content. Typically such suggestions are calculated based on the name of the
feature found within the content (e.g. the selected text of an
sb:TextOccurrence).</li>
+<li>(2) Field Value Suggestion: Suggests a value for a specific property. This
kind of suggestion is useful if a relation between two extracted features is
detected. A typical example would be a person "Steve Jobs" with the role "CEO"
of the company "Apple Inc". Such relations can be detected by NLP tools.
However, suggestions like this are also central for semantic lifting of RDFa
annotations as shown in the example below.</li>
+</ul>
+<p>sb:Suggestion uses the following properties</p>
+<ul>
+<li><strong>sb:entity</strong>: The id of the suggested Entity</li>
+<li><strong>sb:entity-type</strong>: The type(s) of the suggested Entity</li>
+<li><strong>sb:confidence</strong>: Needed to sort in case of multiple
suggestions</li>
+<li><strong>sb:field</strong>: Defines the property for which this suggestion
should become the value if accepted by the user</li>
+</ul>
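<p>As a rough sketch of how a client might use sb:confidence to order several suggestions for the same annotation, consider the following Python fragment. The dictionary layout, the order_suggestions helper and the confidence numbers are assumptions made for illustration; only the property names sb:entity and sb:confidence come from the list above.</p>

```python
# Illustrative data: the identifiers and confidence numbers are made up.
SUGGESTIONS = [
    {"id": "s1", "sb:entity": "http://www.example.com/person/A", "sb:confidence": 0.42},
    {"id": "s2", "sb:entity": "http://www.example.com/person/B", "sb:confidence": 0.87},
    {"id": "s3", "sb:entity": "http://www.example.com/person/C", "sb:confidence": 0.15},
]


def order_suggestions(suggestions):
    """Return the suggestions sorted from highest to lowest sb:confidence."""
    # Only the relative order of the values is used here, not their magnitude.
    return sorted(suggestions, key=lambda s: s["sb:confidence"], reverse=True)


ordered_ids = [s["id"] for s in order_suggestions(SUGGESTIONS)]
```

<p>The helper returns a new list and leaves the input untouched, so the original order of the enhancement results stays available to the caller.</p>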
+<p>In addition all sb:Suggestions are also of type sb:Enhancement to allow
EnhancementEngines to provide enhancement metadata for them.</p>
+<p>For details on how they are used, see the following example.</p>
+<p><strong>Example:</strong></p>
+<p>As an example, let's assume that the following RDFa annotated content is
parsed to the Stanbol Enhancer:</p>
+<p><span typeof="cal:Vevent">
+ <h3 property="dc:title"> Stanbol Teleconference </h3>
+ <span property="cal:summary">
+ <p> Agenda: </p>
+ <ul>
+ <li> ... </li>
+ </ul>
+ <p> Participants: </p>
+ <ul>
+ <li typeof="foaf:Person" property="foaf:name">Rupert
Westenthaler</li>
+ <li typeof="foaf:Person" property="foaf:name">Olivier
Grisel</li>
+ <li> ... </li>
+ </ul>
+ </span>
+ </span></p>
+<p>(1) Suggest the Entities for Rupert and Olivier
+(2) Suggest to link Rupert and Olivier as values for "cal:attendee"</p>
+<p>Both for Rupert Westenthaler and Olivier Grisel an EntityAnnotation would
be present - in that case created by the RDFa extractor, but in principle this
could also work if the RDFa markup is missing. In such cases the
EntityAnnotations could be created by an NLPEnhancementEngine.</p>
+<p><a1> rdf:type sb:EntityAnnotation
+ <a1> dc:title Rupert Westenthaler
+ <a1> sb:entity-type foaf:Person
+ <a1> sb:hasOccurrence <o1>
+ <a1> sb:hasSuggestion <s1></p>
+<p><a2> rdf:type sb:EntityAnnotation
+ <a2> dc:title Olivier Grisel
+ <a2> sb:entity-type foaf:Person
+ <a2> sb:hasOccurrence <o2>
+ <a2> sb:hasSuggestion <s2></p>
+<p>Let's ignore the occurrences (how to create Occurrences for RDFa markup is
a separate topic that still needs to be specified) and concentrate on the
suggestions.</p>
+<p><s1> rdf:type sb:Suggestion
+ <s1> sb:entity <a
href="http://www.example.com/person/Rupert_Westenthaler">http://www.example.com/person/Rupert_Westenthaler</a>
+ <s1> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+ <s1> sb:confidence 123,456</p>
+<p><s2> rdf:type sb:Suggestion
+ <s2> sb:entity <a
href="http://www.example.com/person/Olivier_Grisel">http://www.example.com/person/Olivier_Grisel</a>
+ <s2> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+ <s2> sb:confidence 234,567</p>
+<p>If the suggestion is accepted by the client the RDFa markup could be
updated like this</p>
+<p><li about="http://www.example.com/person/Rupert_Westenthaler"
+ typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
+ <li about="http://www.example.com/person/Olivier_Grisel"
+ typeof="foaf:Person" property="foaf:name">Olivier Grisel</li></p>
+<p>Now let's have a detailed look at the suggestions to add Rupert and Olivier
as a "cal:attendee" to the meeting.
+First we need an EntityAnnotation for the meeting, which would be
created by the RDFa extractor:</p>
+<p><a> rdf:type sb:EntityAnnotation
+ <a> dc:title "Stanbol Teleconference"
+ <a> sb:entity-type cal:Vevent
+ <a> sb:hasOccurrence <o>
+ <a> sb:hasSuggestion <s3>
+ <a> sb:hasSuggestion <s4></p>
+<p>Again let's skip the occurrence and look at the two suggestions. What I want
to do here is to suggest to use the Annotations for Rupert (<a1>) and Olivier
(<a2>) as values for the property "cal:attendee".</p>
+<p>It is important to suggest here the annotations <a1> and <a2> as values and
NOT the suggested entities (e.g. <a
href="http://www.example.com/person/Rupert_Westenthaler">http://www.example.com/person/Rupert_Westenthaler</a>
in case of <a1>) because the Stanbol Enhancer cannot assume that the user
will accept the suggestions <s1> for <a1> and <s2> for <a2>.</p>
+<p>The following suggestions also use the sb:field property to tell the user
that each suggestion is about a value for the "cal:attendee" property.</p>
+<p><s3> rdf:type sb:Suggestion
+ <s3> sb:field cal:attendee
+ <s3> sb:entity <a1>
+ <s3> sb:entity-type sb:EntityAnnotation
+ <s3> sb:confidence 12,34</p>
+<p><s4> rdf:type sb:Suggestion
+ <s4> sb:field cal:attendee
+ <s4> sb:entity <a2>
+ <s4> sb:entity-type sb:EntityAnnotation
+ <s4> sb:confidence 12,34</p>
+<p>NOTE:</p>
+<ul>
+<li>I am not sure if it is a good idea to use "sb:entity" to link to an
annotation created by the Stanbol Enhancer because it might confuse users if
the same property is used to link external and internal resources. However,
introducing an additional property such as "sb:value" does not seem any better
either.</li>
+</ul>
+<p>Here is the RDFa markup if the user accepts <s3> and <s4> but not <s1> and
<s2>:</p>
+<p><span typeof="cal:Vevent">
+ [...]
+ <p> Participants: </p>
+ <ul property="cal:attendee">
+ <li typeof="foaf:Person" property="foaf:name">Rupert
Westenthaler</li>
+ <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+ <li> ... </li>
+ </ul>
+ </span></p>
+<p>And finally the RDFa markup if all suggestions are accepted by the
client:</p>
+<p><span typeof="cal:Vevent">
+ [...]
+ <p> Participants: </p>
+ <ul property="cal:attendee">
+ <li about="http://www.example.com/person/Rupert_Westenthaler"
+ typeof="foaf:Person" property="foaf:name">Rupert
Westenthaler</li>
+ <li about="http://www.example.com/person/Olivier_Grisel"
+ typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+ </ul>
+ </span></p>
<h3 id="occurrences">Occurrences</h3>
-<p>By default detected Features are considered to be extracted from the whole
content. While this assumption is appropriate for things like Categorizations
and keywords for a lot of cases it is possible to specify the exact occurrence
of features within the content and/or the metadata of the content.</p>
-<p>Typically Occurrences are used together with sb:Annotations and
sb:Enhancement in cases an EnhancementEngine whats to describe the position of
the extracted Feature within the analyzed content. So propertied defined by
this two context should be considered when reading this section.</p>
+<p>By default detected Features are considered to be extracted from the whole
content. While this assumption is appropriate for things like categorizations
and keywords, in many cases it is possible to specify the exact occurrence
of features within the content and/or the metadata of the content. In such
cases the sb:Annotation will define one or more values for the sb:occurrence
property.</p>
<p>Different Occurrence descriptions are needed to describe the position of a
feature within different types of content or within the parsed metadata.</p>
<p><strong>TextOccurrence:</strong> </p>
<p>Describes the occurrence of a feature within textual content.</p>