Author: buildbot
Date: Tue Sep 27 07:56:23 2011
New Revision: 796286

Log:
Staging update by buildbot

Added:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png
   (with props)
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png
   (with props)
Modified:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html

Added: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementexample.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructureoverview.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
 Tue Sep 27 07:56:23 2011
@@ -51,7 +51,7 @@
 <h2 id="overview">Overview</h2>
 <p>The Stanbol Enhancement Structure is build around the following main 
Concepts. Each of this concepts covers a specific aspect related to the 
enhancement process of content.</p>
 <p>The following list gives an overview about the concepts used by the Stanbol 
Enhancement Strucutre:</p>
-<p><img alt="Overview about the Stanbol Enhancement Structure" 
src="EnhancementStructureOverview.png" title="Overview of the Stanbol 
Enhancement Structure" /></p>
+<p><img alt="Overview about the Stanbol Enhancement Structure" 
src="enhancementstructureoverview.png" title="Overview of the Stanbol 
Enhancement Structure" /></p>
 <ul>
 <li>
 <p><strong>ContentItem:</strong> This is the resource representing the parsed 
content. The URI of this resource depends on how the content was parsed to the 
Stanbol Enhancer. In case an absolute URI is provided by the request, than this 
URI is used. In all other cased the Stanbol Enhancer creates an URI based on 
the configured prefix or the URL of the service. The documentation of the 
RESTful service should provide more information about that.</p>
@@ -77,9 +77,6 @@
 <li>sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and 
include the required metadata defined by sb:Enhancement.</li>
 <li>sb:Occurrences, sb:Annotations and Suggestions MUST include rdf:type 
information for all parent types. e.g. when adding a sb:TextOccurrences the 
rdf:type MUST include sb:TextOccurrence AND sb:Occurrences. Consumers are 
expected to NOT using any kind of reasoner therefore adding such additional 
information is the only way to ensure that queries for occurrences, annotations 
or suggestions provide the expected results.</li>
 </ul>
-<hr />
-<p>The parts below are currently under work</p>
-<hr />
 <h2 id="specification">Specification</h2>
 <h3 id="namespaces_and_used_notations">Namespaces and used Notations</h3>
 <p>While the Stanbol Enhancement Structure does define some Concepts and 
Properties it also uses a lot of existing things from other ontologies. To 
improve the readability of this specification namespace prefixes + local names 
are used instead of the full URLs by this specification.</p>
@@ -107,19 +104,23 @@
 <li>{value}^^xsd:anyURI indicates that enhancement results will not provide 
additional knowledge about this resource. If the consumer needs more 
information about such resources he need to use other services to retrieve such 
knowledge or parse special parameters to tell Stanbol to explicitly include 
such knowledge in the response.</li>
 </ul>
 <h3 id="contentitem_ltci">ContentItem &lt;ci&gt;</h3>
-<p>The ContentItem &lt;ci&gt; represents a content enhanced by the Stanbol 
Enhancer. It is the central resource used to link all the enhancements created 
by the EnhancementEngines.
-The Stanbol Enhancement Structure does not force client to distinguish between 
content (data) and contentItem (interpretation of the data). Within the Stanbol 
Enhancer only the contentItem is needed, because the Content is accessed via 
the Java API. Client are free to use markup to explicitly identify these parts 
of documents that need to be interpreted as content (e.g. an element in the DOM 
tree). An example is provided below. </p>
+<p>The ContentItem &lt;ci&gt; represents a content parsed to the Stanbol 
Enhancer. It is the central resource used to link all the enhancements created 
by the EnhancementEngines.</p>
 <div class="codehilite"><pre><span class="sr">&lt;ci&gt;</span> <span 
class="n">rdf:type</span> <span class="n">sb:ContentItem</span>
-<span class="p">[</span><span class="sr">&lt;ci&gt;</span> <span 
class="n">rdf:type</span> <span class="n">sioc:Item</span><span 
class="p">]</span>
+<span class="p">[</span><span class="sr">&lt;ci&gt;</span> <span 
class="n">sb:embeds</span><span class="o">-</span><span 
class="n">knowledge</span> <span class="p">{</span><span 
class="n">knowlegeGraphId</span><span class="p">}]</span>
+<span class="p">[</span><span class="sr">&lt;ci&gt;</span> <span 
class="n">sb:has</span><span class="o">-</span><span class="n">section</span> 
<span class="n">sb:ContentItem</span><span class="p">]</span>
 <span class="p">[</span><span class="sr">&lt;ci&gt;</span> <span 
class="sr">&lt;{metadatafield}&gt;</span> <span class="p">{</span><span 
class="n">value</span><span class="p">(</span><span class="n">s</span><span 
class="p">)}]</span>
-<span class="p">[?</span><span class="n">parent</span> <span 
class="n">sioc:content</span> <span class="sr">&lt;ci&gt;</span><span 
class="p">]</span>
 </pre></div>
 
 
-<p>The ContentItem itself does not define any properties however it is used as 
domain (target type) of some properties within the Stanbol Enhancement 
structure. Information extracted from metadata parsed with the content (e.g. 
Dublin Core, EXIF, ID3 ...) can be added directly to the ContentItem 
&lt;ci&gt;.</p>
-<p><em>TODO:</em> Describe here how to deal with embedded knowledge (e.g. 
RDFa, MicroFormats …). Last time the discussion was to write such knowledge 
into an own Graph and do not add it to the returned Enhancement Structure. 
However this was only an suggestion and need to be reviewed.</p>
-<p>The usage of SIOC (Semantically-Interlinked Online Communities) is 
optionally and usually added by the client to embed information (e.g. as RDFa) 
to the content itself. However when parsing HTML with such markup to the 
Stanbol Enhancer such markup MUST BE used as default to determine those parts 
of the content that need to be enhanced.</p>
+<p>The ContentItem itself defines only two fields:</p>
+<ul>
+<li><strong>sb:embeds-knowledge</strong>: Documents might contain explicit 
knowledge (e.g. MicroData, RDFa). If such information can be extracted, then it 
is stored in its own RDF graph. This property links to the ID of this RDF 
graph. Such knowledge is typically extracted during the pre-processing phase of 
the EnhancementProcess. Therefore EnhancementEngines have access to this 
information.</li>
+<li><strong>sb:has-section</strong>: A ContentItem may define different 
sections. The Stanbol EnhancementEngine will create a separate ContentItem with 
its own ID for each such section. The Stanbol Enhancer will first enhance the 
main content item and then all the sections. This feature is mainly intended to 
split up huge documents into parts that are feasible to enhance.</li>
+</ul>
+<p>In addition, metadata extracted from or parsed with the content (e.g. 
Dublin Core, EXIF, ID3 ...) can also be added directly to the ContentItem 
&lt;ci&gt;. EnhancementEngines may use such information during the 
EnhancementProcess.</p>
+<p><strong>Example: Embedded Knowledge</strong></p>
 <p><em>TODO</em>: Move this to an own section about RDFa support!</p>
+<p>This example shows how SIOC (Semantically-Interlinked Online Communities) 
and RDFa can be used to embed knowledge to tell Stanbol how to process parsed 
HTML markup.</p>
 <div class="codehilite"><pre><span class="nt">&lt;body</span> <span 
class="na">about=</span><span 
class="s">&quot;http://www.examplenews.com/featuredNews&quot;</span><span 
class="nt">&gt;&lt;table&gt;&lt;tr&gt;</span>
     <span class="nt">&lt;td&gt;</span><span class="c">&lt;!-- The menue: Not 
to be enhanced --&gt;</span> <span class="nt">&lt;/td&gt;</span>
     <span class="nt">&lt;td&gt;&lt;span</span> <span 
class="na">property=</span><span class="s">&quot;sic:content&quot;</span> <span 
class="na">about=</span><span 
class="s">&quot;http://www.examplenews.com/story123&quot;</span><span 
class="nt">&gt;</span> 
@@ -132,37 +133,55 @@ The Stanbol Enhancement Structure does n
 </pre></div>
 
 
-<p>By parsing this as Content the Stanbol Enhancer should gather the following 
knowledge</p>
-<div class="codehilite"><pre><span 
class="sr">&lt;http://www.examplenews.com/featuredNews&gt;</span> <span 
class="n">sic:content</span> <span 
class="sr">&lt;http://www.examplenews.com/story123&gt;</span>
-<span class="sr">&lt;http://www.examplenews.com/featuredNews&gt;</span> <span 
class="n">sic:content</span> <span 
class="sr">&lt;http://www.examplenews.com/interview456&gt;</span>
-</pre></div>
-
-
-<p>and enhance only the markup within the two span tags marked with 
sic:content.</p>
-<p><em>NOTE</em>: this would require support for multiple ContentItems 
(story123 and interview456 in that example)</p>
+<p>By parsing this as Content the Stanbol Enhancer should create:</p>
+<ul>
+<li>An sb:ContentItem for "http://www.examplenews.com/featuredNews" with two 
sections but empty content.<ul>
+<li>The knowledge defined by the above RDFa markup is included in its own 
RDF graph and linked with the "sb:embeds-knowledge" property</li>
+</ul>
+</li>
+<li>An sb:ContentItem representing the section 
"http://www.examplenews.com/story123" <ul>
+<li>the HTML fragment enclosed by the corresponding span-tag is the content</li>
+</ul>
+</li>
+<li>An sb:ContentItem representing the section 
"http://www.examplenews.com/interview456"<ul>
+<li>the HTML fragment enclosed by the corresponding span-tag is the content</li>
+</ul>
+</li>
+</ul>
+<p>NOTE: This assumes the presence of </p>
+<ul>
+<li>a Component for extracting RDFa </li>
+<li>a Component that supports the creation of sb:ContentItems and fragments 
based on SIOC</li>
+</ul>
 <h3 id="enhancement">Enhancement</h3>
 <p>The concept "Enhancement" defines properties that allow Stanbol 
EnhancementEngines to formally describe information about the enhancement 
process. This information are crucial for EnhancemetnEngines to cooperate with 
each other but typical Stanbol users will not need to border with such 
information even that in some situation such knowledge might even be useful on 
the client side e.g. if someone wants to ignore all enhancements created by an 
specific enhancement engine, or to calculate all enhancements affected by the 
removal of an part of the content.</p>
 <p>The following code segments shows the knowledge typically described by 
using the Enhancement concept</p>
 <div class="codehilite"><pre><span class="sr">&lt;e&gt;</span> <span 
class="n">rdf:type</span> <span class="n">sb:Enhancement</span>
-<span class="p">[</span><span class="sr">&lt;e&gt;</span> <span 
class="n">rdf:type</span> <span class="n">sb:Annotation</span><span 
class="p">,</span> <span class="n">sb:Occurrence</span><span class="p">]</span>
 <span class="sr">&lt;e&gt;</span> <span class="n">dc:creator</span> <span 
class="n">enhancementEngine</span><span class="o">^^</span><span 
class="n">xsd:anyURI</span>
 <span class="sr">&lt;e&gt;</span> <span class="n">dc:contributor</span> <span 
class="n">enhancementEngine</span><span class="o">^^</span><span 
class="n">xsd:anyURI</span>
 <span class="sr">&lt;e&gt;</span> <span class="n">dc:created</span> <span 
class="n">date</span><span class="o">^^</span><span 
class="n">xsd:dateTime</span>
 <span class="sr">&lt;e&gt;</span> <span class="n">dc:modified</span> <span 
class="n">date</span><span class="o">^^</span><span 
class="n">xsd:dateTime</span>
-<span class="sr">&lt;e&gt;</span> <span class="n">dc:relation</span> <span 
class="sr">&lt;relatedEnhancement&gt;</span>
-<span class="sr">&lt;e&gt;</span> <span class="n">dc:requires</span> <span 
class="sr">&lt;dependsOnEnhancement&gt;</span>
+<span class="p">[</span><span class="sr">&lt;e&gt;</span> <span 
class="n">sb:relatedTo</span> <span 
class="sr">&lt;relatedEnhancement&gt;</span><span class="p">]</span>
+<span class="p">[</span><span class="sr">&lt;e&gt;</span> <span 
class="n">sb:dependsOn</span> <span 
class="sr">&lt;dependsOnEnhancement&gt;</span><span class="p">]</span>
 </pre></div>
 
 
 <p>The presence of the statement "&lt;e&gt; rdf:type sd:Enhancement" statement 
indicated that enhancement metadata are present for the resource &lt;e&gt;. 
This also means that if there is some configuration set to exclude such 
information, than all the above properties MUST be removed from the results of 
the enhancement process.
-The optional  rdf:types sb:Annotation and sb:Occurrent do only indicate, that 
typically any enhancement resource &lt;e&gt; is also of type sb:Annotation 
and/or sb:Occurrent. See the according sections and the usage examples for more 
information.</p>
-<p>All of the metadata used to describe the enhancement process do use the 
DCterms vocabulary. </p>
-<ul>
-<li>dc:creator and dc:contributor link to the EnhancementEngine(s) involved in 
creating the Enhancement. </li>
-<li>dc:created and dc:modified are intended to help sort enhancement based on 
enhancement activities performed during the enhancement process (something that 
might be useful especially in case EnhancementEngines do work asynchronously). 
</li>
-<li>dc:relation and dc:requires are used to describe relations between 
enhancements. dc:relation is used to state that an enhancement is related to an 
other one, but would be still valid if the other gets invalid or is removed. 
dc:requires is used to state that an enhancement depends on an other one and 
cascading delete/invalidation should be applied. As example an enhancement 
suggesting the entity "http://dbpedia.org/resources/Paris"; might depend on the 
Word "Paris" found in the Text and be related to an other enhancement stating 
that the document is about "http://dbpedai.org/resources/France";.</li>
+The metadata defined by sb:Enhancement MUST BE added for all sb:Annotation and 
sb:Suggestion instances created by an EnhancementEngine. This also includes any 
sub-classes (rdfs:subClassOf) of those two Concepts. </p>
+<p>The following figure shows an example of an sb:Annotation and an 
sb:Suggestion for Paris with the corresponding metadata as defined by the 
sb:Enhancement concept.</p>
+<p><img alt="Example: sb:Annotation and sb:Suggestion including sb:Enhancement 
metadata" src="enhancementexample.png" title="Example: sb:Annotation and 
sb:Suggestion including sb:Enhancement metadata" /></p>
+<p>Note that sb:Annotation and sb:Suggestion are not sub-classes of 
sb:Enhancement. EnhancementEngines need to add sb:Enhancement as an additional 
rdf:type to sb:Annotation and sb:Suggestion.</p>
+<p>Description of the properties defined/used by sb:Enhancement:</p>
+<ul>
+<li><strong>dc:creator</strong> and <strong>dc:contributor</strong> link to 
the EnhancementEngine(s) involved in creating the Enhancement.</li>
+<li><strong>dc:created</strong> and <strong>dc:modified</strong> are intended 
to help sort enhancements based on enhancement activities performed during the 
enhancement process (something that might be useful especially in case 
EnhancementEngines work asynchronously). </li>
+<li><strong>sb:relatedTo</strong> defines that an sb:Enhancement is related to 
another one. It also specifies that each of the two enhancements remains valid 
if the other one is deleted.</li>
+<li><strong>sb:dependsOn</strong> defines that an sb:Enhancement depends on 
another one. If the other Enhancement is deleted (or rejected by a user), then 
all dependent sb:Enhancements MUST also be removed/rejected. The above figure 
shows that sb:hasSuggestion as defined by sb:Annotation is an inverse relation 
to sb:dependsOn because suggestions depend on the annotation they are suggested 
for.</li>
 </ul>
-<p><em>NOTE</em>: With this version of the enhancement structure it is no 
longer expected from users to process dc:relation and dc:requires relations as 
it was the case with the FISE enhancement structure to query for 
EntityAnnotations for TextAnnotations.</p>
+<p>In addition, EnhancementEngines might want or need to add additional 
metadata to the sb:Annotation and sb:Suggestion instances they create. 
Implementors of such EnhancementEngines are free to define their own 
Enhancement types. Such types MUST BE defined as rdfs:subClassOf sb:Enhancement 
and SHOULD use **Enhancement in their Concept name. EnhancementEngines MUST 
also add both the specific type AND sb:Enhancement as rdf:type values.</p>
+<hr />
+<p>Sections below are not yet updated</p>
+<hr />
 <h3 id="annotations">Annotations</h3>
 <p>The concept "Annotation" provides metadata about the extracted feature. 
This information are important both for the enhancement process and the users 
of the Stanbol Enhancer.
 The following code segment shows the knowledge typically provided by an 
Annotation &lt;a&gt;. A description of the properties is provided below:</p>

