Author: buildbot
Date: Tue Jan 17 08:47:45 2012
New Revision: 802791
Log:
Staging update by buildbot for stanbol
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
(original)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
Tue Jan 17 08:47:45 2012
@@ -94,11 +94,20 @@ enhancement engines.</p>
<p>The RDFS schema used for the execution plan is defined as follows.</p>
<ul>
<li>Namespace: ep :
http://stanbol.apache.org/ontology/enhancer/executionplan#</li>
-<li><strong>ep:ExecutionNode</strong> : Class used for all Nodes representing
the execution of an Enhancement Engine.</li>
+<li><strong>ep:ExecutionPlan</strong> : Represents an execution plan defined by
all linked execution nodes.<ul>
+<li><strong>ep:hasExecutionNode</strong> (domain: ep:ExecutionPlan; range:
ep:ExecutionNode; inverseOf: ep:inExecutionPlan): links the execution plan with
all the execution nodes.</li>
+<li><strong>ep:chain</strong> (domain: ep:ExecutionPlan; range: xsd:string):
The name of the Chain this execution plan is used for.</li>
+</ul>
+</li>
+<li><strong>ep:ExecutionNode</strong> : Class used for all Nodes representing
the execution of an Enhancement Engine.<ul>
+<li><strong>ep:inExecutionPlan</strong> (domain: ep:ExecutionNode; range:
ep:ExecutionPlan; inverseOf: ep:hasExecutionNode): functional property that
links the execution node with an execution plan.</li>
<li><strong>ep:engine</strong> (domain: ep:ExecutionNode; range: xsd:string):
The property used to link to the Enhancement Engine by the name of the
engine.</li>
<li><strong>ep:dependsOn</strong> (domain: ep:ExecutionNode; range:
ep:ExecutionNode) Defines that the execution of this node depends on the
completion of the referenced one.</li>
<li><strong>ep:optional</strong> (domain: ep:ExecutionNode; range:
xsd:boolean) Can be used to specify that the execution of this
EnhancementEngine is optional. If this property is set to TRUE an engine will
be marked as executed even if its execution was not possible (e.g. because an
engine with this name was not active) or the execution failed (e.g. because of
an Exception). </li>
</ul>
+</li>
+</ul>
+<p>Note that the data for the ep:ExecutionPlan and the
ep:hasExecutionNode/ep:inExecutionPlan properties typically does not need to be
provided in the configuration of a Chain. This information is typically added
automatically, based on the assumption that all ep:ExecutionNode instances
provided in the configuration of a chain are members of the execution plan for
that chain. The Chain itself therefore adds this information when the
configuration is parsed and validated.</p>
<h4 id="example">Example:</h4>
<p>This example shows an ExecutionPlan with five nodes for the "langId",
"ner", "dbpediaLinking", "geonamesLinking" and "zemanta" engines. Note that these
names refer to actual EnhancementEngine Services registered with the current
OSGi Environment.</p>
<p>This example assumes that</p>
@@ -110,27 +119,37 @@ enhancement engines.</p>
<li>"zemanta" is the singleton instance of the ZemantaEnhancementEngine</li>
</ul>
<p>The RDF graph of such a chain would look as follows:</p>
-<div class="codehilite"><pre><span class="err">urn:node1</span>
+<div class="codehilite"><pre><span class="err">urn:execPlan</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionPlan</span>
+ <span class="err">ep:hasExecutionNode</span> <span
class="err">urn:node1,</span> <span class="err">urn:node2,</span> <span
class="err">urn:node3,</span> <span class="err">urn:node4,</span> <span
class="err">urn:node5</span>
+ <span class="err">ep:chain</span> <span
class="err">"demoChain"</span>
+
+<span class="err">urn:node1</span>
<span class="err">rdf:type</span> <span
class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
<span class="err">ep:engine</span> <span class="err">langId</span>
<span class="err">urn:node2</span>
<span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
<span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
<span class="err">ep:engine</span> <span class="err">ner</span>
<span class="err">urn:node3</span>
<span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
<span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
<span class="err">ep:engine</span> <span class="err">dbpediaLinking</span>
<span class="err">urn:node4</span>
<span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
<span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
<span class="err">ep:engine</span> <span class="err">geonamesLinking</span>
<span class="err">urn:node5</span>
<span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
<span class="err">ep:engine</span> <span class="err">zemanta</span>
<span class="err">ep:optional</span> <span
class="err">"true"^^xsd:boolean</span>
</pre></div>
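<p>The ep:dependsOn links of the example above determine which nodes may run at a given point in time. The following is a minimal sketch of that rule in plain Java, not the Stanbol implementation: the class <code>ExecutionPlanSketch</code> and its methods are hypothetical names introduced here for illustration, and the plan is reduced to node URIs and their dependencies.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Stanbol API: models ep:ExecutionNode
// resources as node URIs mapped to the URIs they depend on (ep:dependsOn).
final class ExecutionPlanSketch {

    private final Map<String, Set<String>> dependsOn = new LinkedHashMap<>();

    /** Registers a node together with the nodes it depends on. */
    void addNode(String node, String... dependencies) {
        dependsOn.put(node, new LinkedHashSet<>(Arrays.asList(dependencies)));
    }

    /**
     * Returns, in insertion order, the nodes that have not completed yet and
     * whose dependencies have all completed.
     */
    List<String> executable(Set<String> completed) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            boolean pending = !completed.contains(entry.getKey());
            if (pending && completed.containsAll(entry.getValue())) {
                ready.add(entry.getKey());
            }
        }
        return ready;
    }
}
```

<p>With the five nodes of the example, only urn:node1 and urn:node5 have no dependencies, so they are the only candidates before any engine has completed. Once urn:node1 has completed, urn:node2, urn:node3 and urn:node4 become executable as well.</p>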
@@ -217,6 +236,8 @@ from the ServiceReference</p>
<span class="o">+</span> <span class="n">getReferences</span><span
class="p">(</span><span class="n">String</span> <span
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span
class="n">List</span><span class="sr"><ServiceReference></span>
<span class="sr">/** Getter for the Engine for the given name */</span>
<span class="o">+</span> <span class="n">getEngine</span><span
class="p">(</span><span class="n">String</span> <span
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span
class="n">EnhancementEngine</span>
+<span class="sr">/** Getter for the names of the active engines */</span>
+<span class="o">+</span> <span class="n">getActiveEngineNames</span><span
class="p">()</span> <span class="p">:</span> <span class="n">Set</span><span
class="sr"><String></span>
</pre></div>
@@ -279,6 +300,7 @@ from the ServiceReference</p>
<p>This section describes changes to the Enhancement Process caused by the
addition of Chains. It also specifies how EnhancementEngine and
EnhancementJobManager implementations need to behave to allow asynchronous
and parallel execution of multiple EnhancementEngines for the same
ContentItem. </p>
<p>Note that work on the asynchronous enhancement process is covered by <a
href="https://issues.apache.org/jira/browse/STANBOL-46">STANBOL-46</a></p>
<h3 id="enhancementjobmanager">EnhancementJobManager</h3>
+<p>synchronous</p>
<p>The interface of the EnhancementJobManager will change due to the addition
of Chains and will in future contain only a single method that enhances a
ContentItem by using the execution plan provided by the passed Chain.</p>
<div class="codehilite"><pre><span class="o">+</span> <span
class="n">enhanceContent</span><span class="p">(</span><span
class="n">ContentItem</span> <span class="n">ci</span><span class="p">,</span>
<span class="n">Chain</span> <span class="n">chain</span><span
class="p">)</span>
</pre></div>
@@ -328,6 +350,128 @@ from the ServiceReference</p>
<p><strong>IMPORTANT:</strong> Do not try to get a write lock within a read
lock because this may cause deadlocks. That is because read locks can
be obtained simultaneously by multiple threads while write locks are exclusive.
So if two threads holding a read lock both try to obtain a write lock they will
block each other. </p>
<p>EnhancementEngines that do NOT support EnhancementEngine#ENHANCE_ASYNC -
meaning that the canEnhance method only returns
EnhancementEngine#CANNOT_ENHANCE or EnhancementEngine#ENHANCE_SYNCHRONOUS - do
not need to obtain read and write locks. The EnhancementJobManager
implementation MUST ensure that they have exclusive access to the
Enhancement Graph. This can be done either by obtaining a write lock before
calling such enhancement engines or by ensuring that no other engines are called
in parallel.</p>
<p>In cases where the EnhancementJobManager can execute multiple engines in
parallel it is good practice to first start the execution of Engines that
support EnhancementEngine#ENHANCE_ASYNC. This will allow such engines to obtain
a read lock to read the data necessary for their calculations before the
EnhancementJobManager needs to obtain an exclusive write lock for calling
EnhancementEngines that only support
EnhancementEngine#ENHANCE_SYNCHRONOUS.</p>
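<p>The locking rule above can be illustrated with the standard <code>java.util.concurrent.locks.ReentrantReadWriteLock</code>. This is only a sketch under assumptions: <code>LockedGraphSketch</code> is a hypothetical stand-in for the enhancement graph, and how an engine actually obtains the lock of a ContentItem is not shown. The point is that the read lock is released before the write lock is requested.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an enhancement graph guarded by a read/write lock.
final class LockedGraphSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<>();

    /**
     * Reads under the read lock, releases it, and only then writes under the
     * write lock. Returns the number of triples seen during the read phase.
     */
    int enhance(String newTriple) {
        int seen;
        lock.readLock().lock();
        try {
            seen = triples.size(); // read phase: may run in parallel with other readers
        } finally {
            lock.readLock().unlock(); // release BEFORE requesting the write lock
        }
        // Other threads may change the graph between the two phases, so
        // results of the read phase must be treated as a snapshot.
        lock.writeLock().lock();
        try {
            triples.add(newTriple); // write phase: exclusive access
        } finally {
            lock.writeLock().unlock();
        }
        return seen;
    }

    /** Returns the current number of triples under the read lock. */
    int size() {
        lock.readLock().lock();
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

<p>Requesting the write lock while still holding the read lock would block forever with this lock class, because <code>ReentrantReadWriteLock</code> does not support upgrading a read lock to a write lock.</p>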
+<h3 id="execution_metadata">Execution Metadata</h3>
+<p>The EnhancementJobManager needs to add metadata about the execution
process to the metadata of the processed ContentItem. Such data provide
information about the actual execution of the execution plan as provided by the
Chain. In the case of an asynchronous call to the Stanbol Enhancer this
information can also be used to report the current state of the execution to
the requester, as the EnhancementJobManager is required to update
such metadata each time an EnhancementEngine is started or has
completed/failed to process the enhanced ContentItem.</p>
+<p>The RDFS schema used for the execution metadata is defined as follows.</p>
+<ul>
+<li>Namespace: em :
http://stanbol.apache.org/ontology/enhancer/executionMetadata#</li>
+<li><strong>em:Execution</strong> : Super class for all Executions<ul>
+<li><strong>em:executionPart</strong> (domain: em:Execution; range:
em:ChainExecution): Defines that this execution was part of the execution of a
chain</li>
+<li><strong>em:status</strong> (domain: em:Execution; range:
em:ExecutionStatus): The status of an Execution (used for both
em:EngineExecution and em:ChainExecution)</li>
+<li><strong>em:started</strong> (domain: em:Execution; range: xsd:dateTime):
Marks the start of the execution</li>
+<li><strong>em:completed</strong> (domain: em:Execution; range: xsd:dateTime):
Marks the completion of the execution</li>
+<li><strong>em:statusMessage</strong> (domain: em:Execution; range:
xsd:string): A natural language description providing further information about
the status of this execution. Typically used to pass on error messages if the
execution fails (em:status is set to em:StatusFailed).</li>
+</ul>
+</li>
+<li><strong>em:ChainExecution</strong> : Class used to describe the execution
of an enhancement Chain.<ul>
+<li><strong>em:defaultChain</strong> (domain: em:ChainExecution; range:
xsd:boolean): Whether the executed Chain is currently the default Chain of the
Stanbol Enhancer.</li>
+<li><strong>em:executionPlan</strong> (domain: em:ChainExecution; range:
ep:ExecutionPlan): Links to the execution plan as provided by the chain.</li>
+<li><strong>em:enhances</strong> (domain: em:ChainExecution; range:
rdfs:Resource): Links the em:ChainExecution with the URI of the processed
content item. The range needs to be updated as soon as the Stanbol Enhancement
Structure is defined.</li>
+<li><strong>em:enhancedBy</strong> (domain: rdfs:Resource; range:
em:ChainExecution): Links the URI of the content item with the metadata about
the enhancement process. The domain needs to be updated as soon as the Stanbol
Enhancement Structure is defined.</li>
+</ul>
+</li>
+<li><strong>em:EngineExecution</strong> : Class used to describe the execution
of an EnhancementEngine.<ul>
+<li><strong>em:executionNode</strong> (domain: em:EngineExecution; range:
ep:ExecutionNode): The node within the ExecutionPlan</li>
+</ul>
+</li>
+<li><strong>em:ExecutionStatus</strong> : Class describing the status of an
EngineExecution<ul>
+<li><strong>em:StatusSheduled</strong> : ExecutionStatus instance that
describes that an execution is scheduled but has not yet started</li>
+<li><strong>em:StatusInProgress</strong> : ExecutionStatus instance that
describes that the execution of the linked EngineExecution is in progress</li>
+<li><strong>em:StatusCompleted</strong> : ExecutionStatus instance describing
that the execution has already completed successfully</li>
+<li><strong>em:StatusFailed</strong> : ExecutionStatus indicating that the
execution has failed. Typically an em:statusMessage describing the reason for
the failed execution is provided for em:Executions with that state.</li>
+<li><strong>em:StatusSkiped</strong> : ExecutionStatus indicating that the
execution of an ep:ExecutionNode was skipped. This is only allowed for
execution nodes that are marked as optional. Typically an em:statusMessage
with the reason should also be provided.</li>
+</ul>
+</li>
+</ul>
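<p>The status values and the rule that only optional nodes may be skipped can be sketched as a small state model. This is an illustration under assumptions, not the Stanbol implementation: <code>EngineExecutionSketch</code> and its methods are hypothetical names, the enum constants stand for the em:ExecutionStatus instances listed above, and the em:started/em:completed timestamps are left out for brevity.</p>

```java
// Hypothetical in-memory model of an em:EngineExecution; not part of the Stanbol API.
final class EngineExecutionSketch {

    /** Stands for the em:ExecutionStatus instances of the schema. */
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    private final String executionNode; // URI of the ep:ExecutionNode
    private final boolean optional;     // value of ep:optional on that node
    private Status status = Status.SCHEDULED;
    private String statusMessage;       // corresponds to em:statusMessage

    EngineExecutionSketch(String executionNode, boolean optional) {
        this.executionNode = executionNode;
        this.optional = optional;
    }

    void start() {
        requireStatus(Status.SCHEDULED);
        status = Status.IN_PROGRESS;
    }

    void complete() {
        requireStatus(Status.IN_PROGRESS);
        status = Status.COMPLETED;
    }

    void fail(String message) {
        requireStatus(Status.IN_PROGRESS);
        status = Status.FAILED;
        statusMessage = message;
    }

    /** Skipping is only allowed for execution nodes marked as optional. */
    void skip(String reason) {
        if (!optional) {
            throw new IllegalStateException(executionNode + " is not optional");
        }
        status = Status.SKIPPED;
        statusMessage = reason;
    }

    Status status() {
        return status;
    }

    String statusMessage() {
        return statusMessage;
    }

    private void requireStatus(Status expected) {
        if (status != expected) {
            throw new IllegalStateException(
                executionNode + " is " + status + ", expected " + expected);
        }
    }
}
```

<p>An EnhancementJobManager following this model would call <code>start()</code> when it hands the ContentItem to an engine and then exactly one of <code>complete()</code>, <code>fail(...)</code> or <code>skip(...)</code>, writing the resulting status to the metadata after each call.</p>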
+<h4 id="example_1">Example:</h4>
+<p>The following example uses the same example as used within the
ExecutionPlan section. To make the relations between the execution metadata and
the execution plan easier to see, the triples of the execution plan are included
at the end of this example.</p>
+<p>This example describes the following situation:</p>
+<ul>
+<li>the execution of the content item with the URI 'urn:contentItem1' with the
default chain</li>
+<li>the default chain is represented by a Chain with the name "demoChain"; its
ExecutionPlan has the URI 'urn:execPlan'</li>
+<li>the successful execution of the 'langId' engine (execution: 'urn:exec1',
node: 'urn:node1')</li>
+<li>the failed execution of the 'ner' engine (execution: 'urn:exec2', node:
'urn:node2'): As reason for the failure a message is provided that the NER
model for the language 'de' is not available</li>
+<li>the successful execution of the 'zemanta' engine (execution: 'urn:exec3',
node: 'urn:node5'): This engine was started in parallel to the 'ner' engine,
and therefore before the chain failed.</li>
+<li>There is no execution of the dbpediaLinking (node: '') and geonamesLinking
(node: '') engines because the chain failed before these engines were
scheduled. This assumes that the EnhancementJobManager only adds
em:EngineExecution resources when it starts the processing of an
ep:ExecutionNode defined in the execution plan. However, an
EnhancementJobManager can also create em:EngineExecution resources for all
execution nodes. In that case there would also be em:EngineExecution resources
for the dbpediaLinking and geonamesLinking engines with the em:status set to
'em:StatusSheduled'. </li>
+</ul>
+<p>The RDF graph with the Execution Metadata:</p>
+<div class="codehilite"><pre><span class="err">urn:exec</span>
+ <span class="err">rdf:type</span> <span
class="err">em:ChainExecution</span>
+ <span class="err">em:executionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">em:enhances</span> <span
class="err">urn:contentItem1</span>
+ <span class="err">em:defaultChain</span> <span
class="err">"true"</span>
+ <span class="err">em:started</span> <span
class="err">2012-01-11T12.13.14.156</span>
+ <span class="err">em:completed</span> <span
class="err">2012-01-11T12.13.15.157</span>
+ <span class="err">em:status</span> <span class="err">em:StatusFailed</span>
+ <span class="err">em:statusMessage</span> <span
class="err">"Unable</span> <span class="err">to</span> <span
class="err">execute</span> <span class="err">EnhancementEngine</span> <span
class="err">'ner'</span> <span class="err">\</span>
+ <span class="err">(Message:</span> <span class="err">No</span> <span
class="err">NER</span> <span class="err">model</span> <span
class="err">for</span> <span class="err">language</span> <span
class="err">'de'</span> <span class="err">is</span> <span
class="err">available)."</span>
+ <span class="err">em:executionPart</span> <span
class="err">urn:exec1,</span> <span class="err">urn:exec2,</span> <span
class="err">urn:exec3,</span> <span class="err">urn:exec4,</span> <span
class="err">urn:exec5</span>
+
+<span class="err">urn:exec1</span>
+ <span class="err">rdf:type</span> <span
class="err">em:EngineExecution</span>
+ <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+ <span class="err">em:executionNode</span> <span
class="err">urn:node1</span>
+ <span class="err">em:status</span> <span
class="err">em:StatusCompleted</span>
+ <span class="err">em:started</span> <span
class="err">2012-01-11T12.13.14.160</span>
+ <span class="err">em:completed</span> <span
class="err">2012-01-11T12.13.14.250</span>
+
+<span class="err">urn:exec2</span>
+ <span class="err">rdf:type</span> <span
class="err">em:EngineExecution</span>
+ <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+ <span class="err">em:executionNode</span> <span
class="err">urn:node2</span>
+ <span class="err">em:status</span> <span class="err">em:StatusFailed</span>
+ <span class="err">em:statusMessage</span> <span
class="err">"No</span> <span class="err">NER</span> <span
class="err">model</span> <span class="err">for</span> <span
class="err">language</span> <span class="err">'de'</span> <span
class="err">is</span> <span class="err">available"</span>
+ <span class="err">em:started</span> <span
class="err">2012-01-11T12.13.14.253</span>
+ <span class="err">em:completed</span> <span
class="err">2012-01-11T12.13.14.289</span>
+
+<span class="err">urn:exec3</span>
+ <span class="err">rdf:type</span> <span
class="err">em:EngineExecution</span>
+ <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+ <span class="err">em:executionNode</span> <span
class="err">urn:node5</span>
+ <span class="err">em:status</span> <span class="err">em:StatusCompleted</span>
+ <span class="err">em:started</span> <span
class="err">2012-01-11T12.13.14.253</span>
+ <span class="err">em:completed</span> <span
class="err">2012-01-11T12.13.15.150</span>
+</pre></div>
+
+
+<p>The Execution Plan: (copy from the example provided in the ExecutionPlan
section)</p>
+<div class="codehilite"><pre><span class="err">urn:execPlan</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionPlan</span>
+ <span class="err">ep:hasExecutionNode</span> <span
class="err">urn:node1,</span> <span class="err">urn:node2,</span> <span
class="err">urn:node3,</span> <span class="err">urn:node4,</span> <span
class="err">urn:node5</span>
+ <span class="err">ep:chain</span> <span
class="err">"demoChain"</span>
+
+<span class="err">urn:node1</span>
+ <span class="err">rdf:type</span> <span
class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">ep:engine</span> <span class="err">langId</span>
+
+<span class="err">urn:node2</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+ <span class="err">ep:engine</span> <span class="err">ner</span>
+
+<span class="err">urn:node3</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+ <span class="err">ep:engine</span> <span class="err">dbpediaLinking</span>
+
+<span class="err">urn:node4</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+ <span class="err">ep:engine</span> <span class="err">geonamesLinking</span>
+
+<span class="err">urn:node5</span>
+ <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+ <span class="err">ep:inExecutionPlan</span> <span
class="err">urn:execPlan</span>
+ <span class="err">ep:engine</span> <span class="err">zemanta</span>
+ <span class="err">ep:optional</span> <span
class="err">"true"^^xsd:boolean</span>
+</pre></div>
+
+
+<p>Note that both the Execution Metadata AND the Execution Plan need to be
contained within the metadata of the ContentItem.</p>
</div>
<div id="footer">