STANBOL-414-specification.html

buildbot Tue, 17 Jan 2012 00:48:15 -0800

Author: buildbot
Date: Tue Jan 17 08:47:45 2012
New Revision: 802791

Log:
Staging update by buildbot for stanbol


Modified:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html

Modified: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
 Tue Jan 17 08:47:45 2012
@@ -94,11 +94,20 @@ enhancement engines.</p>
 <p>The RDFS schema used for the execution plan is defined as follows.</p>
 <ul>
 <li>Namespace: ep : 
http://stanbol.apache.org/ontology/enhancer/executionplan#</li>
-<li><strong>ep:ExecutionNode</strong> : Class used for all Nodes representing 
the execution of an Enhancement Engine.</li>
+<li><strong>ep:ExecutionPlan</strong> : Represents an execution plan defined by 
all linked execution nodes.<ul>
+<li><strong>ep:hasExecutionNode</strong> (domain: ep:ExecutionPlan; range: 
ep:ExecutionNode; inverseOf: ep:inExecutionPlan): links the execution plan with 
all the execution nodes.</li>
+<li><strong>ep:chain</strong> (domain: ep:ExecutionPlan; range: xsd:string): 
The name of the Chain this execution plan is used for.</li>
+</ul>
+</li>
+<li><strong>ep:ExecutionNode</strong> : Class used for all Nodes representing 
the execution of an Enhancement Engine.<ul>
+<li><strong>ep:inExecutionPlan</strong> (domain: ep:ExecutionNode; range: 
ep:ExecutionPlan; inverseOf: ep:hasExecutionNode): functional property that 
links the execution node with an execution plan.</li>
 <li><strong>ep:engine</strong> (domain: ep:ExecutionNode; range: xsd:string): 
The property used to link to the Enhancement Engine by the name of the 
engine.</li>
 <li><strong>ep:dependsOn</strong> (domain: ep:ExecutionNode; range: 
ep:ExecutionNode) Defines that the execution of this node depends on the 
completion of the referenced one.</li>
 <li><strong>ep:optional</strong> (domain: ep:ExecutionNode; range: 
xsd:boolean) Can be used to specify that the execution of this 
EnhancementEngine is optional. If this property is set to TRUE an engine will 
be marked as executed even if its execution was not possible (e.g. because no 
engine with this name was active) or the execution failed (e.g. because of 
an exception). </li>
 </ul>
+</li>
+</ul>
+<p>Note that the data for the ep:ExecutionPlan and the 
ep:hasExecutionNode/ep:inExecutionPlan properties typically need not be 
included in the configuration of a Chain. This information is typically added 
automatically, based on the assumption that all ep:ExecutionNode resources in 
the configuration of a chain are members of the execution plan for that chain. 
It is therefore typically added by the Chain itself when the configuration is 
parsed and validated.</p>
 <h4 id="example">Example:</h4>
 <p>This example shows an ExecutionPlan with five nodes for the "langId", 
"ner", "dbpediaLinking", "geonamesLinking" and "zemanta" engines. Note that 
these names refer to actual EnhancementEngine services registered with the 
current OSGi environment.</p>
 <p>This example assumes that</p>
@@ -110,27 +119,37 @@ enhancement engines.</p>
 <li>"zemanta" is the singleton instance of the ZemantaEnhancementEngine</li>
 </ul>
 <p>The RDF graph of such a chain would look as follows:</p>
-<div class="codehilite"><pre><span class="err">urn:node1</span>
+<div class="codehilite"><pre><span class="err">urn:execPlan</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionPlan</span>
+    <span class="err">ep:hasExecutionNode</span> <span 
class="err">urn:node1,</span> <span class="err">urn:node2,</span> <span 
class="err">urn:node3,</span> <span class="err">urn:node4,</span> <span 
class="err">urn:node5</span>
+    <span class="err">ep:chain</span> <span 
class="err">&quot;demoChain&quot;</span>
+
+<span class="err">urn:node1</span>
     <span class="err">rdf:type</span> <span 
class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
     <span class="err">ep:engine</span> <span class="err">langId</span>
 
 <span class="err">urn:node2</span>
     <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
     <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
     <span class="err">ep:engine</span> <span class="err">ner</span>
 
 <span class="err">urn:node3</span>
     <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
     <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
     <span class="err">ep:engine</span> <span class="err">dbpediaLinking</span>
 
 <span class="err">urn:node4</span>
     <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
     <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
     <span class="err">ep:engine</span> <span class="err">geonamesLinking</span>
 
 <span class="err">urn:node5</span>
     <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
     <span class="err">ep:engine</span> <span class="err">zemanta</span>
     <span class="err">ep:optional</span> <span 
class="err">&quot;true&quot;^^xsd:boolean</span>
 </pre></div>
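
A note on the example graph above: the ep:dependsOn relations are what determine which engines may run next. The following is a minimal Java sketch of that selection, not the Stanbol implementation. The class ExecutionPlanSketch and its methods are hypothetical, the plan is modelled as a plain map from node URI to the URIs it depends on, and nodes that are already running are not tracked.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the Stanbol API. */
final class ExecutionPlanSketch {

    /**
     * Returns the nodes that may be started next: nodes that have not yet
     * completed and whose ep:dependsOn targets have all completed.
     */
    static Set<String> executable(Map<String, Set<String>> dependsOn, Set<String> completed) {
        Set<String> ready = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> node : dependsOn.entrySet()) {
            boolean notDone = !completed.contains(node.getKey());
            boolean dependenciesDone = completed.containsAll(node.getValue());
            if (notDone && dependenciesDone) {
                ready.add(node.getKey());
            }
        }
        return ready;
    }

    /** The ep:dependsOn relations of the five example nodes. */
    static Map<String, Set<String>> examplePlan() {
        Map<String, Set<String>> plan = new LinkedHashMap<>();
        plan.put("urn:node1", Set.of());            // langId
        plan.put("urn:node2", Set.of("urn:node1")); // ner
        plan.put("urn:node3", Set.of("urn:node1")); // dbpediaLinking
        plan.put("urn:node4", Set.of("urn:node1")); // geonamesLinking
        plan.put("urn:node5", Set.of());            // zemanta (optional)
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> plan = examplePlan();
        // Nothing completed yet: only nodes without dependencies are ready.
        System.out.println(executable(plan, Set.of()));
        // After langId and zemanta completed, the three dependent nodes are ready.
        System.out.println(executable(plan, Set.of("urn:node1", "urn:node5")));
    }
}
```

With the example plan, "langId" and "zemanta" can start immediately, while "ner", "dbpediaLinking" and "geonamesLinking" become executable once "langId" has completed.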
@@ -217,6 +236,8 @@ from the ServiceReference</p>
 <span class="o">+</span> <span class="n">getReferences</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">List</span><span class="sr">&lt;ServiceReference&gt;</span>
 <span class="sr">/** Getter for the Engine for the given name */</span>
 <span class="o">+</span> <span class="n">getEngine</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">EnhancementEngine</span>
+<span class="sr">/** Getter for the names of the active engines */</span>
+<span class="o">+</span> <span class="n">getActiveEngineNames</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">Set</span><span 
class="sr">&lt;String&gt;</span>
 </pre></div>
 
 
@@ -279,6 +300,7 @@ from the ServiceReference</p>
 <p>This section describes changes to the Enhancement Process caused by the 
addition of Chains. It also specifies what EnhancementEngine and 
EnhancementJobManager implementations need to take care of to allow 
asynchronous and parallel execution of multiple EnhancementEngines for the same 
ContentItem. </p>
 <p>Note that work on the asynchronous enhancement process is covered by <a 
href="https://issues.apache.org/jira/browse/STANBOL-46">STANBOL-46</a>.</p>
 <h3 id="enhancementjobmanager">EnhancementJobManager</h3>
+<p>synchronous</p>
 <p>The interface of the EnhancementJobManager will change due to the addition 
of Chains and will in future contain only a single method that enhances a 
ContentItem by using the execution plan provided by the parsed Chain.</p>
 <div class="codehilite"><pre><span class="o">+</span> <span 
class="n">enhanceContent</span><span class="p">(</span><span 
class="n">ContentItem</span> <span class="n">ci</span><span class="p">,</span> 
<span class="n">Chain</span> <span class="n">chain</span><span 
class="p">)</span>
 </pre></div>
@@ -328,6 +350,128 @@ from the ServiceReference</p>
 <p><strong>IMPORTANT:</strong> Do not try to get a write lock within a read 
lock because this may cause deadlocks. That is because read locks can 
be obtained simultaneously by multiple threads while write locks are exclusive. 
So if two threads holding a read lock also try to obtain a write lock, they 
will block each other. </p>
 <p>EnhancementEngines that do NOT support EnhancementEngine#ENHANCE_ASYNC - 
meaning that the canEnhance method only returns 
EnhancementEngine#CANNOT_ENHANCE or EnhancementEngine#ENHANCE_SYNCHRONOUS - do 
not need to obtain read and write locks. The EnhancementJobManager 
implementation MUST ensure that they have exclusive access to the 
Enhancement Graph. This can be done either by obtaining a write lock before 
calling such enhancement engines or by ensuring that no other engines are 
called in parallel.</p>
 <p>In cases where the EnhancementJobManager can execute multiple engines in 
parallel it is good practice to first start the execution of engines that 
support EnhancementEngine#ENHANCE_ASYNC. This allows such engines to obtain 
a read lock to read the data necessary for their calculations before the 
EnhancementJobManager needs to obtain an exclusive write lock for calling 
EnhancementEngines that only support 
EnhancementEngine#ENHANCE_SYNCHRONOUS.</p>
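
To illustrate the locking rule above, here is a minimal Java sketch, assuming the enhancement graph is guarded by a java.util.concurrent.locks.ReentrantReadWriteLock. The class LockOrderSketch and its list-based "graph" are hypothetical stand-ins and not Stanbol types. The enhance method releases the read lock before it asks for the write lock, and tryUpgrade shows that a write lock is not granted while a read lock is still held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for an engine working on a shared enhancement graph. */
final class LockOrderSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the enhancement graph (assumption: any shared mutable state).
    private final List<String> graph = new ArrayList<>();

    /** Reads under the read lock, releases it, and only then takes the write lock. */
    int enhance(String result) {
        int seen;
        lock.readLock().lock();
        try {
            seen = graph.size(); // read the data needed for the computation
        } finally {
            lock.readLock().unlock(); // release BEFORE requesting the write lock
        }
        lock.writeLock().lock();
        try {
            graph.add(result); // write the computed enhancement
        } finally {
            lock.writeLock().unlock();
        }
        return seen;
    }

    /** Attempts to take the write lock while holding the read lock. */
    boolean tryUpgrade() {
        lock.readLock().lock();
        try {
            // tryLock() does not block; lock() at this point would never return.
            boolean upgraded = lock.writeLock().tryLock();
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockOrderSketch engine = new LockOrderSketch();
        System.out.println(engine.enhance("first"));  // entries seen before writing
        System.out.println(engine.enhance("second"));
        System.out.println(engine.tryUpgrade());      // whether the upgrade succeeded
    }
}
```

Because the graph may change between releasing the read lock and obtaining the write lock, an engine following this pattern has to tolerate data written by other engines in between.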
+<h3 id="execution_metadata">Execution Metadata</h3>
+<p>The EnhancementJobManager needs to add metadata about the execution 
process to the metadata of the processed ContentItem. This data provides 
information about the actual execution of the execution plan as provided by the 
Chain. In the case of asynchronous calls to the Stanbol Enhancer this 
information can also be used to report the current state of the execution to 
the requester, because the EnhancementJobManager is required to update this 
metadata each time an EnhancementEngine is started or has completed/failed 
processing the enhanced ContentItem.</p>
+<p>The RDFS schema used for the execution metadata is defined as follows.</p>
+<ul>
+<li>Namespace: em : 
http://stanbol.apache.org/ontology/enhancer/executionMetadata#</li>
+<li><strong>em:Execution</strong> : Super class for all Executions<ul>
+<li><strong>em:executionPart</strong> (domain: em:Execution; range: 
em:ChainExecution): Defines that this execution was part of the execution of a 
chain</li>
+<li><strong>em:status</strong> (domain: em:Execution; range: 
em:ExecutionStatus): The status of an Execution (used for both 
em:EngineExecution and em:ChainExecution)</li>
+<li><strong>em:started</strong> (domain: em:Execution; range: xsd:dateTime): 
Marks the start of the execution</li>
+<li><strong>em:completed</strong> (domain: em:Execution; range: xsd:dateTime): 
Marks the completion of the execution</li>
+<li><strong>em:statusMessage</strong> (domain: em:Execution; range: 
xsd:string): A natural language description providing further information about 
the status of this execution. Typically used to provide error messages if the 
execution fails (em:status is set to em:StatusFailed).</li>
+</ul>
+</li>
+<li><strong>em:ChainExecution</strong> : Class used to describe the execution 
of an enhancement Chain.<ul>
+<li><strong>em:defaultChain</strong> (domain: em:ChainExecution; range: 
xsd:boolean): Whether the executed Chain is currently the default Chain of the 
Stanbol Enhancer.</li>
+<li><strong>em:executionPlan</strong> (domain: em:ChainExecution; range: 
ep:ExecutionPlan): Links to the execution plan as provided by the chain.</li>
+<li><strong>em:enhances</strong> (domain: em:ChainExecution; range: 
rdf:Resource) : links the em:ChainExecution with the URI of the processed 
content item. The range needs to be updated as soon as the Stanbol Enhancement 
Structure is defined.</li>
+<li><strong>em:enhancedBy</strong> (domain: rdf:Resource; range: 
em:ChainExecution) : links the URI of the content item with the metadata about 
the enhancement process. The domain needs to be updated as soon as the Stanbol 
Enhancement Structure is defined.</li>
+</ul>
+</li>
+<li><strong>em:EngineExecution</strong> : Class used to describe the execution 
of an EnhancementEngine.<ul>
+<li><strong>em:executionNode</strong> (domain: em:EngineExecution; range: 
ep:ExecutionNode): The node within the ExecutionPlan</li>
+</ul>
+</li>
+<li><strong>em:ExecutionStatus</strong> : Class describing the status of an 
EngineExecution<ul>
+<li><strong>em:StatusSheduled</strong> : ExecutionStatus instance describing 
that an execution is scheduled but has not yet started</li>
+<li><strong>em:StatusInProgress</strong> : ExecutionStatus instance describing 
that the execution of the linked EngineExecution is in progress</li>
+<li><strong>em:StatusCompleted</strong> : ExecutionStatus instance describing 
that the execution has already completed successfully</li>
+<li><strong>em:StatusFailed</strong> : ExecutionStatus indicating that the 
execution has failed. Typically an em:statusMessage describing the reason for 
the failed execution is provided for em:Executions with that state.</li>
+<li><strong>em:StatusSkiped</strong> : ExecutionStatus indicating that the 
execution of an ep:ExecutionNode was skipped. This is only allowed for 
execution nodes that are marked as optional. Typically an em:statusMessage 
with the reason should also be provided.</li>
+</ul>
+</li>
+</ul>
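
The status instances listed above suggest a simple life cycle for an em:Execution. The following Java sketch encodes one possible reading of it. The transition table is an assumption and is not stated in the specification; the only rule taken directly from the schema is that the skipped status is reserved for execution nodes marked as ep:optional. The class ExecutionStatusSketch and its enum are hypothetical names.

```java
/** Hypothetical model of the em:ExecutionStatus life cycle; not a Stanbol class. */
final class ExecutionStatusSketch {

    /** One constant per em:ExecutionStatus instance of the schema. */
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED, FAILED, SKIPPED }

    /**
     * Checks whether a status change is plausible.
     * Assumed transitions: SCHEDULED -> IN_PROGRESS, SCHEDULED -> SKIPPED
     * (optional nodes only), IN_PROGRESS -> COMPLETED or FAILED.
     */
    static boolean isAllowed(Status from, Status to, boolean optionalNode) {
        switch (from) {
            case SCHEDULED:
                return to == Status.IN_PROGRESS
                        || (to == Status.SKIPPED && optionalNode);
            case IN_PROGRESS:
                return to == Status.COMPLETED || to == Status.FAILED;
            default:
                // COMPLETED, FAILED and SKIPPED are treated as final states.
                return false;
        }
    }

    public static void main(String[] args) {
        // A required node must not be skipped, an optional one may be.
        System.out.println(isAllowed(Status.SCHEDULED, Status.SKIPPED, false));
        System.out.println(isAllowed(Status.SCHEDULED, Status.SKIPPED, true));
    }
}
```

An EnhancementJobManager could run such a check before writing a new em:status value, so that the metadata seen by an asynchronous requester never moves backwards.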
+<h4 id="example_1">Example:</h4>
+<p>The following example builds on the example used in the 
ExecutionPlan section. To make the relations between the execution metadata and 
the execution plan easier to see, the triples of the execution plan are included 
at the end of this example.</p>
+<p>This example describes the following situation:</p>
+<ul>
+<li>the execution of the content item with the URI 'urn:contentItem1' with the 
default chain</li>
+<li>the default chain is represented by a Chain with the name "demoChain"; its 
ExecutionPlan has the URI 'urn:execPlan'</li>
+<li>the successful execution of the 'langId' engine (execution: 'urn:exec1', 
node: 'urn:node1')</li>
+<li>the failed execution of the 'ner' engine (execution: 'urn:exec2', node: 
'urn:node2'): As reason for the failure a message is provided that the NER 
model for the language 'de' is not available</li>
+<li>the successful execution of the 'zemanta' engine (execution: 'urn:exec3', 
node: 'urn:node5'): This engine was started in parallel to the 'ner' engine - 
therefore before the chain failed.</li>
+<li>There is no execution of the dbpediaLinking (node: '') and geonamesLinking 
(node: '') engines because the chain failed before these engines were 
scheduled. This assumes that the EnhancementJobManager only adds 
em:EngineExecution resources when it starts the processing of an 
ep:ExecutionNode defined in the execution plan. However, an 
EnhancementJobManager can also create em:Execution resources for all execution 
nodes. In that case there would also be em:EngineExecution resources for the 
dbpediaLinking and geonamesLinking engines with the em:status set to 
'em:StatusSheduled'. </li>
+</ul>
+<p>The RDF graph with the Execution Metadata:</p>
+<div class="codehilite"><pre><span class="err">urn:exec</span>
+    <span class="err">rdf:type</span> <span 
class="err">em:ChainExecution</span>
+    <span class="err">em:executionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">em:enhances</span> <span 
class="err">urn:contentItem1</span>
+    <span class="err">em:defaultChain</span> <span 
class="err">&quot;true&quot;</span>
+    <span class="err">em:started</span> <span 
class="err">2012-01-11T12.13.14.156</span>
+    <span class="err">em:completed</span> <span 
class="err">2012-01-11T12.13.15.157</span>
+    <span class="err">em:status</span> <span class="err">em:StatusFailed</span>
+    <span class="err">em:statusMessage</span> <span 
class="err">&quot;Unable</span> <span class="err">to</span> <span 
class="err">execute</span> <span class="err">EnhancementEngine</span> <span 
class="err">&#39;ner&#39;</span> <span class="err">\</span>
+        <span class="err">(Message:</span> <span class="err">No</span> <span 
class="err">NER</span> <span class="err">model</span> <span 
class="err">for</span> <span class="err">language</span> <span 
class="err">&#39;de&#39;</span> <span class="err">is</span> <span 
class="err">available).&quot;</span>
+    <span class="err">em:executionPart</span> <span 
class="err">urn:exec1,</span> <span class="err">urn:exec2,</span> <span 
class="err">urn:exec3,</span> <span class="err">urn:exec4,</span> <span 
class="err">urn:exec5</span>
+
+<span class="err">urn:exec1</span>
+    <span class="err">rdf:type</span> <span 
class="err">em:EngineExecution</span>
+    <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+    <span class="err">em:executionNode</span> <span 
class="err">urn:node1</span>
+    <span class="err">em:status</span> <span 
class="err">em:StatusCompleted</span>
+    <span class="err">em:started</span> <span 
class="err">2012-01-11T12.13.14.160</span>
+    <span class="err">em:completed</span> <span 
class="err">2012-01-11T12.13.14.250</span>
+
+<span class="err">urn:exec2</span>
+    <span class="err">rdf:type</span> <span 
class="err">em:EngineExecution</span>
+    <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+    <span class="err">em:executionNode</span> <span 
class="err">urn:node2</span>
+    <span class="err">em:status</span> <span class="err">em:StatusFailed</span>
+    <span class="err">em:statusMessage</span> <span 
class="err">&quot;No</span> <span class="err">NER</span> <span 
class="err">model</span> <span class="err">for</span> <span 
class="err">language</span> <span class="err">&#39;de&#39;</span> <span 
class="err">is</span> <span class="err">available&quot;</span>
+    <span class="err">em:started</span> <span 
class="err">2012-01-11T12.13.14.253</span>
+    <span class="err">em:completed</span> <span 
class="err">2012-01-11T12.13.14.289</span>
+
+<span class="err">urn:exec3</span>
+    <span class="err">rdf:type</span> <span 
class="err">em:EngineExecution</span>
+    <span class="err">em:executionPart</span> <span class="err">urn:exec</span>
+    <span class="err">em:executionNode</span> <span 
class="err">urn:node5</span>
+    <span class="err">em:status</span> <span class="err">em:StatusCompleted</span>
+    <span class="err">em:started</span> <span 
class="err">2012-01-11T12.13.14.253</span>
+    <span class="err">em:completed</span> <span 
class="err">2012-01-11T12.13.15.150</span>
+</pre></div>
+
+
+<p>The Execution Plan: (copy from the example provided in the ExecutionPlan 
section)</p>
+<div class="codehilite"><pre><span class="err">urn:execPlan</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionPlan</span>
+    <span class="err">ep:hasExecutionNode</span> <span 
class="err">urn:node1,</span> <span class="err">urn:node2,</span> <span 
class="err">urn:node3,</span> <span class="err">urn:node4,</span> <span 
class="err">urn:node5</span>
+    <span class="err">ep:chain</span> <span 
class="err">&quot;demoChain&quot;</span>
+
+<span class="err">urn:node1</span>
+    <span class="err">rdf:type</span> <span 
class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">ep:engine</span> <span class="err">langId</span>
+
+<span class="err">urn:node2</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+    <span class="err">ep:engine</span> <span class="err">ner</span>
+
+<span class="err">urn:node3</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+    <span class="err">ep:engine</span> <span class="err">dbpediaLinking</span>
+
+<span class="err">urn:node4</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+    <span class="err">ep:engine</span> <span class="err">geonamesLinking</span>
+
+<span class="err">urn:node5</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:inExecutionPlan</span> <span 
class="err">urn:execPlan</span>
+    <span class="err">ep:engine</span> <span class="err">zemanta</span>
+    <span class="err">ep:optional</span> <span 
class="err">&quot;true&quot;^^xsd:boolean</span>
+</pre></div>
+
+
+<p>Note that both the Execution Metadata AND the Execution Plan need to be 
contained within the metadata of the ContentItem.</p>
   </div>
   
   <div id="footer">
