
rwesten Thu, 26 Jan 2012 07:30:34 -0800

Author: rwesten
Date: Thu Jan 26 15:30:10 2012
New Revision: 1236242

URL: http://svn.apache.org/viewvc?rev=1236242&view=rev
Log:
Added Documentation for EnhancementJobManager and ExecutionMetadata


Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
Modified:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext

Modified: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236242&r1=1236241&r2=1236242&view=diff
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
 (original)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
 Thu Jan 26 15:30:10 2012
@@ -1,6 +1,6 @@
 Title: Enhancement Chains
 
-An Enhancement Chain defines how Content parsed to the Stanbol Enhancer is 
processed. More concrete it defines what engines and in what order are used to 
process ContentItems. Chains are not responsible for the actual processing of 
ContentItems. They provide the [ExecutionPlan](executionplan.html) to the 
EnhancementJobManger that does the actual processing of the ContentItem.
+An Enhancement Chain defines how content passed to the Stanbol Enhancer is 
processed. More concretely, it defines which engines are used to process 
ContentItems and in what order. Chains are not responsible for the actual 
processing of ContentItems. They provide the [ExecutionPlan](executionplan.html) 
to the [EnhancementJobManager](../enhancementjobmanager.html), which does the 
actual processing of the ContentItem.
 
 In the RESTful API, enhancement chains can be accessed by their name under
 
@@ -35,7 +35,7 @@ When using the Java API Chains can be lo
     MGraph enhancementResults = ci.getMetadata();
 
 To enhance a ContentItem with the default chain the 
"enhanceContent(ContentItem ci)" can be used.
-<
+
 ## Chain Interface
 
 The Chain interface is very simplistic. It only defines three methods.
@@ -65,7 +65,7 @@ Because the configuration of a Chain mig
 
 ## Enhancement Chain Management
 
-This section describes how Enhancement Cahins are managed by the Stanbol 
Enhancer and how they can be selected/accessed. It also describes how the 
"default" Chain is determined.
+This section describes how Enhancement Chains are managed by the Stanbol 
Enhancer and how they can be selected/accessed. It also describes how the 
"default" Chain is determined.
 
 For every Stanbol Enhancer a single Chain MUST BE present. If this is not the 
case enhance request MUST throw a ChainException with an according error 
message. However typically multiple EnhancementChains will be configured. 
 
@@ -86,15 +86,15 @@ The default Chain is determined by the f
 
 1. the Chain with the name "default". If more than one Chain is present with 
that name, then the above rules for resolving name conflicts apply. If none
 2. the Chain with the highest "service.ranking". If several have the same 
ranking
-3. the Cahin with the lowest "service.id"
+3. the Chain with the lowest "service.id"
 
 If no chain is active a ChainException with an according message MUST BE 
thrown.
 
 All Stanbol launchers are configured with the [Default 
Chain](defaultchain.html) enabled. This registers itself with the name 
"default" and the lowest possible service ranking - Integer.MIN_VALUE. This 
default provides a Chain that considers all currently active 
EnhancementEngines and sorts them based on their ordering information (see the 
[Calculation of the Execution Plan based on the EnhancementEngine 
Ordering](weightedchain.html#calculation_of_the_executionplan) for details).
 
-### [ChainManager interface](chainmanager.html)
+### ChainManager interface
 
-This is the management interface for EnhancementChains that can be used by 
components to lookup chains based on there name. It also provides a getter for 
the default chain. There is also OSGI ServiceTracker like implementation that 
can be used to track only chains with specific names and to get even notified 
on any change of such chains.
+The [ChainManager](chainmanager.html) is the management interface for 
EnhancementChains that can be used by components to look up chains based on 
their name. It also provides a getter for the default chain. There is also an 
OSGi ServiceTracker-like implementation that can be used to track only chains 
with specific names and to be notified of any change to such chains.
 
 ## Chain implementations
 
@@ -104,6 +104,6 @@ The following Chain implementations are 
 * __[ListChain](listchain.html)__: Implementation that creates the 
ExecutionPlan by chaining the EnhancementEngines in the exact order as 
specified by the parsed list. This Chain does not support parallel execution of 
engines.
 * __[WeightedChain](weightedchain.html)__: This Chain implementation takes a 
list of engine names as input and uses the 
"org.apache.stanbol.enhancer.engine.order" metadata provided by such engines 
to calculate the ExecutionGraph.
 * __[GraphChain](graphchain.html)__: This Chain implementation is based on an 
ExecutionGraph passed as configuration.
-* __SingleEngineChain__: An Adapter that allows to execute a single 
EnhancementEngine within a Chain. This types of Chains will not be registered 
as OSGI service. Instances will be created on request for single 
EnhancementEngines and directly parsed to the EnhancementJobManager 
implementation. 
+* __SingleEngineChain__: An adapter that allows a single EnhancementEngine to 
be executed within a Chain. Chains of this type are not registered as OSGi 
services. Instances are created on request for single EnhancementEngines and 
passed directly to the 
[EnhancementJobManager](../enhancementjobmanager.html) implementation. 
 
 

Modified: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext?rev=1236242&r1=1236241&r2=1236242&view=diff
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
 (original)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
 Thu Jan 26 15:30:10 2012
@@ -1,6 +1,6 @@
 Title: ExecutionPlan
 
-The ExecutionPlan is represented as an RDF graph following the ExecutionPlan 
Ontology. It needs to be provided by the [Enhancement 
Chain](enhancementchain.html) and is used by the EnhancementJobManager to 
enhance ContentItems.
+The ExecutionPlan is represented as an RDF graph following the ExecutionPlan 
Ontology. It needs to be provided by the [Enhancement 
Chain](enhancementchain.html) and is used by the 
[EnhancementJobManager](../enhancementjobmanager.html) to enhance ContentItems 
and to write the [ExecutionMetadata](../executionmetadata.html).
 
 ## ExecutionPlan Ontology
 

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext?rev=1236242&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
 Thu Jan 26 15:30:10 2012
@@ -0,0 +1,96 @@
+Title: EnhancementJobManager
+
+
+The EnhancementJobManager is the component responsible for the execution of 
the [ExecutionPlan](../chains/executionplan.html) as provided by the 
[Enhancement Chain](../chains/enhancementchain.html) on the ContentItem.
+
+
+## EnhancementJobManager interface
+
+The interface of the EnhancementJobManager is very simple:
+
+    /** Enhances the content item by using the default Chain */
+    + enhanceContent(ContentItem ci)
+    /** Enhances the content item by using the passed Chain */
+    + enhanceContent(ContentItem ci, Chain chain)
+
+Note that the passed ContentItem is modified during the enhancement process. 
EnhancementEngines add extracted knowledge to the metadata of the content item. 
Additional content parts may also be added to the content item.
+
+
+## Enhancement Process
+
+While the [ExecutionPlan](../chains/executionplan.html) defines which 
EnhancementEngines are used and how they depend on each other, the 
EnhancementJobManager is responsible for the actual execution of the 
enhancement process based on this plan. This section provides detailed 
information about requirements and expectations that MUST BE considered.
+
+The EnhancementJobManager is also responsible for creating and updating the 
[ExecutionMetadata](executionmetadata.html) in the metadata of the processed 
ContentItem.
+
+### Retrieving the ExecutionPlan
+
+The [ExecutionPlan](../chains/executionplan.html) is provided by the Chain as 
a final graph that is guaranteed not to change. However, because the 
configuration of a Chain might change at any time, the EnhancementJobManager 
MUST retrieve the execution plan only once and use it during the whole 
enhancement process.
+
+Before the start of the enhancement process the EnhancementJobManager first 
needs to initialize the [ExecutionMetadata](executionmetadata.html) for the 
ContentItem. This includes
+
+1. copying the execution plan as returned by the Chain to the metadata of the 
content item
+2. creating an 'em:ChainExecution' instance and setting the 'em:enhances' 
property to the URI of the ContentItem
+3. creating 'em:EngineExecution' instances for all 'ep:ExecutionNode' 
instances and setting the 'em:status' of those to 'em:StatusSheduled'. Also 
define such instances as 'em:executionPart' of the chain execution and link 
them to the corresponding execution node of the execution plan.
+
+See the documentation of the [ExecutionMetadata](executionmetadata.html) for 
more information.
+
+### Engine Execution
+
+The ExecutionPlan provides the necessary information about which engines can 
be executed at any given state. The following code shows how to determine 
executable engines. 
+This code snippet is assumed to be called after the execution of an 
EnhancementEngine has completed. Note that in a multi-threaded environment 
access to the lists of executed and running engines needs to be synchronized.
+
+    Collection<NonLiteral> executed; //already executed Engines
+    Collection<NonLiteral> running; //currently running Engines
+
+    //nodes whose dependencies have all been executed
+    Collection<NonLiteral> next = ExecutionPlanUtils.getExecuteable(executionPlan, 
executed);
+    for(NonLiteral node : next){
+        if(!running.contains(node)){
+            String engineName = 
EnhancementEngineHelper.getString(executionPlan, node, EX_ENGINE);
+            EnhancementEngine engine = tracker.getEngine(engineName);
+            if(engine != null){
+                // execute engine
+            } else {
+               //check if optional and throw error if not
+            }
+        } // else already running -> ignore
+    }
+
+Before executing an EnhancementEngine the EnhancementJobManager needs to check 
if and how the engine can enhance a content item. This is indicated by the 
integer returned by the "canEnhance(ContentItem ci)" method:
+
+* __CANNOT_ENHANCE__: Indicates that this engine cannot process the passed 
content item. In this case the EnhancementJobManager needs to skip this engine 
and mark the EngineExecution as skipped, with a status message stating that the 
EnhancementEngine was unable to process the content item. If this engine is 
marked as optional the enhancement process can continue. If not, the 
execution MUST be marked as failed and an according Exception needs to be 
thrown.
+* __ENHANCE_SYNCHRONOUS__: Indicates that the engine needs exclusive access 
to the passed content item. The EnhancementJobManager needs to ensure this, 
typically by calling the "computeEnhancement(ContentItem ci)" method 
within a write lock.
+* __ENHANCE_ASYNC__: Indicates that this engine supports asynchronous 
execution and itself takes care of acquiring read and write locks on the passed 
content item. However this does not require the JobManager to execute the 
engine asynchronously.
+
+When the execution of an EnhancementEngine completes, the JobManager needs to 
set the state of the execution to completed and update the other metadata 
accordingly.
+
+If a call to "computeEnhancement(ContentItem ci)" results in an Exception the 
EnhancementJobManager must mark the execution of the engine as failed with a 
description of the exception that occurred. If the execution of the affected 
engine was optional the enhancement process is continued. Otherwise the 
enhancement process needs to be stopped and the error needs to be rethrown by 
the "enhanceContent(..)" method.
+
+### Multi Threaded enhancement processes
+
+If the EnhancementJobManager supports calling EnhancementEngines for the same 
content item simultaneously in multiple threads, it is important to correctly 
use the ReadWriteLock as provided by the ContentItem.getLock() method.
+
+There are many good examples on how to correctly use 
"java.util.concurrent.locks.ReadWriteLock" available on the web.
+
+### Finalizing the EnhancementProcess
+
+After the execution is completed (successfully or failed) the 
EnhancementJobManager needs to ensure that the 'em:status' and the 
'em:completed' of the 'em:ChainExecution' instance are set. If the execution 
failed, the 'em:statusMessage' should also be available and contain a message 
that describes the problem.
+
+
+## EnhancementJobManager implementations
+
+EnhancementJobManager implementations need to register themselves as OSGi 
services. By default the Stanbol Enhancer will use the implementation with the 
highest service ranking. The service ranking can be set by providing a 
configuration defining an integer value for the property "service.ranking".
+
+
+### EventJobManager
+
+This implementation is provided by the 
"org.apache.stanbol.enhancer.jobmanager.event" module and is currently used as 
default. It registers itself (by default) with a service ranking of '0'.
+
+This implementation supports an asynchronous enhancement process by using the 
["org.osgi.service.event"](http://www.osgi.org/javadoc/r4v42/org/osgi/service/event/package-summary.html)
 framework. 
+
+### WeightedJobManager
+
+This JobManager was used as default before the introduction of 
EnhancementChains. It does not support EnhancementChains and will enhance 
passed ContentItems by calling all currently active EnhancementEngines 
sequentially. It also has no support for EnhancementMetadata.
+
+This implementation is provided by the 
"org.apache.stanbol.enhancer.jobmanager.weightedjobmanager" module and is no 
longer included within the Apache Stanbol launchers. This JobManager registers 
itself with a service ranking of "-1000". Users who want to use this job 
manager need to install this bundle manually and either deactivate the other 
EnhancementJobManager implementations or reconfigure the service ranking of 
this one to a value > 0.
+
+
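
The engine execution rules described in enhancementjobmanager.mdtext above can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not the Stanbol API: the class DispatchSketch, the nested Item and Engine types, the numeric values of the three constants and the returned status strings are hypothetical names chosen for illustration. Only the decision rules come from the text (skip or fail on CANNOT_ENHANCE, write lock for ENHANCE_SYNCHRONOUS, continue after a failure only if the engine is optional).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-ins; the names mirror the text but this is not the Stanbol API. */
class DispatchSketch {
    // Illustrative values only (assumption), not taken from the Stanbol sources.
    static final int CANNOT_ENHANCE = 0;
    static final int ENHANCE_SYNCHRONOUS = 1;
    static final int ENHANCE_ASYNC = 2;

    /** Minimal content item: a metadata list plus the lock that guards it. */
    static class Item {
        final List<String> metadata = new ArrayList<>();
        final ReadWriteLock lock = new ReentrantReadWriteLock();
    }

    /** Hypothetical engine contract with the two methods named in the text. */
    interface Engine {
        int canEnhance(Item ci);
        void computeEnhancement(Item ci) throws Exception;
    }

    /** Runs one engine and returns the status its execution would be marked with. */
    static String execute(Engine engine, Item ci, boolean optional) {
        int mode = engine.canEnhance(ci);
        if (mode == CANNOT_ENHANCE) {
            if (optional) {
                return "skipped";
            }
            throw new IllegalStateException("required engine cannot enhance the item");
        }
        try {
            if (mode == ENHANCE_SYNCHRONOUS) {
                // The engine needs exclusive access: hold the write lock for the call.
                ci.lock.writeLock().lock();
                try {
                    engine.computeEnhancement(ci);
                } finally {
                    ci.lock.writeLock().unlock();
                }
            } else {
                // ENHANCE_ASYNC: the engine acquires read/write locks itself.
                engine.computeEnhancement(ci);
            }
            return "completed";
        } catch (Exception e) {
            if (optional) {
                return "failed"; // the process may continue with other engines
            }
            throw new IllegalStateException("required engine failed", e);
        }
    }
}
```

Releasing the lock in a finally block keeps a failing engine from leaving the content item locked, which matters because the process may continue when the failed engine was optional.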

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext?rev=1236242&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
 Thu Jan 26 15:30:10 2012
@@ -0,0 +1,121 @@
+Title: Execution Metadata
+
+The execution metadata is added by the 
[EnhancementJobManager](enhancementjobmanager.html) to the metadata of the 
ContentItem. It provides information about the execution of the 
[ExecutionPlan](chains/executionplan.html) provided by the 
[Chain](chains/enhancementchain.html) and can be used by clients to get 
detailed information about the enhancement process of a content item.
+
+In the case of asynchronous calls to the Enhancer's RESTful interface (requests 
that return immediately and do not wait for the enhancement process to 
complete) this information can also be used to report the current state of the 
enhancement process.
+
+### Execution Metadata Ontology
+
+The RDFS schema used for the execution metadata is defined as follows.
+
+ * Namespace: em : 
http://stanbol.apache.org/ontology/enhancer/executionMetadata#
+ * __em:Execution__ : Super class for all Executions
+     * __em:executionPart__ (domain: em:Execution; range: em:ChainExecution): 
Defines that this execution was part of the execution of a chain
+     * __em:status__ (domain: em:Execution; range: em:ExecutionStatus): The 
status of an Execution (used for both em:EngineExecution and em:ChainExecution)
+     * __em:started__ (domain: em:Execution; range: xsd:dateTime): Marks the 
start of the execution
+     * __em:completed__ (domain: em:Execution; range: xsd:dateTime): Marks the 
completion of the execution
+     * __em:statusMessage__ (domain: em:Execution; range: xsd:string): A 
natural language description providing further information about the status of 
this execution. Typically used to pass error messages if the execution fails 
(em:status is set to em:StatusFailed).
+ * __em:ChainExecution__ : Class used to describe the execution of an 
enhancement Chain.
+     * __em:defaultChain__ (domain: em:ChainExecution; range: xsd:boolean): 
Whether the executed Chain is currently the default Chain of the Stanbol 
Enhancer.
+     * __em:executionPlan__ (domain: em:ChainExecution; range: 
ep:ExecutionPlan): Links to the execution plan as provided by the chain.
+     * __em:enhances__ (domain: em:ChainExecution; range: rdf:Resource): Links 
the em:ChainExecution with the URI of the processed content item. The range 
needs to be updated as soon as the Stanbol Enhancement Structure is defined.
+     * __em:enhancedBy__ (domain: rdf:Resource; range: em:ChainExecution): 
Links the URI of the content item with the metadata about the enhancement 
process. The range needs to be updated as soon as the Stanbol Enhancement 
Structure is defined.
+ * __em:EngineExecution__ : Class used to describe the execution of an 
EnhancementEngine.
+     * __em:executionNode__ (domain: em:EngineExecution; range: 
ep:ExecutionNode): The node within the ExecutionPlan
+ * __em:ExecutionStatus__ : Class describing the status of an EngineExecution
+     * __em:StatusSheduled__ : ExecutionStatus instance describing that an 
execution is scheduled but has not yet started
+     * __em:StatusInProgress__ : ExecutionStatus instance describing that 
the execution of the linked EngineExecution is in progress
+     * __em:StatusCompleted__ : ExecutionStatus instance describing that the 
execution has already completed successfully
+     * __em:StatusFailed__ : ExecutionStatus indicating that the execution has 
failed. Typically an em:statusMessage describing the reason for the failed 
execution is provided for em:Executions with that state.
+     * __em:StatusSkiped__ : ExecutionStatus indicating that the execution of 
an ep:ExecutionNode was skipped. This is only allowed for execution nodes that 
are marked as optional. Typically an em:statusMessage with the reason 
should also be provided.
+
+
+### Example:
+
+The following example reuses the example from the 
[ExecutionPlan](chains/executionplan.html) section. To make the relations 
between the execution metadata and the execution plan easier to see, the 
triples of the execution plan are included at the end of this example.
+
+This example describes the following situation:
+
+* the enhancement of the content item with the URI 'urn:contentItem1' with the 
default chain
+* the default chain is represented by a Chain with the name "demoChain"; the 
ExecutionPlan has the URI 'urn:execPlan'
+* the successful execution of the 'langid' engine (execution: 'urn:exec1', 
node: 'urn:node1')
+* the failed execution of the 'ner' engine (execution: 'urn:exec2', node: 
'urn:node2'): As reason for the failure a message is provided that the NER 
model for the language 'de' is not available
+* the successful execution of the 'zemanta' engine (execution: 'urn:exec3', 
node: 'urn:node5'): This engine was started in parallel to the 'ner' engine, 
and therefore before the chain failed.
+* There is no execution of the dbpediaLinking (node: '') and geonamesLinking 
(node: '') engines because the chain failed before those engines were 
scheduled. This assumes that the EnhancementJobManager only adds 
em:EngineExecution resources when it starts the processing of an 
ep:ExecutionNode defined in the execution plan. However, an 
EnhancementJobManager can also create em:EngineExecution resources for all 
execution nodes up front. In that case there would also be em:EngineExecution 
resources for the dbpediaLinking and geonamesLinking engines with the em:status 
set to 'em:StatusSheduled'. 
+
+The RDF graph with the Execution Metadata:
+
+    urn:exec
+        rdf:type em:ChainExecution
+        em:executionPlan urn:execPlan
+        em:enhances urn:contentItem1
+        em:defaultChain "true"
+        em:started 2012-01-11T12.13.14.156
+        em:completed 2012-01-11T12.13.15.157
+        em:status em:StatusFailed
+        em:statusMessage "Unable to execute EnhancementEngine 'ner' \
+            (Message: No NER model for language 'de' is available)."
+        em:executionPart urn:exec1, urn:exec2, urn:exec3, urn:exec4, urn:exec5
+
+    urn:exec1
+        rdf:type em:EngineExecution
+        em:executionPart urn:exec
+        em:executionNode urn:node1
+        em:status em:StatusCompleted
+        em:started 2012-01-11T12.13.14.160
+        em:completed 2012-01-11T12.13.14.250
+
+    urn:exec2
+        rdf:type em:EngineExecution
+        em:executionPart urn:exec
+        em:executionNode urn:node2
+        em:status em:StatusFailed
+        em:statusMessage "No NER model for language 'de' is available"
+        em:started 2012-01-11T12.13.14.253
+        em:completed 2012-01-11T12.13.14.289
+
+    urn:exec3
+        rdf:type em:EngineExecution
+        em:executionPart urn:exec
+        em:executionNode urn:node5
+        em:status em:StatusCompleted
+        em:started 2012-01-11T12.13.14.253
+        em:completed 2012-01-11T12.13.15.150
+
+The Execution Plan: (copy from the example provided in the ExecutionPlan 
section)
+    
+    urn:execPlan
+        rdf:type ep:ExecutionPlan
+        ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4, 
urn:node5
+        ep:chain "demoChain"
+
+    urn:node1
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:engine langId
+
+    urn:node2
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node1
+        ep:engine ner
+
+    urn:node3
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node1
+        ep:engine dbpediaLinking
+
+    urn:node4
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node1
+        ep:engine geonamesLinking
+
+    urn:node5
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:engine zemanta
+        ep:optional "true"^^xsd:boolean
+
+Note that both the Execution Metadata AND the Execution Plan need to be 
contained within the metadata of the ContentItem.
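
For the execution plan as listed in the example above, the set of nodes that may run at a given point follows from the ep:dependsOn links. The following is a minimal sketch in plain Java, assuming that a node becomes executable once every node it depends on has been executed. PlanSketch and its methods are hypothetical names and do not reproduce the ExecutionPlanUtils helper referenced in the documentation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical plain-Java model of ep:dependsOn; not the Stanbol helper API. */
class PlanSketch {
    /** Maps each node to the nodes it depends on (ep:dependsOn). */
    private final Map<String, Set<String>> dependsOn = new LinkedHashMap<>();

    /** Registers a node together with its dependencies. */
    PlanSketch node(String name, String... deps) {
        dependsOn.put(name, new LinkedHashSet<>(Arrays.asList(deps)));
        return this;
    }

    /** Returns the nodes not yet executed whose dependencies have all been executed. */
    Set<String> executable(Set<String> executed) {
        Set<String> next = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            if (!executed.contains(e.getKey()) && executed.containsAll(e.getValue())) {
                next.add(e.getKey());
            }
        }
        return next;
    }

    /** The plan as listed: node2 to node4 depend on node1, node5 has no dependency. */
    static PlanSketch example() {
        return new PlanSketch()
            .node("urn:node1")
            .node("urn:node2", "urn:node1")
            .node("urn:node3", "urn:node1")
            .node("urn:node4", "urn:node1")
            .node("urn:node5");
    }
}
```

With nothing executed yet, `PlanSketch.example().executable(...)` yields urn:node1 and urn:node5. Once both have been executed, urn:node2, urn:node3 and urn:node4 become executable.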

svn commit: r1236242 - in /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer: chains/enhancementchain.mdtext chains/executionplan.mdtext enhancementjobmanager.mdtext executionmetadata.mdtext
