Author: rwesten
Date: Thu Jan 26 15:30:10 2012
New Revision: 1236242
URL: http://svn.apache.org/viewvc?rev=1236242&view=rev
Log:
Added Documentation for EnhancementJobManager and ExecutionMetadata
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236242&r1=1236241&r2=1236242&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
Thu Jan 26 15:30:10 2012
@@ -1,6 +1,6 @@
Title: Enhancement Chains
-An Enhancement Chain defines how Content parsed to the Stanbol Enhancer is
processed. More concrete it defines what engines and in what order are used to
process ContentItems. Chains are not responsible for the actual processing of
ContentItems. They provide the [ExecutionPlan](executionplan.html) to the
EnhancementJobManger that does the actual processing of the ContentItem.
+An Enhancement Chain defines how Content passed to the Stanbol Enhancer is
processed. More concretely, it defines which engines are used to process
ContentItems and in what order. Chains are not responsible for the actual
processing of ContentItems. They provide the [ExecutionPlan](executionplan.html)
to the [EnhancementJobManager](../enhancementjobmanager.html), which does the
actual processing of the ContentItem.
In the RESTful API enhancement chains can be accessed by their name under
@@ -35,7 +35,7 @@ When using the Java API Chains can be lo
MGraph enhancementResults = ci.getMetadata();
To enhance a ContentItem with the default chain the
"enhanceContent(ContentItem ci)" can be used.
-<
+
## Chain Interface
The Chain interface is very simplistic. It only defines three methods.
@@ -65,7 +65,7 @@ Because the configuration of a Chain mig
## Enhancement Chain Management
-This section describes how Enhancement Cahins are managed by the Stanbol
Enhancer and how they can be selected/accessed. It also describes how the
"default" Chain is determined.
+This section describes how Enhancement Chains are managed by the Stanbol
Enhancer and how they can be selected/accessed. It also describes how the
"default" Chain is determined.
For every Stanbol Enhancer a single Chain MUST BE present. If this is not the
case enhance request MUST throw a ChainException with an according error
message. However typically multiple EnhancementChains will be configured.
@@ -86,15 +86,15 @@ The default Chain is determined by the f
1. the Chain with the name "default". If more than one Chain is present with
that name, than the above rules for resolving name conflicts apply. If none
2. the Chain with the highest "service.ranking". If several have the same
ranking
-3. the Cahin with the lowest "service.id"
+3. the Chain with the lowest "service.id"
If no chain is active a ChainException with an according message MUST BE
thrown.
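
The selection rules above can be sketched as a comparator. This is a minimal illustration only: `ChainRef` is a hypothetical stand-in for a registered Chain service and its "service.ranking" and "service.id" properties, and an `IllegalStateException` stands in for the ChainException mentioned in the text. It is not the Stanbol implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class DefaultChainSelection {
    /** Hypothetical stand-in for a registered Chain service and its OSGI properties. */
    static final class ChainRef {
        final String name;
        final int ranking;    // value of "service.ranking"
        final long serviceId; // value of "service.id"

        ChainRef(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    /** Orders chains so that the preferred default chain sorts first. */
    static final Comparator<ChainRef> DEFAULT_ORDER = new Comparator<ChainRef>() {
        public int compare(ChainRef a, ChainRef b) {
            boolean aDefault = "default".equals(a.name);
            boolean bDefault = "default".equals(b.name);
            if (aDefault != bDefault) {
                return aDefault ? -1 : 1;               // 1. name "default" wins
            }
            if (a.ranking != b.ranking) {
                return a.ranking > b.ranking ? -1 : 1;  // 2. highest service.ranking
            }
            return Long.compare(a.serviceId, b.serviceId); // 3. lowest service.id
        }
    };

    static ChainRef selectDefault(List<ChainRef> active) {
        if (active.isEmpty()) {
            // the text requires a ChainException here; this sketch uses a JDK exception
            throw new IllegalStateException("No Chain is active");
        }
        return Collections.min(active, DEFAULT_ORDER);
    }

    public static void main(String[] args) {
        List<ChainRef> chains = Arrays.asList(
            new ChainRef("weighted", 100, 7L),
            new ChainRef("default", Integer.MIN_VALUE, 9L));
        System.out.println(selectDefault(chains).name);
    }
}
```

Sorting by one comparator keeps the three rules in a single place, so name conflicts between several chains called "default" are resolved by the same ranking and id comparison.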
All Stanbol launchers are configured with the [Default
Chain](defaultchain.html) enabled. This registers itself with the name
"default" and the lowest possible service ranking - Integer.MIN_VALUE. This
default provides a Chain that considers all currently active
EnhancementEngines and sorts them based on their ordering information (see the
[Calculation of the Execution Plan based on the EnhancementEngine
Ordering](weightedchain.html#calculation_of_the_executionplan) for details).
-### [ChainManager interface](chainmanager.html)
+### ChainManager interface
-This is the management interface for EnhancementChains that can be used by
components to lookup chains based on there name. It also provides a getter for
the default chain. There is also OSGI ServiceTracker like implementation that
can be used to track only chains with specific names and to get even notified
on any change of such chains.
+The [ChainManager](chainmanager.html) is the management interface for
EnhancementChains that components can use to look up chains by their name. It
also provides a getter for the default chain. There is also an OSGI
ServiceTracker-like implementation that can be used to track only chains with
specific names and to get notified of any change to such chains.
## Chain implementations
@@ -104,6 +104,6 @@ The following Chain implementations are
* __[ListChain](listchain.html)__: Implementation that creates the
ExecutionPlan by chaining the EnhancementEngines in the exact order as
specified by the parsed list. This Chain does not support parallel execution of
engines.
* __[WeightedChain](weightedchain.html)__: This Chain implementation takes a
List of Engines names as input and uses the
"org.apache.stanbol.enhancer.engine.order " metadata provided by such engines
to calculate the ExecutionGraph.
* __[GraphChain](graphchain.html)__: This Chain implementation is based on an
ExecutionGraph passed as configuration.
-* __SingleEngineChain__: An Adapter that allows to execute a single
EnhancementEngine within a Chain. This types of Chains will not be registered
as OSGI service. Instances will be created on request for single
EnhancementEngines and directly parsed to the EnhancementJobManager
implementation.
+* __SingleEngineChain__: An Adapter that allows a single EnhancementEngine to
be executed within a Chain. Chains of this type are not registered as OSGI
services. Instances are created on request for single EnhancementEngines and
passed directly to the [EnhancementJobManager](../enhancementjobmanager.html)
implementation.
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext?rev=1236242&r1=1236241&r2=1236242&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
Thu Jan 26 15:30:10 2012
@@ -1,6 +1,6 @@
Title: ExecutionPlan
-The ExecutionPlan is represented as an RDF graph following the ExecutionPlan
Ontology. It needs to be provided by the [Enhancement
Chain](enhancementchain.html) and is used by the EnhancementJobManager to
enhance ContentItems.
+The ExecutionPlan is represented as an RDF graph following the ExecutionPlan
Ontology. It needs to be provided by the [Enhancement
Chain](enhancementchain.html) and is used by the
[EnhancementJobManager](../enhancementjobmanager.html) to enhance ContentItems
and to write the [ExecutionMetadata](../executionmetadata.html).
## ExecutionPlan Ontology
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext?rev=1236242&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementjobmanager.mdtext
Thu Jan 26 15:30:10 2012
@@ -0,0 +1,96 @@
+Title: EnhancementJobManager
+
+
+The EnhancementJobManager is the component responsible for the execution of the
[ExecutionPlan](../chains/executionplan.html) as provided by the [Enhancement
Chain](../chains/enhancementchain.html) on the ContentItem.
+
+
+## EnhancementJobManager interface
+
+The interface of the EnhancementJobManager is very simple:
+
+ /** Enhances the content item by using the default Chain */
+ + enhanceContent(ContentItem ci)
+ /** Enhances the content item by using the parsed Chain */
+ + enhanceContent(ContentItem ci, Chain chain)
+
+Note that the passed ContentItem will be changed during the enhancement
process. EnhancementEngines will add extracted knowledge to the metadata of the
content item. Additional content parts may also be added to the content item.
+
+
+## Enhancement Process
+
+While the [ExecutionPlan](../chains/executionplan.html) defines which
EnhancementEngines are used and how they depend on each other, the
EnhancementJobManager is responsible for the actual execution of the
enhancement process based on this plan. This section provides detailed
information about requirements and expectations that MUST be considered.
+
+The EnhancementJobManager is also responsible for creating and updating the
[ExecutionMetadata](executionmetadata.html) in the metadata of the processed
ContentItem.
+
+### Retrieving the ExecutionPlan
+
+The [ExecutionPlan](../chains/executionplan.html) is provided by the Chain as
a final graph that is guaranteed not to change. However, because the
configuration of a Chain might change at any time, the EnhancementJobManager
MUST retrieve the execution plan only once and use it during the whole
enhancement process.
+
+Before the start of the enhancement process the EnhancementJobManager first
needs to initialize the [ExecutionMetadata](executionmetadata.html) for the
ContentItem. This includes:
+
+1. copying the execution plan as returned by the Chain to the metadata of the
content item
+2. creating an 'em:ChainExecution' instance and setting the 'em:enhances'
property to the URI of the ContentItem
+3. creating 'em:EngineExecution' instances for all 'ep:ExecutionNode'
instances and setting the 'em:status' of those to 'em:StatusSheduled'. Such
instances also need to be defined as 'em:executionPart' of the chain execution
and linked to the corresponding execution node of the execution plan.
+
+See the documentation of the [ExecutionMetadata](executionmetadata.html) for
more information.
+
+### Engine Execution
+
+The ExecutionPlan provides the information needed to determine which engines
can be executed at any given state. The following code shows how to determine
executable engines.
+This code snippet assumes that it is called after the execution of an
EnhancementEngine has completed. Note that in a multi-threaded environment
access to the lists of executed and running engines needs to be synchronized.
+
+    Collection<NonLiteral> executed; //already executed Engines
+    Collection<NonLiteral> running; //currently running Engines
+
+    //nodes whose dependencies have all been executed
+    Collection<NonLiteral> next =
+        ExecutionPlanUtils.getExecuteable(executionPlan, executed);
+    for(NonLiteral node : next){
+        if(!running.contains(node)){
+            String engineName =
+                EnhancementEngineHelper.getString(executionPlan, node, EX_ENGINE);
+            EnhancementEngine engine = tracker.getEngine(engineName);
+            if(engine != null){
+                // execute engine
+            } else {
+                //check if optional and throw error if not
+            }
+        } // else already running -> ignore
+    }
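
To illustrate what "executable" means here, the following is a minimal sketch using plain Java collections instead of the RDF graph. It assumes that a node is executable once all nodes it depends on have been executed. The class and method names are hypothetical and do not show how `ExecutionPlanUtils` is actually implemented. The plan mirrors the example used in the ExecutionMetadata documentation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExecutableNodes {
    /**
     * Returns all nodes that are not yet executed and whose dependencies
     * are all contained in the set of executed nodes.
     */
    static Set<String> getExecutable(Map<String, Set<String>> dependsOn, Set<String> executed) {
        Set<String> result = new TreeSet<String>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            boolean notYetExecuted = !executed.contains(entry.getKey());
            boolean dependenciesDone = executed.containsAll(entry.getValue());
            if (notYetExecuted && dependenciesDone) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Example plan: node2, node3 and node4 depend on node1; node5 has no dependency. */
    static Map<String, Set<String>> examplePlan() {
        Map<String, Set<String>> plan = new LinkedHashMap<String, Set<String>>();
        Set<String> none = Collections.<String>emptySet();
        Set<String> node1 = new HashSet<String>(Arrays.asList("urn:node1"));
        plan.put("urn:node1", none);
        plan.put("urn:node2", node1);
        plan.put("urn:node3", node1);
        plan.put("urn:node4", node1);
        plan.put("urn:node5", none);
        return plan;
    }

    public static void main(String[] args) {
        Set<String> executed = new HashSet<String>();
        System.out.println(getExecutable(examplePlan(), executed));
        executed.add("urn:node1");
        System.out.println(getExecutable(examplePlan(), executed));
    }
}
```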
+
+Before executing an EnhancementEngine the EnhancementJobManager needs to check
if and how the engine can enhance a content item. This is indicated by the
integer returned by the "canEnhance(ContentItem ci)" method:
+
+* __CANNOT_ENHANCE__: Indicates that this engine cannot process the passed
content item. In this case the EnhancementJobManager needs to skip this engine
and mark the EngineExecution as skipped with a status message stating that the
EnhancementEngine was unable to process the content item. If this engine is
marked as optional the enhancement process can continue. If not, the
execution MUST be marked as failed and a corresponding Exception needs to be
thrown.
+* __ENHANCE_SYNCHRONOUS__: Indicates that the engine needs exclusive access
to the passed content item. The EnhancementJobManager needs to ensure this,
typically by calling the "computeEnhancement(ContentItem ci)" method
within a write lock.
+* __ENHANCE_ASYNC__: Indicates that this engine supports asynchronous
execution and itself takes care of acquiring read and write locks on the passed
content item. However this does not require the JobManager to execute the
engine asynchronously.
+
+If the execution of an EnhancementEngine completes the JobManager needs to set
the state of the execution to completed and update the other metadata
accordingly.
+
+If a call to "computeEnhancement(ContentItem ci)" results in an Exception the
EnhancementJobManager must mark the execution of the engine as failed with a
description of the exception that occurred. If the execution of the affected
engine was optional the enhancement process is continued. Otherwise the
enhancement process needs to be stopped and the error needs to be rethrown by
the "enhanceContent(..)" method.
+
+### Multi Threaded enhancement processes
+
+If the EnhancementJobManager supports calling EnhancementEngines for the same
content item simultaneously in multiple threads, it is
important to correctly use the ReadWriteLock as provided by the
ContentItem.getLock() method.
+
+There are many good examples on how to correctly use
"java.util.concurrent.ReadWriteLock" available on the web.
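
As one such example, the following minimal sketch shows the usual lock/try/finally pattern. `LockedContentItem` is a hypothetical stand-in that guards a simple list with a `ReentrantReadWriteLock`. It only mimics the idea of ContentItem.getLock() and is not the Stanbol ContentItem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedContentItem {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> metadata = new ArrayList<String>();

    ReadWriteLock getLock() {
        return lock;
    }

    /** Writers need exclusive access: always unlock in a finally block. */
    void addEnhancement(String statement) {
        lock.writeLock().lock();
        try {
            metadata.add(statement);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Several readers may hold the read lock at the same time. */
    int countEnhancements() {
        lock.readLock().lock();
        try {
            return metadata.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Lets several threads write to the same item and returns the final count. */
    static int runConcurrently(int threads, final int perThread) {
        final LockedContentItem ci = new LockedContentItem();
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        ci.addEnhancement("engine" + id + "-" + i);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return ci.countEnhancements();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 250));
    }
}
```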
+
+### Finalizing the EnhancementProcess
+
+After the execution has completed (successfully or not) the
EnhancementJobManager needs to ensure that the 'em:status' and the
'em:completed' of the 'em:ChainExecution' instance are set. If the execution
failed, the 'em:statusMessage' should also be available and contain a message
that describes the problem.
+
+
+## EnhancementJobManager implementations
+
+EnhancementJobManager implementations need to register themselves as OSGI
services. By default the Stanbol Enhancer will use the implementation with the
highest service ranking. The service ranking can be set by providing a
configuration defining an integer value for the property "service.ranking".
+
+
+### EventJobManager
+
+This implementation is provided by the
"org.apache.stanbol.enhancer.jobmanager.event" module and is currently used as
default. It registers itself (by default) with a service ranking of '0'.
+
+This implementation supports an asynchronous enhancement process by using the
["org.osgi.service.event"](http://www.osgi.org/javadoc/r4v42/org/osgi/service/event/package-summary.html)
framework.
+
+### WeightedJobManager
+
+This JobManager was used as default before the introduction of
EnhancementChains. It does not support EnhancementChains and will enhance
passed ContentItems by calling all currently active EnhancementEngines
sequentially. It also does not support ExecutionMetadata.
+
+This implementation is provided by the
"org.apache.stanbol.enhancer.jobmanager.weightedjobmanager" module and is no
longer included within the Apache Stanbol launchers. This JobManager registers
itself with a service ranking of "-1000". Users who want to use this job
manager need to manually install this bundle and either deactivate other
EnhancementJobManager implementations or reconfigure the service ranking of this
one to a value > 0.
+
+
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext?rev=1236242&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/executionmetadata.mdtext
Thu Jan 26 15:30:10 2012
@@ -0,0 +1,121 @@
+Title: Execution Metadata
+
+The execution metadata are added by the
[EnhancementJobManager](enhancementjobmanager.html) to the metadata of the
ContentItem. These metadata provide information about the execution of the
[ExecutionPlan](chains/executionplan.html) provided by the
[Chain](chains/enhancementchain.html) and can be used by clients to get
detailed information about the enhancement process of a content item.
+
+In the case of asynchronous calls to the Enhancer's RESTful interface (requests
that return immediately and do not wait for the enhancement process to
complete) this information might also be useful to report
the current state of the enhancement process.
+
+### Execution Metadata Ontology
+
+The RDFS schema used for the execution metadata is defined as follows.
+
+ * Namespace: em :
http://stanbol.apache.org/ontology/enhancer/executionMetadata#
+ * __em:Execution__ : Super class for all Executions
+ * __em:executionPart__ (domain: em:Execution; range: em:ChainExecution):
Defines that this execution was part of the execution of a chain
+ * __em:status__ (domain: em:Execution; range: em:ExecutionStatus): The
status of an Execution (used for both em:EngineExecution and em:ChainExecution)
+ * __em:started__ (domain: em:Execution; range: xsd:dateTime): Marks the
start of the execution
+ * __em:completed__ (domain: em:Execution; range: xsd:dateTime): Marks the
completion of the execution
+ * __em:statusMessage__ (domain: em:Execution; range: xsd:string): A
natural language description providing further information about the status of
this execution. Typically used to report error messages if the execution fails
(em:status is set to em:StatusFailed).
+ * __em:ChainExecution__ : Class used to describe the execution of an
enhancement Chain.
+ * __em:defaultChain__ (domain: em:ChainExecution; range: xsd:boolean): Whether
the executed Chain is currently the default Chain of the Stanbol Enhancer.
+ * __em:executionPlan__ (domain: em:ChainExecution; range: ep:ExecutionPlan):
Links to the execution plan as provided by the chain.
+ * __em:enhances__ (domain: em:ChainExecution; range: rdf:Resource) : links
the em:ChainExecution with the URI of the processed content item. The range
needs to be updated as soon as the Stanbol Enhancement Structure is defined.
+ * __em:enhancedBy__ (domain: rdf:Resource; range: em:ChainExecution) :
links the URI of the content item with the metadata about the enhancement
process. The range needs to be updated as soon as the Stanbol Enhancement
Structure is defined.
+ * __em:EngineExecution__ : Class used to describe the execution of an
EnhancementEngine.
+ * __em:executionNode__ (domain: em:EngineExecution; range:
ep:ExecutionNode): The node within the ExecutionPlan
+ * __em:ExecutionStatus__ : Class describing the status of an EngineExecution
+ * __em:StatusSheduled__ : ExecutionStatus instance describing that an
execution is scheduled but has not yet started
+ * __em:StatusInProgress__ : ExecutionStatus instance describing that
the execution of the linked EngineExecution is in progress
+ * __em:StatusCompleted__ : ExecutionStatus instance describing that the
execution has already completed successfully
+ * __em:StatusFailed__ : ExecutionStatus indicating that the execution has
failed. Typically a em:statusMessage describing the reason for the failed
execution is provided for em:Executions with that state.
+ * __em:StatusSkiped__ : ExecutionStatus indicating that the execution of
an ep:ExecutionNode was skipped. This is only allowed for execution nodes that
are marked as optional. Typically an em:statusMessage with the reason
should also be provided.
+
+
+### Example:
+
+The following example uses the same example as used within the
[ExecutionPlan](chains/executionplan.html) section. To make the relations
between the execution metadata and the execution plan easier to see the triples
of the execution plan are included at the end of this example.
+
+This example describes the following situation:
+
+* the enhancement of the content item with the URI 'urn:contentItem1' with the
default chain
+* the default chain is represented by a Chain with the name "demoChain" and
the ExecutionPlan has the URI 'urn:execPlan'
+* the successful execution of the 'langid' engine (execution: 'urn:exec1',
node: 'urn:node1')
+* the failed execution of the 'ner' engine (execution: 'urn:exec2', node:
'urn:node2'): As reason for the failure a message is provided that the NER
model for the language 'de' is not available
+* the successful execution of the 'zemanta' engine (execution: 'urn:exec3',
node: 'urn:node5'): This engine was started in parallel to the 'ner' engine,
and therefore before the chain failed.
+* There is no execution of the dbpediaLinking (node: '') and geonamesLinking
(node: '') engines because the chain failed before those engines were
scheduled. This assumes that the EnhancementJobManager only adds
em:EngineExecution resources when it starts the processing of an
ep:ExecutionNode defined in the execution plan. However, an
EnhancementJobManager can also create em:EngineExecution resources for all
execution nodes up front. In that case there would also be em:EngineExecution
resources for the dbpediaLinking and geonamesLinking engines with the em:status
set to 'em:StatusSheduled'.
+
+The RDF graph with the Execution Metadata:
+
+ urn:exec
+ rdf:type em:ChainExecution
+ em:executionPlan urn:execPlan
+ em:enhances urn:contentItem1
+ em:defaultChain "true"
+ em:started 2012-01-11T12.13.14.156
+ em:completed 2012-01-11T12.13.15.157
+ em:status em:StatusFailed
+ em:statusMessage "Unable to execute EnhancementEngine 'ner' \
+ (Message: No NER model for language 'de' is available)."
+ em:executionPart urn:exec1, urn:exec2, urn:exec3, urn:exec4, urn:exec5
+
+ urn:exec1
+ rdf:type em:EngineExecution
+ em:executionPart urn:exec
+ em:executionNode urn:node1
+ em:status em:StatusCompleted
+ em:started 2012-01-11T12.13.14.160
+ em:completed 2012-01-11T12.13.14.250
+
+ urn:exec2
+ rdf:type em:EngineExecution
+ em:executionPart urn:exec
+ em:executionNode urn:node2
+ em:status em:StatusFailed
+ em:statusMessage "No NER model for language 'de' is available"
+ em:started 2012-01-11T12.13.14.253
+ em:completed 2012-01-11T12.13.14.289
+
+ urn:exec3
+ rdf:type em:EngineExecution
+ em:executionPart urn:exec
+ em:executionNode urn:node5
+ em:status em:StatusCompleted
+ em:started 2012-01-11T12.13.14.253
+ em:completed 2012-01-11T12.13.15.150
+
+The Execution Plan: (copy from the example provided in the ExecutionPlan
section)
+
+ urn:execPlan
+ rdf:type ep:ExecutionPlan
+ ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4,
urn:node5
+ ep:chain "demoChain"
+
+ urn:node1
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:engine langId
+
+ urn:node2
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node1
+ ep:engine ner
+
+ urn:node3
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node1
+ ep:engine dbpediaLinking
+
+ urn:node4
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node1
+ ep:engine geonamesLinking
+
+ urn:node5
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:engine zemanta
+ ep:optional "true"^^xsd:boolean
+
+Note that both the Execution Metadata AND the Execution Plan need to be
contained within the metadata of the ContentItem.