STANBOL-414-specification.html

buildbot Mon, 09 Jan 2012 02:12:02 -0800

Author: buildbot
Date: Mon Jan  9 10:11:27 2012
New Revision: 802163

Log:
Staging update by buildbot


Added:
    
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html

Added: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
 (added)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/STANBOL-414-specification.html
 Mon Jan  9 10:11:27 2012
@@ -0,0 +1,274 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - </title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" 
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <img alt="Apache Stanbol" width="220" height="101" 
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/>
+  <h1 id="stanbol_links">Stanbol links</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/downloads.html">Downloads</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building from Source</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+</ul>
+<h1 id="asf_links">ASF links</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title"></h1>
+    <p>This document specifies the Java API for the extensions to the RESTful services of the Stanbol Enhancer described in <a href="https://issues.apache.org/jira/browse/STANBOL-414">STANBOL-414</a>.</p>
+<h2 id="enhancement-chains">Enhancement Chains</h2>
+<p>A Chain represents a configuration that defines which engines are used to process ContentItems and in what order. Chains are registered as OSGI services and identified by the "stanbol.enhancer.chain.name" property.</p>
+<h3 id="chain">Chain</h3>
+<p>The Chain provides its configuration in the form of an RDF graph.</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the execution 
plan */</span>
+<span class="o">+</span> <span class="n">getExecutionPlan</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">Graph</span>
+<span class="sr">/** Getter for the name of the Engines referenced by this 
Chain */</span>
+<span class="o">+</span> <span class="n">getEngines</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">Set</span><span 
class="sr">&lt;String&gt;</span>
+</pre></div>
+
+
+<p>The getEngines method may return the engine names in any order. It is mainly intended for situations where only the engines used by a chain need to be known (e.g. for visualization) and the chain itself does not need to be executed.</p>
+<p>The returned Graph holding the execution plan MUST be read-only and final: a change in the configuration of a Chain MUST NOT change a graph already returned by a call to the getExecutionPlan method.</p>
+<p>Because the configuration of a Chain might change at any time, a JobManager MUST retrieve the Graph holding the execution plan before it starts the actual processing of the ContentItem. This plan MUST be used for the whole enhancement process. Later changes to the configuration MUST NOT be reflected in the enhancement of that ContentItem.</p>
+<h3 id="chainmanager">ChainManager</h3>
+<p>The ChainManager is a service that tracks all Chains registered as services in the OSGI environment of the Stanbol Enhancer. It provides a simple API to retrieve a chain by its name.</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the Chain for a 
given name */</span>
+<span class="o">+</span> <span class="n">getChain</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">Chain</span>
+<span class="sr">/** Getter for all Chains for a name sorted by service 
ranking */</span>
+<span class="o">+</span> <span class="n">getChains</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">List</span><span class="sr">&lt;Chain&gt;</span>
+<span class="sr">/** Checks if there is a chain for the given name */</span>
+<span class="o">+</span> <span class="n">isChain</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">boolean</span>
+<span class="sr">/** Getter for the default chain */</span>
+<span class="o">+</span> <span class="n">getDefault</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">Chain</span>
+</pre></div>
+
+
+<p>The default Chain is used if no chain is specified in a request (e.g. when calling the /engines endpoint). The default Chain is the chain with the highest service ranking.</p>
+<p>ALTERNATIVE: The default Chain is the Chain with "stanbol.enhancer.chain.name=default" and the highest service ranking. If no Chain with the name "default" exists, the Chain with the highest service ranking is assumed to be the default chain.</p>
+<p>The default configuration of Stanbol MUST provide a Chain instance with "stanbol.enhancer.chain.name=default" and a service ranking of Integer.MIN_VALUE that includes all currently active enhancement engines.</p>
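+<p>As a minimal sketch of the ALTERNATIVE rule above (not part of the specified API): the classes ChainRef and DefaultChainSelector below are hypothetical stand-ins that model a registered Chain only by its name and its service ranking.</p>

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a registered Chain service (not the Stanbol API). */
final class ChainRef {
    final String name;  // value of "stanbol.enhancer.chain.name"
    final int ranking;  // OSGI service ranking

    ChainRef(String name, int ranking) {
        this.name = name;
        this.ranking = ranking;
    }
}

final class DefaultChainSelector {
    /**
     * Returns the highest-ranked chain named "default". If there is none,
     * returns the highest-ranked chain overall, or null if the list is empty.
     */
    static ChainRef selectDefault(List<ChainRef> chains) {
        ChainRef bestNamed = null;
        ChainRef bestAny = null;
        for (ChainRef c : chains) {
            if (bestAny == null || c.ranking > bestAny.ranking) {
                bestAny = c;
            }
            if ("default".equals(c.name)
                    && (bestNamed == null || c.ranking > bestNamed.ranking)) {
                bestNamed = c;
            }
        }
        return bestNamed != null ? bestNamed : bestAny;
    }

    public static void main(String[] args) {
        List<ChainRef> chains = Arrays.asList(
                new ChainRef("default", Integer.MIN_VALUE), // the built-in default chain
                new ChainRef("custom", 10),
                new ChainRef("default", 0));                // user-provided override
        ChainRef selected = selectDefault(chains);
        System.out.println(selected.name + " @ " + selected.ranking);
    }
}
```

+<p>With a built-in chain at Integer.MIN_VALUE, any other chain named "default" wins as soon as it is registered, which is the override behaviour the text describes.</p>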
+<h3 id="executionplan">ExecutionPlan</h3>
+<p>The execution plan needs to be created by the Chain based on its current configuration. This plan is read-only and MUST NOT be changed if the configuration of the Chain changes. This means that the Chain MUST create a new Graph instance if the execution plan changes as a result of a change in the configuration. It MUST NOT change any execution plan already passed to other components by the getExecutionPlan() method.</p>
+<p>The RDFS schema used for the execution plan is defined as follows.</p>
+<ul>
+<li>Namespace: ep : 
http://stanbol.apache.org/ontology/enhancer/executionplan#</li>
+<li><strong>ep:ExecutionNode</strong> : Class used for all Nodes representing 
the execution of an Enhancement Engine.</li>
+<li><strong>ep:engine</strong> (domain: ep:ExecutionNode; range: xsd:string): 
The property used to link to the Enhancement Engine by the name of the 
engine.</li>
+<li><strong>ep:dependsOn</strong> (domain: ep:ExecutionNode; range: 
ep:ExecutionNode) Defines that the execution of this node depends on the 
completion of the referenced one.</li>
+<li><strong>ep:optional</strong> (domain: ep:ExecutionNode; range: xsd:boolean) Can be used to specify that the execution of this EnhancementEngine is optional. If this property is set to TRUE, the engine will be marked as executed even if its execution was not possible (e.g. because no engine with this name was active) or the execution failed (e.g. because of an exception).</li>
+</ul>
+<h4 id="example">Example:</h4>
+<p>This example shows an ExecutionPlan with five nodes for the "langId", "ner", "dbpediaLinking", "geonamesLinking" and "zemanta" engines. Note that these names refer to actual EnhancementEngine services registered with the current OSGI environment.</p>
+<p>This example assumes that</p>
+<ul>
+<li>"langId" is the singleton instance of LangIdEnhancementEngine</li>
+<li>"ner" is the default instance of the 
NamedEntityExtractionEnhancementEngine engine</li>
+<li>"dbpediaLinking" is an instance of the NamedEntityTaggingEngine configured 
to use the dbpedia.org ReferencedSite of the Entityhub</li>
+<li>"geonamesLinking" is an instance of the NamedEntityTaggingEngine 
configured to use the geonames.org ReferencedSite</li>
+<li>"zemanta" is the singleton instance of the ZemantaEnhancementEngine</li>
+</ul>
+<p>The RDF graph of such a chain would look as follows:</p>
+<div class="codehilite"><pre><span class="err">urn:node1</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:engine</span> <span class="err">langId</span>
+
+<span class="err">urn:node2</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node1</span>
+    <span class="err">ep:engine</span> <span class="err">ner</span>
+
+<span class="err">urn:node3</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node2</span>
+    <span class="err">ep:engine</span> <span class="err">dbpediaLinking</span>
+
+<span class="err">urn:node4</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:dependsOn</span> <span class="err">urn:node2</span>
+    <span class="err">ep:engine</span> <span class="err">geonamesLinking</span>
+
+<span class="err">urn:node5</span>
+    <span class="err">rdf:type</span> <span class="err">ep:ExecutionNode</span>
+    <span class="err">ep:engine</span> <span class="err">zemanta</span>
+    <span class="err">ep:optional</span> <span 
class="err">&quot;true&quot;^^xsd:boolean</span>
+</pre></div>
+
+
+<p>This plan defines that the "langId" and the "zemanta" engines do not depend on anything and can therefore be executed from the start (even in parallel, if the JobManager executing this chain supports it). The execution of the "ner" engine depends on the extraction of the language, and the entity linking to dbpedia and geonames depends on the "ner" engine. Note that "dbpediaLinking" and "geonamesLinking" could also be executed in parallel.</p>
+<h4 id="executionplan_utility">ExecutionPlan Utility:</h4>
+<p>The Enhancer MUST also define a utility class that provides the following method:</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the list of 
executable ep:ExecutionNodes */</span>
+<span class="o">+</span> <span class="n">getExecuteable</span><span 
class="p">(</span><span class="n">Graph</span> <span 
class="n">executionPlan</span><span class="p">,</span> <span 
class="n">Set</span><span class="sr">&lt;NonLiteral&gt;</span> <span 
class="n">completed</span><span class="p">)</span> <span class="p">:</span> 
<span class="n">Collection</span><span class="sr">&lt;NonLiteral&gt;</span>
+</pre></div>
+
+
+<p>This method takes an execution plan and the set of already executed nodes as input and returns the ExecutionNodes that can be executed next. The existing utility methods of the EnhancementEngineHelper can be used to retrieve further information from the ep:ExecutionNodes returned by this method.</p>
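+<p>The selection rule of this method can be sketched without any RDF library. The following is only an illustration under simplifying assumptions: nodes are plain strings instead of Clerezza NonLiteral instances, the plan is reduced to a map from each node to the nodes it depends on, and the class name ExecutablePlanSketch and both of its methods are hypothetical. The example data follows the dependencies described in the prose of the example above.</p>

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ExecutablePlanSketch {
    /**
     * @param dependsOn maps every node of the plan to the nodes it depends on
     * @param completed nodes that have already been executed
     * @return nodes that are not yet completed and whose dependencies are all completed
     */
    static Set<String> getExecutable(Map<String, Set<String>> dependsOn, Set<String> completed) {
        Set<String> executable = new LinkedHashSet<String>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            boolean pending = !completed.contains(entry.getKey());
            if (pending && completed.containsAll(entry.getValue())) {
                executable.add(entry.getKey());
            }
        }
        return executable;
    }

    /** The example plan, reduced to "node -> nodes it depends on". */
    static Map<String, Set<String>> examplePlan() {
        Map<String, Set<String>> plan = new LinkedHashMap<String, Set<String>>();
        plan.put("urn:node1", Collections.<String>emptySet());     // langId
        plan.put("urn:node2", Collections.singleton("urn:node1")); // ner
        plan.put("urn:node3", Collections.singleton("urn:node2")); // dbpediaLinking
        plan.put("urn:node4", Collections.singleton("urn:node2")); // geonamesLinking
        plan.put("urn:node5", Collections.<String>emptySet());     // zemanta
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> plan = examplePlan();
        Set<String> completed = new HashSet<String>();
        Set<String> next;
        // Execute the plan in "waves" until no executable node is left.
        while (!(next = getExecutable(plan, completed)).isEmpty()) {
            System.out.println("can run in parallel: " + next);
            completed.addAll(next);
        }
    }
}
```

+<p>Each pass of the loop yields the nodes whose dependencies are satisfied, so a JobManager could hand every such set to a thread pool. Handling of ep:optional nodes is deliberately left out of this sketch.</p>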
+<p>Code using this utility will typically look like this (pseudo code):</p>
+<div class="codehilite"><pre><span class="n">Graph</span> <span 
class="n">executionPlan</span> <span class="o">=</span> <span 
class="n">chain</span><span class="o">.</span><span 
class="n">getExecutionPlan</span><span class="p">();</span>
+<span class="n">Map</span><span class="o">&lt;</span><span 
class="n">String</span><span class="p">,</span> <span 
class="n">EnhancementEngine</span><span class="o">&gt;</span> <span 
class="n">engines</span> <span class="o">=</span> <span 
class="n">enhancementEngineManager</span><span class="o">.</span><span 
class="n">getActiveEngines</span><span class="p">(</span><span 
class="n">chain</span><span class="p">);</span>
+<span class="n">Set</span><span class="sr">&lt;NonLiteral&gt;</span> 
<span class="n">executed</span> <span class="o">=</span> <span 
class="k">new</span> <span class="n">HashSet</span><span 
class="sr">&lt;NonLiteral&gt;</span><span class="p">();</span>
+<span class="n">Collection</span><span class="sr">&lt;NonLiteral&gt;</span> 
<span class="k">next</span><span class="p">;</span>
+<span class="k">while</span><span class="p">(</span><span 
class="o">!</span><span class="p">(</span><span class="k">next</span> <span 
class="o">=</span> <span class="n">ExecutionPlanUtils</span><span 
class="o">.</span><span class="n">getExecuteable</span><span 
class="p">(</span><span class="n">executionPlan</span><span class="p">,</span> <span 
class="n">executed</span><span class="p">))</span><span class="o">.</span><span 
class="n">isEmpty</span><span class="p">()){</span>
+    <span class="k">for</span><span class="p">(</span><span 
class="n">NonLiteral</span> <span class="n">node</span> <span 
class="p">:</span> <span class="k">next</span><span class="p">){</span>
+        <span class="n">EnhancementEngine</span> <span class="n">engine</span> 
<span class="o">=</span> <span class="n">engines</span><span 
class="o">.</span><span class="n">get</span><span class="p">(</span>
+            <span class="n">EnhancementEngineHelper</span><span 
class="o">.</span><span class="n">getString</span><span class="p">(</span><span 
class="n">executionPlan</span><span class="p">,</span><span 
class="n">node</span><span class="p">,</span> <span 
class="n">EX_ENGINE</span><span class="p">));</span>
+        <span class="n">Boolean</span> <span class="n">optional</span> <span 
class="o">=</span> <span class="n">EnhancementEngineHelper</span><span 
class="o">.</span><span class="n">get</span><span class="p">(</span>
+            <span class="n">executionPlan</span><span class="p">,</span><span 
class="n">node</span><span class="p">,</span><span 
class="n">EX_OPTIONAL</span><span class="p">,</span><span 
class="n">Boolean</span><span class="o">.</span><span 
class="n">class</span><span class="p">,</span><span 
class="n">literalFactory</span><span class="p">);</span>
+        <span class="sr">/* Execute the Engine */</span>
+        <span class="n">executed</span><span class="o">.</span><span 
class="n">add</span><span class="p">(</span><span class="n">node</span><span 
class="p">);</span>
+    <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+
+
+<h2 id="chain-implementations">Chain implementations</h2>
+<h3 id="weightedchain">WeightedChain</h3>
+<p>This Chain implementation takes a list of engine names as input and uses the "org.apache.stanbol.enhancer.engine.order" metadata provided by these engines to calculate the ExecutionGraph.</p>
+<p>Similar to the current WeightedJobManager implementation, engines would depend on each other based on decreasing order values. Engines with the same order value could be executed in parallel.</p>
+<p>This implementation is targeted at easy configuration (just a list of the engine names contained in a chain) but offers limited possibilities to control the execution order within a chain. However, it is expected to provide enough flexibility for most usage scenarios.</p>
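+<p>One possible way to derive dependencies from order values is sketched below. It is only an illustration, not the specified algorithm: it assumes that each engine depends on all engines of the next higher order value, the class name WeightedPlanSketch is hypothetical, and the order values used in main are made up for the example.</p>

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class WeightedPlanSketch {
    /**
     * @param orderByEngine engine name -> order value (higher values run first)
     * @return engine name -> names of the engines it depends on
     */
    static Map<String, Set<String>> dependencies(Map<String, Integer> orderByEngine) {
        // Group the engine names by order value, highest order value first.
        TreeMap<Integer, Set<String>> groups =
                new TreeMap<Integer, Set<String>>(Collections.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> entry : orderByEngine.entrySet()) {
            Set<String> group = groups.get(entry.getValue());
            if (group == null) {
                group = new LinkedHashSet<String>();
                groups.put(entry.getValue(), group);
            }
            group.add(entry.getKey());
        }
        // Every engine depends on all engines of the preceding group.
        // Engines within one group share the same dependencies and
        // could therefore be executed in parallel.
        Map<String, Set<String>> dependsOn = new LinkedHashMap<String, Set<String>>();
        Set<String> previous = Collections.emptySet();
        for (Set<String> group : groups.values()) {
            for (String engine : group) {
                dependsOn.put(engine, previous);
            }
            previous = group;
        }
        return dependsOn;
    }

    public static void main(String[] args) {
        Map<String, Integer> order = new LinkedHashMap<String, Integer>();
        order.put("langId", 100);          // hypothetical order values
        order.put("ner", 50);
        order.put("dbpediaLinking", 0);
        order.put("geonamesLinking", 0);
        System.out.println(dependencies(order));
    }
}
```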
+<h3 id="graphchain">GraphChain</h3>
+<p>This Chain implementation is based on an ExecutionGraph passed as configuration.</p>
+<p>TODO: define how users can provide such serialized graphs.</p>
+<p>NOTE: We could also provide the possibility to pass the execution graph as an additional parameter of a specific request to the enhancer.</p>
+<h3 id="defaultchain">DefaultChain</h3>
+<p>Implementation that keeps track of all currently active EnhancementEngines and registers itself as a Chain service with "stanbol.enhancer.chain.name=default" and a service ranking of Integer.MIN_VALUE.</p>
+<p>This will provide a Chain returned by ChainManager.getDefault() that results in the same enhancement process as Stanbol used before the addition of Chains.</p>
+<p>Note that users can change the default chain by either stopping this component or adding another Chain with "stanbol.enhancer.chain.name=default" and a higher service ranking.</p>
+<h3 id="singleenginechain">SingleEngineChain</h3>
+<p>This is basically an adapter that allows a single EnhancementEngine to be executed within a Chain. Chains of this type will not be registered as OSGI services. Instances will be created on request for single EnhancementEngines and passed directly to the EnhancementJobManager implementation.</p>
+<p>Note that pre-existing metadata might still be passed within a multipart content item as defined by STANBOL-414.</p>
+<h2 id="enhancement-engines">Enhancement Engines</h2>
+<p>This section gives an overview of the changes to the Java API for EnhancementEngines and also defines the new EnhancementEngineManager service.</p>
+<h3 id="enhancementengine">EnhancementEngine</h3>
+<p>With this extension to the Stanbol Enhancer, engines will provide additional metadata:</p>
+<ul>
+<li><strong>Name:</strong> Defined by the value of the property 
"stanbol.enhancer.engine.name" it will be used to access Engines on the Stanbol 
RESTful interface</li>
+<li><strong>Service Ranking:</strong> The service ranking property defined by 
OSGI will be used to decide which engine to use in case several active 
EnhancementEngines do use the same name. In such cases only the Engine with the 
highest ranking will be used to enhance ContentItems.</li>
+<li><strong>Configuration:</strong> Each EnhancementEngine MAY provide an RDF graph with its configuration. This graph will be returned on a GET request to the URL of the EnhancementEngine. If no configuration is known for the engine, this MUST at least return a single triple with the name of the engine.</li>
+</ul>
+<p><em>TODO:</em> To correctly construct this graph the Engine needs to know this URL. This could e.g. be provided by some OSGI environment parameter set by the JerseyApplication. As an alternative, we could also pass this URI as a parameter to the getEngineConfig method.</p>
+<p>These changes will result in the following adapted interface for Enhancement Engines.</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the value of the 
&quot;stanbol.enhancer.engine.name&quot; property */</span>
+<span class="o">+</span> <span class="n">getName</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Getter for the service ranking of this engine **/</span>
+<span class="o">+</span> <span class="n">getRanking</span><span 
class="p">()</span> <span class="p">:</span> <span class="nb">int</span>
+<span class="sr">/** The configuration of the Engine as RDF Graph or NULL. 
**/</span>
+<span class="o">+</span> <span class="n">getEngineConfig</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">Graph</span>
+<span class="o">+</span> <span class="n">canEnhance</span><span 
class="p">(</span><span class="n">ContentItem</span> <span 
class="n">ci</span><span class="p">)</span> <span class="p">:</span> <span 
class="nb">int</span>
+<span class="o">+</span> <span class="n">computeEnhancements</span><span 
class="p">(</span><span class="n">ContentItem</span> <span 
class="n">ci</span><span class="p">)</span>
+</pre></div>
+
+
+<h3 id="enhancementenginemanager">EnhancementEngineManager</h3>
+<p>New Utility that keeps track of all active EnhancementEngines and supports 
lookup for Enhancement Engines based on the "stanbol.enhancer.engine.name" 
property.</p>
+<div class="codehilite"><pre><span class="o">+</span> <span 
class="n">getEngine</span><span class="p">(</span><span class="n">String</span> 
<span class="n">name</span><span class="p">)</span> <span class="p">:</span> 
<span class="n">EnhancementEngine</span>
+<span class="o">+</span> <span class="n">getEngines</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">List</span><span class="sr">&lt;EnhancementEngine&gt;</span>
+<span class="o">+</span> <span class="n">isEngine</span><span 
class="p">(</span><span class="n">String</span> <span 
class="n">name</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">boolean</span>
+<span class="o">+</span> <span class="n">getActiveEngines</span><span 
class="p">(</span><span class="n">Chain</span> <span 
class="n">chain</span><span class="p">)</span> <span class="p">:</span> <span 
class="n">Map</span><span class="sr">&lt;String,EnhancementEngine&gt;</span>
+</pre></div>
+
+
+<h2 id="enhancement-process">Enhancement Process</h2>
+<p>This section describes changes to the Enhancement Process caused by the addition of Chains. It also specifies what EnhancementEngines and EnhancementJobManager implementations need to take care of to allow asynchronous and parallel execution of multiple EnhancementEngines for the same ContentItem.</p>
+<p>Note that work on the asynchronous enhancement process is covered by <a href="https://issues.apache.org/jira/browse/STANBOL-46">STANBOL-46</a>.</p>
+<h3 id="enhancementjobmanager">EnhancementJobManager</h3>
+<p>The interface of the EnhancementJobManager will change due to the addition of Chains and will in future contain only a single method that enhances a ContentItem by using the execution plan provided by the passed Chain.</p>
+<div class="codehilite"><pre><span class="o">+</span> <span 
class="n">enhanceContent</span><span class="p">(</span><span 
class="n">ContentItem</span> <span class="n">ci</span><span class="p">,</span> 
<span class="n">Chain</span> <span class="n">chain</span><span 
class="p">)</span>
+</pre></div>
+
+
+<p>Information about the Enhancement Engines is now available through:</p>
+<ul>
+<li><em>Chain#getEngines():</em> This returns the names of all Engines 
referenced by a Chain</li>
+<li><em>EnhancementEngineManager#getActiveEngines(Chain chain):</em> This returns the currently active Engines based on the configuration of the chain.</li>
+</ul>
+<p>By combining the results of both methods it is easy to retrieve the list of Engines used by a Chain and also to check if a Chain can be executed based on the currently active EnhancementEngines.</p>
+<p>The getter for the active EnhancementEngines now also takes the Chain as a parameter.</p>
+<p>The getter for the active enhancement engines is intended to be used to check if all Engines referenced by a Chain (see the Chain#getEngines() method) are currently active.</p>
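+<p>Such a check could look roughly like the following sketch. It assumes only the shapes given in this document (a set of engine names and a map of active engines keyed by name). The class name ChainCheckSketch and the method missingEngines are hypothetical, and optional engines are not considered.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ChainCheckSketch {
    /**
     * @param referenced engine names as returned by Chain#getEngines()
     * @param active     active engines by name, as returned by getActiveEngines(chain)
     * @return the referenced engine names for which no active engine exists
     */
    static Set<String> missingEngines(Set<String> referenced, Map<String, ?> active) {
        Set<String> missing = new TreeSet<String>(referenced);
        missing.removeAll(active.keySet());
        return missing;
    }

    public static void main(String[] args) {
        Set<String> referenced = new HashSet<String>(Arrays.asList("langId", "ner", "zemanta"));
        Map<String, Object> active = new HashMap<String, Object>();
        active.put("langId", new Object()); // placeholders for EnhancementEngine instances
        active.put("ner", new Object());
        // The chain is executable as-is only if this set is empty.
        System.out.println("missing engines: " + missingEngines(referenced, active));
    }
}
```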
+<h3 id="contentitem">ContentItem</h3>
+<p>The interface of the ContentItem also needs to undergo a slight change to add support for read/write locks on the MGraph holding the metadata. For details, see the section on asynchronous execution below.</p>
+<p>Because of that, the return type of the getMetadata method needs to be changed from MGraph to LockableMGraph.</p>
+<div class="codehilite"><pre><span class="o">+</span> <span 
class="n">getMetadata</span><span class="p">()</span> <span class="p">:</span> 
<span class="n">LockableMGraph</span>
+</pre></div>
+
+
+<h3 id="asynchronousexecution">AsynchronousExecution</h3>
+<p>The "EnhancementEngine#canEnhance(ContentItem ci) : int" method indicates whether an engine can enhance a ContentItem. In addition, this method can also indicate to the EnhancementJobManager whether an Engine supports asynchronous execution. This section specifies how the EnhancementJobManager needs to use this information to support asynchronous and/or parallel execution of multiple EnhancementEngines.</p>
+<p>As soon as EnhancementEngines are executed asynchronously, multiple Engines may need to access the ContentItem concurrently. Therefore access to the ContentItem, and especially to its metadata, MUST be synchronized. Implementors of EnhancementEngines MUST be especially careful when using Iterators returned by Clerezza's TripleCollection, MGraph and the GraphNode utility, because such Iterators will throw ConcurrentModificationExceptions if the underlying graph is modified during iteration.</p>
+<p>Because of that, Engines that support EnhancementEngine#ENHANCE_ASYNC need to use the <a href="http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html">ReadWriteLock</a> provided by the LockableMGraph returned by ContentItem#getMetadata(). The following code snippets show how to use read and write locks with the metadata graph.</p>
+<div class="codehilite"><pre><span class="n">LockableMGraph</span> <span 
class="n">metadata</span> <span class="o">=</span> <span 
class="n">ci</span><span class="o">.</span><span 
class="n">getMetadata</span><span class="p">();</span>
+<span class="n">Lock</span> <span class="n">readLock</span> <span 
class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span 
class="n">getLock</span><span class="p">()</span><span class="o">.</span><span 
class="n">readLock</span><span class="p">();</span>
+<span class="n">readLock</span><span class="o">.</span><span 
class="n">lock</span><span class="p">();</span>
+<span class="n">try</span> <span class="p">{</span>
+    <span class="n">Iterator</span><span class="sr">&lt;Triple&gt;</span> 
<span class="n">it</span> <span class="o">=</span> <span 
class="n">metadata</span><span class="o">.</span><span 
class="n">filter</span><span class="p">(</span><span 
class="err">…</span><span class="p">);</span>
+    <span class="k">while</span><span class="p">(</span><span 
class="n">it</span><span class="o">.</span><span class="n">hasNext</span><span 
class="p">()){</span>
+        <span class="sr">/** process the triples */</span>
+    <span class="p">}</span>
+<span class="p">}</span> <span class="n">finally</span> <span 
class="p">{</span>
+    <span class="n">readLock</span><span class="o">.</span><span 
class="n">unlock</span><span class="p">();</span>
+<span class="p">}</span>
+
+<span class="n">Lock</span> <span class="n">writeLock</span> <span 
class="o">=</span> <span class="n">metadata</span><span class="o">.</span><span 
class="n">getLock</span><span class="p">()</span><span class="o">.</span><span 
class="n">writeLock</span><span class="p">();</span>
+<span class="n">writeLock</span><span class="o">.</span><span 
class="n">lock</span><span class="p">();</span>
+<span class="n">try</span> <span class="p">{</span>
+    <span class="sr">/** write new Enhancements to the Graph */</span>
+<span class="p">}</span> <span class="n">finally</span> <span 
class="p">{</span>
+    <span class="n">writeLock</span><span class="o">.</span><span 
class="n">unlock</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+
+
+<p><strong>IMPORTANT:</strong> Do not try to obtain a write lock while holding a read lock, because this can cause deadlocks. Read locks can be held simultaneously by multiple threads, while write locks are exclusive. So if two threads holding a read lock both try to obtain a write lock, they will block each other.</p>
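+<p>The effect can be demonstrated with the JDK alone. The sketch below assumes that the lock behaves like java.util.concurrent.locks.ReentrantReadWriteLock. Whether the LockableMGraph uses exactly this class is an assumption, and the class name LockUpgradeSketch is hypothetical. It uses the non-blocking tryLock() so that the failed upgrade shows up as a return value instead of a hanging thread.</p>

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockUpgradeSketch {
    /** Anti-pattern: asks for the write lock while still holding a read lock. */
    static boolean tryUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            // tryLock() does not block. A plain lock() call would wait forever here,
            // because the write lock is never granted while a read lock is held.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Recommended order: release the read lock first, then take the write lock. */
    static boolean readThenWrite(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            // read the required triples and keep the results in local variables
        } finally {
            lock.readLock().unlock();
        }
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            try {
                // write the new enhancements to the graph
            } finally {
                lock.writeLock().unlock();
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("upgrade while reading: " + tryUpgrade(lock));
        System.out.println("read, release, then write: " + readThenWrite(lock));
    }
}
```

+<p>Note that the graph may change between releasing the read lock and obtaining the write lock, so an engine should not assume that what it read is still current when it writes.</p>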
+<p>EnhancementEngines that do NOT support EnhancementEngine#ENHANCE_ASYNC - meaning that the canEnhance method only returns EnhancementEngine#CANNOT_ENHANCE or EnhancementEngine#ENHANCE_SYNCHRONOUS - do not need to obtain read and write locks. The EnhancementJobManager implementation MUST ensure that they have exclusive access to the Enhancement Graph. This can be done either by obtaining a write lock before calling such enhancement engines or by ensuring that no other engines are called in parallel.</p>
+<p>In cases where the EnhancementJobManager can execute multiple engines in parallel, it is good practice to first start the execution of Engines that support EnhancementEngine#ENHANCE_ASYNC. This allows such engines to obtain a read lock and read the data necessary for their calculations before the EnhancementJobManager needs to obtain an exclusive write lock for calling EnhancementEngines that only support EnhancementEngine#ENHANCE_SYNCHRONOUS.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
