Author: rwesten
Date: Fri Jan 27 08:06:28 2012
New Revision: 1236565

URL: http://svn.apache.org/viewvc?rev=1236565&view=rev
Log:
Added documentation for EnhancementEngine and EnhancementEngineManager

Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext?rev=1236565&r1=1236564&r2=1236565&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext Fri Jan 27 08:06:28 2012
@@ -1,6 +1,6 @@
 Title: ChainManager
 
-The ChainManager provides name based access to all active [Enhancement Chain](enhancementchain.html) and there ServiceReferences. This interface is typically used by components that need to lookup Chains based on there name. However the ChainsTracker implementation can also be used to track specific Chains.
+The ChainManager provides name-based access to all active [Enhancement Chains](enhancementchain.html) and their ServiceReferences. This interface is typically used by components that need to look up Chains by their name. However, the ChainsTracker implementation can also be used to track specific Chains.
 
 ### ChainManager interface
 

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236565&r1=1236564&r2=1236565&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext Fri Jan 27 08:06:28 2012
@@ -13,7 +13,7 @@ Enhancement requestes issued to 
 
 are processed by using the default enhancement chain.
 
-When using the Java API Chains can be looked up as OSGI services. The the [ChainManager](chainmanager.html) service is designed to ease this by providing a API that allows to access Chains by their name. Because Chains are not responsible to perform the actual execution but only provide the [ExecutionPlan](executionplan.html) one needs to also lookup an EnhancementJobManager instance to enhance a contentItem
+When using the Java API, Chains can be looked up as OSGi services. The [ChainManager](chainmanager.html) service eases this by providing an API to access Chains by their name. Because Chains do not perform the actual execution but only provide the [ExecutionPlan](executionplan.html), one also needs to look up an EnhancementJobManager instance to enhance a ContentItem:
 
     @Reference
     EnhancementJobManager jobManager;
@@ -49,7 +49,7 @@ The Chain interface is very simplistic. 
     /** Constant for the property used to for the name of the Chain */
     + PROPERTY_NAME : String
 
-Each Chain has an name assigned. This is typically provided by the chain configuration and MUST me set as value to the property "stanbol.enhancer.chain.name" of the service registration. The getter for the name MUST return the same value. Chain implementation will usually get the name typically by calling
+Each Chain has a name assigned. It is typically provided by the chain configuration and MUST be set as the value of the property "stanbol.enhancer.chain.name" in the service registration. The getter for the name MUST return the same value. Chain implementations will usually get the name by calling
 
    this.name = (String)ComponentContext.getProperties(Chain.PROPERTY_NAME);
 
@@ -94,7 +94,7 @@ All Stanbol launchers are configured wit
 
 ### ChainManager interface
 
-The [ChainManager](chainmanager.html) is the management interface for EnhancementChains that can be used by components to lookup chains based on there name. It also provides a getter for the default chain. There is also OSGI ServiceTracker like implementation that can be used to track only chains with specific names and to get even notified on any change of such chains.
+The [ChainManager](chainmanager.html) is the management interface for EnhancementChains that components can use to look up chains by their name. It also provides a getter for the default chain. There is also an OSGi ServiceTracker-like implementation that can be used to track only chains with specific names and to get notified of any change to such chains.
 
 ## Chain implementations
 

Added: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext?rev=1236565&view=auto
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext (added)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext Fri Jan 27 08:06:28 2012
@@ -0,0 +1,127 @@
+Title: EnhancementEngine
+
+EnhancementEngines are the components responsible for enhancing ContentItems. They are called by the [EnhancementJobManager](../enhancementjobmanager.html). EnhancementEngines have full access to the ContentItems passed to them and are expected to modify their state.
+
+The RESTful interface of an EnhancementEngine can be accessed at
+
+    http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}
+
+For example, an EnhancementEngine named "ner" running on an Apache Stanbol instance on localhost with the default configuration is accessible at
+
+    http://localhost:8080/enhancer/engine/ner
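
The URL pattern above can be assembled on the client side as in the following sketch. The class and method names (EngineUrls, engineUrl) are hypothetical. The sketch only builds the URL and makes no assumption about which HTTP methods or response formats the endpoint supports.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper building
 *  http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}. */
final class EngineUrls {
    private EngineUrls() {}

    static String engineUrl(String host, int port, String stanbolRoot, String engineName) {
        // Strip leading/trailing slashes so an empty root (default configuration) adds no path segment
        String root = stanbolRoot == null ? "" : stanbolRoot.replaceAll("^/+|/+$", "");
        String path = (root.isEmpty() ? "" : "/" + root) + "/enhancer/engine/" + engineName;
        try {
            // The multi-argument URI constructor quotes characters that are illegal in a path
            return new URI("http", null, host, port, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid engine URL components", e);
        }
    }
}
```

With the default configuration the root is empty, so engineUrl("localhost", 8080, "", "ner") yields the example URL shown above.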
+
+When using the Java API, EnhancementEngines can be looked up as OSGi services. The [EnhancementEngineManager](enhancementenginemanager.html) service eases this by providing an API to access EnhancementEngines by their name.
+
+## EnhancementEngine Interface
+
+The interface for EnhancementEngines defines the following three methods and four constants:
+
+    /** Getter for the value of the "stanbol.enhancer.engine.name" property */
+    + getName() : String
+    /** Checks if this engine can enhance the passed content item */
+    + canEnhance(ContentItem ci) : int
+    /** Enhances the passed content item */
+    + computeEnhancements(ContentItem ci)
+
+    /** The property used for the name of an engine */
+    PROPERTY_NAME : String
+    /** Indicates that this engine cannot enhance a content item */
+    CANNOT_ENHANCE : int
+    /** Indicates support for synchronous enhancement */
+    ENHANCE_SYNCHRONOUS : int
+    /** Indicates support for asynchronous enhancement */
+    ENHANCE_ASYNC : int
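
To make the contract above concrete, here is a self-contained sketch of a toy engine. SimpleContentItem, SketchEngine and WordCountEngine are hypothetical stand-ins that only mirror the listed members. They are not the Stanbol servicesapi types, and the numeric values of the constants are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a ContentItem: MIME type, content and metadata. */
final class SimpleContentItem {
    final String mimeType;
    final String content;
    final List<String> metadata = new ArrayList<>();

    SimpleContentItem(String mimeType, String content) {
        this.mimeType = mimeType;
        this.content = content;
    }
}

/** Hypothetical mirror of the EnhancementEngine listing above (not the Stanbol API). */
interface SketchEngine {
    String PROPERTY_NAME = "stanbol.enhancer.engine.name";
    // Constant values are chosen for this sketch only
    int CANNOT_ENHANCE = 0;
    int ENHANCE_SYNCHRONOUS = 1;
    int ENHANCE_ASYNC = 2;

    String getName();
    int canEnhance(SimpleContentItem ci);
    void computeEnhancements(SimpleContentItem ci);
}

/** Toy engine that counts the words of plain-text content. */
final class WordCountEngine implements SketchEngine {
    private final String name;

    WordCountEngine(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public int canEnhance(SimpleContentItem ci) {
        // Only inspects the item; it never changes its state
        return "text/plain".equals(ci.mimeType) ? ENHANCE_SYNCHRONOUS : CANNOT_ENHANCE;
    }

    @Override
    public void computeEnhancements(SimpleContentItem ci) {
        String trimmed = ci.content.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        // Adds a new "enhancement" to the metadata of the item
        ci.metadata.add("wordCount=" + words);
    }
}
```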
+
+Each EnhancementEngine has a name assigned. It is typically provided by the engine configuration and MUST be set as the value of the property "stanbol.enhancer.engine.name" in the service registration of the engine. The getter for the name MUST return the same value as the one set for this property. EnhancementEngine implementations will usually get the name by calling
+
+    this.name = (String)context.getProperties().get(EnhancementEngine.PROPERTY_NAME);
+
+in the activate(ComponentContext context) method.
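
The following sketch shows this activate logic without OSGi classes: a java.util.Dictionary stands in for the result of ComponentContext.getProperties(). The class and method names are hypothetical, and refusing to activate without a name is a design choice of this sketch rather than documented Stanbol behavior.

```java
import java.util.Dictionary;

/** Hypothetical engine skeleton; the Dictionary stands in for ComponentContext.getProperties(). */
final class NamedEngineActivation {
    static final String PROPERTY_NAME = "stanbol.enhancer.engine.name";

    private String name;

    /** Reads the engine name once, as an activate method would. */
    void activate(Dictionary<String, ?> properties) {
        Object value = properties.get(PROPERTY_NAME);
        // Sketch choice: refuse to activate without a usable name
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalStateException("Missing or empty property " + PROPERTY_NAME);
        }
        this.name = (String) value;
    }

    /** Returns exactly the value registered under PROPERTY_NAME. */
    String getName() {
        return name;
    }
}
```

The value is stored unmodified (no trimming) so that getName() returns the same value as the service property, as required above.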
+
+The "canEnhance(ContentItem ci)" method is used by the [EnhancementJobManager](../enhancementjobmanager.html) to check whether an engine is able to process a ContentItem. Calling this method MUST NOT change the state of the ContentItem, and this method MUST NOT acquire a write lock on the content item.
+
+The "computeEnhancements(ContentItem ci)" method starts the processing of the passed ContentItem by the engine and is expected to change its state. Engines that support asynchronous processing must correctly apply read/write locks when reading/writing information from/to the content item. Engines that return ENHANCE_SYNCHRONOUS from canEnhance(..) do not need to use locks; they can trust that they have exclusive read/write access to the content item.
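
The locking rules for asynchronous engines can be sketched with a java.util.concurrent ReadWriteLock. LockedContentItem and AsyncUpperCaseEngine are hypothetical, the sketch assumes the content item exposes a ReadWriteLock (the getLock() accessor here is illustrative), and the constant values are chosen for the sketch only. The engine reads under the read lock, does its work without holding any lock and writes its results under the write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical content item guarded by a read/write lock. */
final class LockedContentItem {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final String text;
    private final List<String> metadata = new ArrayList<>();

    LockedContentItem(String text) { this.text = text; }

    ReadWriteLock getLock() { return lock; }
    String getText() { return text; }
    List<String> getMetadata() { return metadata; }
}

/** Sketch of an engine supporting asynchronous enhancement, so it locks explicitly. */
final class AsyncUpperCaseEngine {
    static final int CANNOT_ENHANCE = 0; // sketch values only
    static final int ENHANCE_ASYNC = 2;

    int canEnhance(LockedContentItem ci) {
        // Read lock only: canEnhance MUST NOT change state or acquire the write lock
        ci.getLock().readLock().lock();
        try {
            return ci.getText().isEmpty() ? CANNOT_ENHANCE : ENHANCE_ASYNC;
        } finally {
            ci.getLock().readLock().unlock();
        }
    }

    void computeEnhancements(LockedContentItem ci) {
        String text;
        ci.getLock().readLock().lock();          // 1. read the input under the read lock
        try {
            text = ci.getText();
        } finally {
            ci.getLock().readLock().unlock();
        }
        String result = text.toUpperCase(Locale.ROOT); // 2. do the (possibly slow) work unlocked
        ci.getLock().writeLock().lock();         // 3. write the results under the write lock
        try {
            ci.getMetadata().add("upper=" + result);
        } finally {
            ci.getLock().writeLock().unlock();
        }
    }
}
```

Releasing the read lock before acquiring the write lock matters: a ReentrantReadWriteLock does not support upgrading a held read lock to a write lock, so attempting this in one thread would block forever.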
+
+EnhancementEngines have full access to the ContentItem. Theoretically they would even be allowed to delete all metadata as well as all content parts of the passed ContentItem. Typically, however, they only
+
+* read existing ContentParts
+* add new ContentParts
+* add new Enhancements to the metadata
+* update or delete existing metadata (needed only by some engines).
+
+The [EnhancementJobManager](../enhancementjobmanager.html) MUST call both "canEnhance(..)" and "computeEnhancements(..)" only after the executions of all EnhancementEngines this engine depends on have completed. These dependencies are defined by the [ExecutionPlan](../chains/executionplan.html) that the EnhancementJobManager uses to enhance the ContentItem. Implementors of EnhancementEngines can therefore trust that all metadata expected from other EnhancementEngines is already present in the metadata of the passed ContentItem when "canEnhance(..)" or "computeEnhancements(..)" is called.
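
The guarantee described above amounts to executing engines in a topological order of their dependencies. The following sketch (Kahn's algorithm over a hypothetical map from engine name to the names it depends on) illustrates that idea only; it is not the EnhancementJobManager's actual implementation, and the engine names used with it are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders engines so that each runs only after all engines it depends on. */
final class DependencyOrder {
    private DependencyOrder() {}

    static List<String> executionOrder(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // engine -> unfinished dependencies
        Map<String, List<String>> dependents = new HashMap<>(); // engine -> engines waiting for it
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            String engine = entry.getKey();
            pending.putIfAbsent(engine, 0);
            for (String dependency : entry.getValue()) {
                pending.putIfAbsent(dependency, 0);
                pending.merge(engine, 1, Integer::sum);
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(engine);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((engine, count) -> {
            if (count == 0) ready.add(engine); // engines without dependencies can start immediately
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String engine = ready.poll();
            order.add(engine);                 // all dependencies of this engine have completed
            for (String next : dependents.getOrDefault(engine, Collections.emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("Dependencies contain a cycle");
        }
        return order;
    }
}
```

For a chain where "linking" depends on "ner", "ner" on "langdetect" and "langdetect" on "tika", the only valid order is tika, langdetect, ner, linking.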
+
+### ServiceProperties Interface
+
+This interface is implemented by most of the current EnhancementEngines. It allows engines to expose additional properties to other components. It defines a single method
+    
+    /** Getter for the ServiceProperties */
+    Map<String,Object> getServiceProperties();
+
+and also predefines the property key ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order", which EnhancementEngine implementations can use to specify their typical position within the enhancement process.
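
A minimal sketch of an engine exposing its ordering, and of a consumer reading it with ORDERING_DEFAULT (0) as fallback. SketchServiceProperties mirrors only the method and key named above and is not the Stanbol interface; the class names and the value 100 are illustrative assumptions.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical mirror of the ServiceProperties contract described above (not the Stanbol type). */
interface SketchServiceProperties {
    String ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order";

    Map<String, Object> getServiceProperties();
}

/** Toy engine placing itself in the content-extraction range (values >= 100 and < 200). */
final class LanguageDetectionSketch implements SketchServiceProperties {
    private static final Map<String, Object> PROPERTIES =
            Collections.<String, Object>singletonMap(ENHANCEMENT_ENGINE_ORDERING, 100);

    @Override
    public Map<String, Object> getServiceProperties() {
        return PROPERTIES;
    }
}

/** Reads the ordering of any engine object, falling back to ORDERING_DEFAULT (0). */
final class EngineOrderingLookup {
    private EngineOrderingLookup() {}

    static int orderingOf(Object engine) {
        if (engine instanceof SketchServiceProperties) {
            Object value = ((SketchServiceProperties) engine).getServiceProperties()
                    .get(SketchServiceProperties.ENHANCEMENT_ENGINE_ORDERING);
            if (value instanceof Integer) {
                return (Integer) value;
            }
        }
        return 0; // engines without ordering information get the default value
    }
}
```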
+
+### Engine Ordering Information
+
+EnhancementEngines expose their ordering through the ServiceProperties interface described above; providing this information is currently the only use of that interface. The ordering is exposed under the key "org.apache.stanbol.enhancer.engine.order", which is the value of the ENHANCEMENT_ENGINE_ORDERING constant defined by the ServiceProperties interface. Values are expected to be integers within the following ranges:
+
+* __ORDERING_PRE_PROCESSING__: All values >= 200 are intended for engines that do some kind of preprocessing of the content. This includes e.g. the conversion of media formats, such as extracting plain text from HTML, keyframes from videos or wave forms from MP3 files, and the extraction of metadata directly encoded within the passed content, such as ID3 tags from MP3 files or RDFa and microdata from HTML content.
+* __ORDERING_CONTENT_EXTRACTION__: This range includes values < 200 and >= 100 and shall be used by enhancement engines that need to analyze the passed content to extract additional metadata. Examples are language detection, natural language processing, named entity recognition, face detection in images, speech to text …
+* __ORDERING_EXTRACTION_ENHANCEMENT__: This range includes values < 100 and >= 1 and shall be used by enhancement engines that provide semantic lifting of existing enhancements, such as linking named entities extracted by an NER engine with entities defined in a controlled vocabulary, or linking artist names, song titles ... extracted from MP3 files with the corresponding entities defined in a music database.
+* __ORDERING_DEFAULT__: This represents the value 0 and shall be used as the default value for all EnhancementEngines that do not provide ordering information or do not implement the ServiceProperties interface.
+* __ORDERING_POST_PROCESSING__: This range includes values < 0 and >= -100 and is intended for enhancement engines that post-process enhancement results, such as schema translation or filtering of enhancements.
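
The ranges above can be expressed as a simple classification, and sorting engines by descending ordering value yields the typical sequence: pre-processing, content extraction, extraction enhancement, default, post-processing. This is a hypothetical sketch. The tie-break by name is an assumption added for determinism, and chain implementations such as the WeightedChain may treat engines with equal ordering differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the ordering ranges listed above. */
final class OrderingRanges {
    private OrderingRanges() {}

    enum Range { PRE_PROCESSING, CONTENT_EXTRACTION, EXTRACTION_ENHANCEMENT, DEFAULT, POST_PROCESSING, UNDEFINED }

    /** Maps an ordering value to its range, using the bounds given in the text. */
    static Range rangeOf(int ordering) {
        if (ordering >= 200) return Range.PRE_PROCESSING;
        if (ordering >= 100) return Range.CONTENT_EXTRACTION;
        if (ordering >= 1) return Range.EXTRACTION_ENHANCEMENT;
        if (ordering == 0) return Range.DEFAULT;
        if (ordering >= -100) return Range.POST_PROCESSING;
        return Range.UNDEFINED; // the text defines no range below -100
    }

    /** Engine names sorted by descending ordering; equal values are ordered by name (sketch choice). */
    static List<String> byDescendingOrdering(Map<String, Integer> orderingByEngine) {
        List<String> names = new ArrayList<>(orderingByEngine.keySet());
        names.sort(Comparator.comparing((String name) -> orderingByEngine.get(name))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return names;
    }
}
```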
+
+The engine ordering information described here is used by the [DefaultChain](../chains/defaultchain.html) and the [WeightedChain](../chains/weightedchain.html) to calculate the [ExecutionPlan](../chains/executionplan.html).
+
+Basically, this feature allows the implementor of an EnhancementEngine to define the correct position of the engine within a typical enhancement chain, and therefore ensures that users who add this engine to a Stanbol Enhancer installation can immediately use it with the [DefaultChain](../chains/defaultchain.html).
+
+However, engine ordering is not the only way for users to control the execution order. Enhancement chain implementations such as the [ListChain](../chains/listchain.html) and the [GraphChain](../chains/graphchain.html) also allow users to define the order of execution directly. For these chains, the ordering information provided by EnhancementEngines is ignored.
+
+
+## EnhancementEngine management
+
+This section describes how EnhancementEngines are managed by the Stanbol Enhancer and how they are selected/accessed by the [EnhancementJobManager](../enhancementjobmanager.html) when executing a [Chain](../chains/enhancementchain.html).
+
+EnhancementEngines are registered as OSGi services and managed using the following service properties:
+
+* __Name:__ Defined by the value of the property "stanbol.enhancer.engine.name". It is used to access engines via the Stanbol RESTful interface.
+* __Service Ranking:__ The service ranking property defined by OSGi is used to decide which engine to use if several active EnhancementEngines use the same name. In such cases only the engine with the highest ranking is used to enhance ContentItems.
+
+<!-- TODO: The Configuration is not yet defined 
+* __Configuration:__ Each EnhancementEngine MAY provide an RDF graph with its configuration. This graph will be returned on GET requests on the URL of the EnhancementEngine. If no configuration is known for the engine, this MUST at least return a single triple with the name of the engine.
+
+_TODO:_ To correctly construct this graph the engine needs to know this URL. This could e.g. be provided by some OSGi environment parameter set by the JerseyApplication. As an alternative we could also pass this URI as a parameter to the getEngineConfig method.
+-->
+
+Other components, such as enhancement Chains, refer to engines by their name. The actual EnhancementEngine instance is only looked up shortly before execution.
+
+### EnhancementEngine Name Conflicts
+
+As EnhancementEngines are identified by the value of the "stanbol.enhancer.engine.name" property (their name), multiple EnhancementEngines may be registered under the same name. In such cases the normal OSGi procedure for selecting the default service among several matching services is used. This means that
+
+1. the EnhancementEngine with the highest "service.ranking" or,
+2. if several engines share that ranking, the one with the lowest "service.id"
+
+will be selected for requests for an EnhancementEngine with a given name. Requests to the RESTful service API are always answered by the EnhancementEngine selected as default. When using the Java API, all EnhancementEngines registered for a given name can also be retrieved via the [EnhancementEngineManager](enhancementenginemanager.html) interface.
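
The selection rule can be written as a comparator over registrations. Registration and DefaultEngineSelector are hypothetical helpers that model only the name, service.ranking and service.id of each registration. Within an OSGi framework, ServiceReference.compareTo already implements this ranking order for real service references.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical view of one engine registration: name, service.ranking and service.id. */
final class Registration {
    final String name;
    final int ranking;
    final long serviceId;

    Registration(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

final class DefaultEngineSelector {
    private DefaultEngineSelector() {}

    /** Most preferred first: highest service.ranking, then lowest service.id. */
    static final Comparator<Registration> PREFERENCE =
            Comparator.comparingInt((Registration r) -> r.ranking).reversed()
                    .thenComparingLong(r -> r.serviceId);

    /** The registration used as default for the given name, if any engine uses that name. */
    static Optional<Registration> select(List<Registration> active, String name) {
        return active.stream().filter(r -> r.name.equals(name)).min(PREFERENCE);
    }
}
```

Removing the higher-ranked registration from the list makes select return the remaining one, which corresponds to the fallback scenario described in the next paragraph.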
+
+From a user perspective, the major use case for configuring multiple enhancement engines with the same name is the definition of fallback engines that are used if the main one becomes unavailable. For example, let's assume that a user has a local cache of geonames.org loaded into the Entityhub and configures a [NamedEntityLinking](keywordlinkingengine.html) engine to perform semantic lifting of extracted locations. Stanbol also provides the [geonames.org Engine](geonamesengine.html), which offers similar functionality by directly accessing [geonames.org](http://geonames.org). By configuring both engines with the same name, but giving the one that uses the local cache a higher service ranking, one can ensure that the local cache is used under normal circumstances. If the local cache becomes unavailable, the engine using the remote service is used instead.
+
+### EnhancementEngineManager interface
+
+The [EnhancementEngineManager](enhancementenginemanager.html) is the management interface for EnhancementEngines that components can use to look up enhancement engines by their name. There is also an OSGi ServiceTracker-like implementation that can be used to track only the enhancement engines registered for a specific set of names.
+
+## EnhancementEngine implementations
+
+A list of EnhancementEngine implementations maintained directly by the Apache Stanbol community can be found [here](../../engines.html).
+However, the EnhancementEngine interface is designed so that advanced Apache Stanbol users can write their own EnhancementEngine implementations to fulfill their special needs.
+
+The Stanbol community would be very happy if users shared their thoughts about possible enhancement engines or even contributed additional engines to the Apache Stanbol project.
+
+
+
+
+

Added: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext?rev=1236565&view=auto
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext (added)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext Fri Jan 27 08:06:28 2012
@@ -0,0 +1,53 @@
+Title: EnhancementEngineManager
+
+The EnhancementEngineManager provides name-based access to all active [EnhancementEngine](enhancementengine.html)s and their ServiceReferences. This interface is typically used by components that need to look up EnhancementEngines by their name. However, the EnginesTracker implementation can also be used to track specific EnhancementEngines.
+
+### EnhancementEngineManager interface
+
+This Java API provides access to registered EnhancementEngines in the ways described above. It includes the following methods:
+
+    /** Getter for all names with active engines */
+    + getActiveEngineNames() : Set<String>
+    /** Getter for the ServiceReference to the engine 
+        with a given name */
+    + getReference(String name) : ServiceReference
+    /** Getter for all ServiceReferences to engines 
+        with a given name sorted by service ranking */
+    + getReferences(String name) : List<ServiceReference>
+    /** Getter for the engine with a given name */
+    + getEngine(String name) : EnhancementEngine
+    /** Getter for all engines with a given name sorted 
+        by service ranking */
+    + getEngines(String name) : List<EnhancementEngine>
+    /** Getter for an engine based on a service reference */
+    + getEngine(ServiceReference ref) : EnhancementEngine
+    /** Checks if there is an engine for the given name */
+    + isEngine(String name) : boolean
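
As an illustration of these lookup semantics (not of the actual service), the following hypothetical in-memory class offers getActiveEngineNames, getEngines, getEngine and isEngine over registrations kept sorted by service ranking. It deliberately leaves out ServiceReferences and OSGi service tracking, and register(..) is an invented helper standing in for service registration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of the name-based lookups listed above; no OSGi involved. */
final class InMemoryEngineManager<E> {
    /** One registration of an engine under a name. */
    private static final class Entry<T> {
        final T engine;
        final int ranking;
        final long serviceId;

        Entry(T engine, int ranking, long serviceId) {
            this.engine = engine;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    private final Map<String, List<Entry<E>>> byName = new TreeMap<>();

    void register(String name, E engine, int ranking, long serviceId) {
        List<Entry<E>> entries = byName.computeIfAbsent(name, k -> new ArrayList<>());
        entries.add(new Entry<>(engine, ranking, serviceId));
        // Keep entries sorted by service ranking (highest first, lowest service.id on ties)
        entries.sort(Comparator.comparingInt((Entry<E> en) -> en.ranking).reversed()
                .thenComparingLong(en -> en.serviceId));
    }

    Set<String> getActiveEngineNames() {
        return Collections.unmodifiableSet(new TreeSet<>(byName.keySet()));
    }

    List<E> getEngines(String name) {
        return byName.getOrDefault(name, Collections.emptyList()).stream()
                .map(en -> en.engine)
                .collect(Collectors.toList());
    }

    /** The engine used as default for the name, or null if none is registered. */
    E getEngine(String name) {
        List<E> engines = getEngines(name);
        return engines.isEmpty() ? null : engines.get(0);
    }

    boolean isEngine(String name) {
        return byName.containsKey(name);
    }
}
```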
+
+There are two implementations of this interface available:
+
+#### EnhancementEngineManager Service
+
+This implementation of the EnhancementEngineManager interface is registered as an OSGi service. It can be obtained, e.g., by using the @Reference annotation:
+
+    @Reference
+    EnhancementEngineManager engineManager;
+
+This service is provided by the "org.apache.stanbol.enhancer.enginemanager" module and is included in all Stanbol launchers.
+
+#### EnginesTracker
+
+This is a utility similar to the standard OSGi ServiceTracker that allows tracking some or all EnhancementEngines. It also supports a ServiceTrackerCustomizer, so that users of this utility can react directly to changes of tracked EnhancementEngines.
+
+    //track only "myEngine" and "otherEngine"
+    EnginesTracker tracker = new EnginesTracker(
+        context, "myEngine", "otherEngine");
+    tracker.open(); //start tracking
+
+    //the tracker needs to be closed if no longer needed
+    tracker.close();
+    tracker = null;
+
+For most users the EnhancementEngineManager service is sufficient and preferable. Direct use of the EnginesTracker is only recommended if one needs to track only some specific engines, and especially if one needs to be notified of changes to such engines.
+
+The implementation of the [WeightedChain](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/chain/weighted/src/main/java/org/apache/stanbol/enhancer/chain/weighted/impl/WeightedChain.java) is a good example of the intended usage of the EnginesTracker.

