Author: rwesten
Date: Fri Jan 27 08:06:28 2012
New Revision: 1236565
URL: http://svn.apache.org/viewvc?rev=1236565&view=rev
Log:
Added documentation for EnhancementEngine and EnhancementEngineManager
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext?rev=1236565&r1=1236564&r2=1236565&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
Fri Jan 27 08:06:28 2012
@@ -1,6 +1,6 @@
Title: ChainManager
-The ChainManager provides name based access to all active [Enhancement
Chain](enhancementchain.html) and there ServiceReferences. This interface is
typically used by components that need to lookup Chains based on there name.
However the ChainsTracker implementation can also be used to track specific
Chains.
+The ChainManager provides name-based access to all active [Enhancement
Chains](enhancementchain.html) and their ServiceReferences. This interface is
typically used by components that need to look up Chains based on their name.
However, the ChainsTracker implementation can also be used to track specific
Chains.
### ChainManager interface
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236565&r1=1236564&r2=1236565&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
Fri Jan 27 08:06:28 2012
@@ -13,7 +13,7 @@ Enhancement requestes issued to
are processed by using the default enhancement chain.
-When using the Java API Chains can be looked up as OSGI services. The the
[ChainManager](chainmanager.html) service is designed to ease this by providing
a API that allows to access Chains by their name. Because Chains are not
responsible to perform the actual execution but only provide the
[ExecutionPlan](executionplan.html) one needs to also lookup an
EnhancementJobManager instance to enhance a contentItem
+When using the Java API, Chains can be looked up as OSGI services. The
[ChainManager](chainmanager.html) service is designed to ease this by providing
an API for accessing Chains by their name. Because Chains do not perform the
actual execution but only provide the [ExecutionPlan](executionplan.html), one
also needs to look up an EnhancementJobManager instance to enhance a
ContentItem:
@Reference
EnhancementJobManager jobManager;
@@ -49,7 +49,7 @@ The Chain interface is very simplistic.
/** Constant for the property used to for the name of the Chain */
+ PROPERTY_NAME : String
-Each Chain has an name assigned. This is typically provided by the chain
configuration and MUST me set as value to the property
"stanbol.enhancer.chain.name" of the service registration. The getter for the
name MUST return the same value. Chain implementation will usually get the name
typically by calling
+Each Chain has a name assigned. This is typically provided by the chain
configuration and MUST be set as the value of the property
"stanbol.enhancer.chain.name" of the service registration. The getter for the
name MUST return the same value. Chain implementations will usually get the
name by calling
this.name = (String)ComponentContext.getProperties(Chain.PROPERTY_NAME);
@@ -94,7 +94,7 @@ All Stanbol launchers are configured wit
### ChainManager interface
-The [ChainManager](chainmanager.html) is the management interface for
EnhancementChains that can be used by components to lookup chains based on
there name. It also provides a getter for the default chain. There is also OSGI
ServiceTracker like implementation that can be used to track only chains with
specific names and to get even notified on any change of such chains.
+The [ChainManager](chainmanager.html) is the management interface for
EnhancementChains that can be used by components to look up chains based on
their name. It also provides a getter for the default chain. There is also an
OSGI ServiceTracker-like implementation that can be used to track only chains
with specific names and to get notified of any change to such chains.
## Chain implementations
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext?rev=1236565&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementengine.mdtext
Fri Jan 27 08:06:28 2012
@@ -0,0 +1,127 @@
+Title: EnhancementEngine
+
+EnhancementEngines are the components responsible for enhancing
ContentItems. They are called by the
[EnhancementJobManager](enhancementjobmanager.html). EnhancementEngines have
full access to the parsed ContentItems. They are expected to modify the state
of the content item.
+
+The RESTful interface of an EnhancementEngine can be accessed at
+
+ http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}
+
+For example, an EnhancementEngine with the name "ner" running on an Apache
Stanbol instance on localhost with the default configuration will be
accessible at
+
+ http://localhost:8080/enhancer/engine/ner
+
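As a minimal sketch of the URL template above, the following helper assembles the address of an engine. The class `EngineUrls` and its parameter names are illustrative only and are not part of the Stanbol API; an empty root is assumed to mean that Stanbol is deployed at the server root, as in the default configuration.

```java
// Sketch only: EngineUrls and its parameters are illustrative names, not Stanbol API.
final class EngineUrls {

    private EngineUrls() {}

    /**
     * Builds http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}.
     * A null or empty root is assumed to stand for a deployment at the server root.
     */
    static String engineUrl(String host, int port, String stanbolRoot, String engineName) {
        StringBuilder url = new StringBuilder("http://").append(host).append(':').append(port);
        if (stanbolRoot != null && !stanbolRoot.isEmpty()) {
            url.append('/').append(stanbolRoot);
        }
        return url.append("/enhancer/engine/").append(engineName).toString();
    }
}
```

Calling `EngineUrls.engineUrl("localhost", 8080, "", "ner")` yields the example address shown above.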
+When using the Java API, EnhancementEngines can be looked up as OSGI services.
The [EnhancementEngineManager](enhancementenginemanager.html) service is
designed to ease this by providing an API for accessing EnhancementEngines by
their name.
+
+## EnhancementEngine Interface
+
+The interface for EnhancementEngines contains the following three methods:
+
+ /** Getter for the value of the "stanbol.enhancer.engine.name" property */
+ + getName() : String
+ /** Checks if this engine can enhance the parsed content item */
+ + canEnhance(ContentItem ci) : int
+ /** Enhances the parsed content item */
+ + computeEnhancements(ContentItem ci)
+
+ /** The property used for the name of an engine */
+ PROPERTY_NAME : String
+ /** Indicates that this engine can not enhance a content item */
+ CANNOT_ENHANCE : int
+ /** Indicates support for synchronous enhancement */
+ ENHANCE_SYNCHRONOUS : int
+ /** Indicates support for asynchronous enhancement */
+ ENHANCE_ASYNC : int
+
+Each EnhancementEngine has a name assigned. This is typically provided by the
engine configuration and MUST be set as the value of the property
"stanbol.enhancer.engine.name" in the service registration of the enhancement
engine. The getter for the name MUST return the same value as the value set for
this property. EnhancementEngine implementations will usually get the name by
calling
+
+ // ctx is the ComponentContext passed to the activate method
+ this.name =
(String) ctx.getProperties().get(EnhancementEngine.PROPERTY_NAME);
+
+in the activate method.
+
+The "canEnhance(ContentItem ci)" method is used by the
[EnhancementJobManager](../enhancementjobmanager.html) to check if an engine is
able to process a ContentItem. Calling this method MUST NOT change the state of
the ContentItem, and this method MUST NOT acquire a write lock on the
content item.
+
+The "computeEnhancements(ContentItem ci)" method starts the processing of the
parsed ContentItem by the engine. It is expected to change the state of the
parsed ContentItem. Engines that support asynchronous processing need to take
care to correctly apply read/write locks when reading/writing information
from/to the content item. Engines that return ENHANCE_SYNCHRONOUS on calls to
canEnhance(..) do not need to use locks. They can trust that they have
exclusive read/write access to the content item.
+
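The contract of the two methods can be illustrated with a small, self-contained sketch. `SketchContentItem`, `SketchEngine` and `WordCountEngine` are hypothetical stand-ins written for this example; they are not the real Stanbol types, and the numeric values of the constants are assumptions. The engine is synchronous, so it relies on exclusive access and uses no locks.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a ContentItem: a mime type, the text content and a list of metadata entries.
final class SketchContentItem {
    final String mimeType;
    final String text;
    final List<String> metadata = new ArrayList<>();

    SketchContentItem(String mimeType, String text) {
        this.mimeType = mimeType;
        this.text = text;
    }
}

// Stand-in for the engine interface; constant names follow the listing above,
// the numeric values are assumptions made for this sketch.
interface SketchEngine {
    int CANNOT_ENHANCE = 0;
    int ENHANCE_SYNCHRONOUS = 1;
    int ENHANCE_ASYNC = 2;

    String getName();
    int canEnhance(SketchContentItem ci);
    void computeEnhancements(SketchContentItem ci);
}

// Hypothetical engine that adds a word count to the metadata of plain text items.
final class WordCountEngine implements SketchEngine {
    private final String name;

    WordCountEngine(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    // Read-only check: inspects the mime type and never modifies the content item.
    @Override
    public int canEnhance(SketchContentItem ci) {
        return "text/plain".equals(ci.mimeType) ? ENHANCE_SYNCHRONOUS : CANNOT_ENHANCE;
    }

    // Changes the state of the content item by adding one metadata entry.
    @Override
    public void computeEnhancements(SketchContentItem ci) {
        String trimmed = ci.text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        ci.metadata.add("wordCount=" + words);
    }
}
```

A caller would first ask `canEnhance(..)` and only invoke `computeEnhancements(..)` if the result differs from `CANNOT_ENHANCE`.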
+EnhancementEngines have full access to the ContentItem. Theoretically they
would even be allowed to delete all metadata as well as all content parts from
the parsed ContentItem. However, typically they only
+
+* read existing ContentParts
+* add new ContentParts
+* add new Enhancements to the metadata
+* some engines might also need to update/delete existing metadata.
+
+Both the "canEnhance(..)" and "computeEnhancements(..)" methods MUST be called
by the [EnhancementJobManager](../enhancementjobmanager.html) only after the
executions of all EnhancementEngines this one depends on are completed. These
dependencies are defined by the [ExecutionPlan](../chains/executionplan.html)
used by the EnhancementJobManager to enhance the ContentItem. Implementors of
EnhancementEngines can therefore trust that all metadata expected to be added
by other EnhancementEngines is already present within the metadata of the
parsed ContentItem when "canEnhance(..)" or "computeEnhancements(..)" is
called.
+
+### ServicesProperties Interface
+
+This interface is implemented by most of the current EnhancementEngines. It
allows engines to expose additional properties to other components. This
interface defines a single method
+
+ /** Getter for the ServiceProperties */
+ Map<String,Object> getServiceProperties();
+
+but also predefines the property ENHANCEMENT_ENGINE_ORDERING =
"org.apache.stanbol.enhancer.engine.order" that can be used by
EnhancementEngine implementations to specify their typical ordering within the
enhancement process.
+
+### Engine Ordering Information
+
+By implementing the ServicesProperties interface, EnhancementEngines can
expose additional metadata to other components. The
ServicesProperties interface defines only a single method
+
+ /** Getter for the ServiceProperties */
+ Map<String,Object> getServiceProperties();
+
+and is implemented by most of the current EnhancementEngines. Its only
current use is to provide information about the engine ordering within the
enhancement process. This information is exposed by using the key
"org.apache.stanbol.enhancer.engine.order", which is the value of the
constant ENHANCEMENT_ENGINE_ORDERING defined directly by the ServicesProperties
interface. Values are expected to be integers within the following ranges:
+
+* __ORDERING_PRE_PROCESSING__: All values >= 200 are intended for engines
that do some kind of preprocessing of the content. This includes e.g. the
conversion of media formats such as extracting the plain text from HTML,
keyframes from videos, wave forms from mp3 ...; and extracting metadata
directly encoded within the parsed content such as ID3 tags from MP3 or RDFa
and microdata provided by HTML content.
+* __ORDERING_CONTENT_EXTRACTION__: This range includes values < 200 and
>= 100 and shall be used by enhancement engines that need to analyze the parsed
content to extract additional metadata. Examples would be Language detection,
Natural Language Processing, Named Entity Recognition, Face Detection in
Images, Speech to text ...
+* __ORDERING_EXTRACTION_ENHANCEMENT__: This range includes values < 100
and >= 1 and shall be used by enhancement engines to provide semantic lifting
of preexisting enhancements such as linking named entities extracted by an NER
engine with entities defined in a controlled vocabulary or linking artist
names, song titles ... extracted from mp3 files with the corresponding entities
defined in a music database.
+* __ORDERING_DEFAULT__: This represents the value 0 and shall be used as
default value for all EnhancementEngines that do not provide ordering
information or do not implement the ServicesProperties interface.
+* __ORDERING_POST_PROCESSING__: This range includes values < 0 and >=
-100 and is intended to be used by all enhancement engines that do post
processing of enhancement results such as schema translation, filtering of
Enhancements ...
+
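The ranges listed above can be summarised in a short sketch. The enum `OrderingRange` and its method `of(..)` are illustrative names introduced for this example, not part of the Stanbol API. Values below -100 are not covered by the list, so the sketch reports them as unspecified rather than guessing.

```java
// Sketch only: OrderingRange is an illustrative helper, not a Stanbol type.
enum OrderingRange {
    PRE_PROCESSING,
    CONTENT_EXTRACTION,
    EXTRACTION_ENHANCEMENT,
    DEFAULT,
    POST_PROCESSING,
    UNSPECIFIED;

    /** Maps an ordering value to the range it falls into, following the list above. */
    static OrderingRange of(int order) {
        if (order >= 200) return PRE_PROCESSING;          // >= 200
        if (order >= 100) return CONTENT_EXTRACTION;      // < 200 and >= 100
        if (order >= 1) return EXTRACTION_ENHANCEMENT;    // < 100 and >= 1
        if (order == 0) return DEFAULT;                   // exactly 0
        if (order >= -100) return POST_PROCESSING;        // < 0 and >= -100
        return UNSPECIFIED;                               // below -100: not covered by the text
    }
}
```

An engine that links entities found by an NER engine would, for instance, pick a value for which `OrderingRange.of(..)` returns `EXTRACTION_ENHANCEMENT`.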
+The Engine Ordering information as described here is used by the
[DefaultChain](../chains/defaultchain.html) and the
[WeightedChain](../chains/weightedchain.html) to calculate the
[ExecutionPlan](../chains/executionplan.html).
+
+Basically this feature allows the implementor of an EnhancementEngine to
define the correct position of the engine within a typical enhancement chain
and therefore ensures that users who add this engine to a Stanbol Enhancer
installation can immediately use it with the
[DefaultChain](../chains/defaultchain.html).
+
+However, the Engine Ordering is not the only possibility for users to control
the execution order. Enhancement chain implementations such as the
[ListChain](../chains/listchain.html) and the
[GraphChain](../chains/grpahchain.html) also allow the order of execution to be
defined directly. For these chains the ordering information provided by
EnhancementEngines is ignored.
+
+
+## EnhancementEngine management
+
+This section describes how EnhancementEngines are managed by the Stanbol
Enhancer and how they can be selected/accessed by the
[EnhancementJobManager](../enhancementjobmanager.html) when executing a
[Chain](../chains/enhancementchain.html).
+
+EnhancementEngines are registered as OSGI services and managed by using the
following service properties:
+
+* __Name:__ Defined by the value of the property
"stanbol.enhancer.engine.name". It is used to access Engines on the Stanbol
RESTful interface.
+* __Service Ranking:__ The service ranking property defined by OSGI is
used to decide which engine to use in case several active EnhancementEngines
use the same name. In such cases only the Engine with the highest ranking will
be used to enhance ContentItems.
+
+<!-- TODO: The Configuration is not yet defined
+* __Configuration:__ Each EnhancementEngine MAY provide an RDF graph with its
configuration. This graph will be returned on GET requests on the URL of the
EnhancementEngine. If no configuration is known for the engine this MUST at
least return a single triple with the name of the engine.
+
+_TODO:_ To correctly construct this graph the Engine needs to know this URL.
This could e.g. be provided by some OSGI environment parameter set by the
JerseyApplication. As an alternative we could also pass this URI as a
parameter to the getEngineConfig method.
+-->
+
+Other components such as enhancement Chains do refer to engines by their name.
The actual EnhancementEngine instance is only looked up shortly before the
execution.
+
+### EnhancementEngine Name Conflicts
+
+As EnhancementEngines are identified by the value of the
"stanbol.enhancer.engine.name" property - the name - there might be cases where
multiple EnhancementEngines are registered for the same name. In such cases the
normal OSGI procedure to select the default service instance of several
possible matches is used. This means that
+
+1. the EnhancementEngine with the highest "service.ranking" and,
+2. if several engines share that ranking, the one with the lowest "service.id"
+
+will be selected on requests for an EnhancementEngine with a given name.
Requests on the RESTful service API will always be answered by the
EnhancementEngine selected as default. When using the Java API there are also
means to retrieve all EnhancementEngines for a given name via the
[EnhancementEngineManager](enhancementenginemanager.html) interface.
+
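The selection rule above can be sketched with a plain comparator. `EngineRef` is a hypothetical stand-in for an OSGI ServiceReference that models only the three properties involved, and `EngineSelection` is an illustrative helper; neither is part of Stanbol or OSGI.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Stand-in for a ServiceReference; only the properties used for selection are modelled.
final class EngineRef {
    final String name;     // value of "stanbol.enhancer.engine.name"
    final int ranking;     // value of "service.ranking"
    final long serviceId;  // value of "service.id"

    EngineRef(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

// Sketch only: illustrative helper, not Stanbol API.
final class EngineSelection {

    private EngineSelection() {}

    // Highest ranking first; ties are broken by the lowest service id.
    static final Comparator<EngineRef> DEFAULT_ORDER =
            Comparator.comparingInt((EngineRef r) -> r.ranking).reversed()
                      .thenComparingLong(r -> r.serviceId);

    /** Returns the default engine for the given name among the active ones, if any. */
    static Optional<EngineRef> select(List<EngineRef> active, String name) {
        return active.stream()
                     .filter(r -> r.name.equals(name))
                     .min(DEFAULT_ORDER);
    }
}
```

If the selected engine is unregistered, calling `select(..)` again on the remaining active engines returns the next candidate, which is the basis for configuring a lower-ranked engine as a fallback under the same name.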
+From a user perspective there is one major use case for configuring multiple
enhancement engines for the same name: defining fallback engines in case the
main one becomes unavailable. For example, let's assume that a
user has a local cache of geonames.org loaded into the Entityhub and configures
a [NamedEntityLinking](keywordlinkingengine.html) engine to perform semantic
lifting of extracted locations. However, Stanbol also provides the [geonames.org
Engine](geonamesengine.html) that provides similar functionality by directly
accessing [geonames.org](http://geonames.org). By configuring both engines for
the same name, but specifying a higher service ranking for the one using the
local cache, one can ensure that the local cache is used for the enhancement
under normal circumstances. If the local cache becomes unavailable,
the other engine using the remote service will be used for enhancement.
+
+### EnhancementEngineManager interface
+
+The [EnhancementEngineManager](enhancementenginemanager.html) is the
management interface for EnhancementEngines that can be used by components to
look up enhancement engines based on their name. There is also an OSGI
ServiceTracker-like implementation that can be used to track only enhancement
engines registered for a specific set of names.
+
+## EnhancementEngine implementations
+
+A list of EnhancementEngine implementations maintained directly by the Apache
Stanbol community can be found [here](../../engines.html).
+However, the EnhancementEngine interface is designed in a way that makes it
possible for advanced Apache Stanbol users to write their own EnhancementEngine
implementations fulfilling their special needs.
+
+The Stanbol Community would be very happy if users decide to share thoughts
about possible enhancement engines or would even like to contribute additional
engines to the Apache Stanbol project.
+
+
+
+
+
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext?rev=1236565&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/enhancementenginemanager.mdtext
Fri Jan 27 08:06:28 2012
@@ -0,0 +1,53 @@
+Title: EnhancementEngineManager
+
+The EnhancementEngineManager provides name-based access to all active
[EnhancementEngine](enhancementengine.html)s and their ServiceReferences. This
interface is typically used by components that need to look up
EnhancementEngines based on their name. However, the EnginesTracker
implementation can also be used to track specific EnhancementEngines.
+
+### EnhancementEngineManager interface
+
+This is the Java API providing access to registered EnhancementEngines as
described above. This interface includes the following methods:
+
+ /** Getter for all names with active engines */
+ + getActiveEngineNames() : Set<String>
+ /** Getter for the ServiceReference to the engine
+ with a given name */
+ + getReference(String name) : ServiceReference
+ /** Getter for all ServiceReferences to engines
+ with a given name sorted by service ranking */
+ + getReferences(String name) : List<ServiceReference>
+ /** Getter for the engine with a given name */
+ + getEngine(String name) : EnhancementEngine
+ /** Getter for all engines with a given name sorted
+ by service ranking */
+ + getEngines(String name) : List<EnhancementEngine>
+ /** Getter for an engine based on a service reference */
+ + getEngine(ServiceReference ref) : EnhancementEngine
+ /** Checks if there is an engine for the given name */
+ + isEngine(String name) : boolean
+
+There are two implementations of this interface available:
+
+#### EnhancementEngineManager Service
+
+This is an implementation of the EnhancementEngineManager interface that is
registered as an OSGI service. It can be obtained e.g. by using the @Reference
annotation
+
+ @Reference
+ EnhancementEngineManager engineManager;
+
+This service is provided by the "org.apache.stanbol.enhancer.enginemanger"
module and is included in all Stanbol launchers.
+
+#### EnginesTracker
+
+This is a utility similar to the standard OSGI ServiceTracker that allows
tracking some or all EnhancementEngines. It also supports the usage of a
ServiceTrackerCustomizer so that users of the utility can directly react to
changes of tracked EnhancementEngines.
+
+ //track only "myEngine" and "otherEngine"
+ EnginesTracker tracker = new EnginesTracker(
+ context, "myEngine","otherEngine");
+ tracker.open(); //start tracking
+
+ //the tracker needs to be closed when it is no longer needed
+ tracker.close();
+ tracker = null;
+
+For most users the EnhancementEngineManager service is sufficient and
preferable. Direct use of the EnginesTracker is only recommended if one needs
to track only some specific engines, and especially if one needs to get
notified of changes to such engines.
+
+The implementation of the
[WeightedChain](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/chain/weighted/src/main/java/org/apache/stanbol/enhancer/chain/weighted/impl/WeightedChain.java)
is a good example for the intended usage of the EnginesTracker.