Author: fchrist
Date: Fri Feb 10 16:54:10 2012
New Revision: 1242851
URL: http://svn.apache.org/viewvc?rev=1242851&view=rev
Log:
STANBOL-449 Minor edits and typos
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/index.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/index.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/index.mdtext?rev=1242851&r1=1242850&r2=1242851&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/index.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/index.mdtext
Fri Feb 10 16:54:10 2012
@@ -1,16 +1,16 @@
Title: Enhancement Engines
-Enhancement engines are the components that are responsible to enhance
ContentItmes. They are called by the
[EnhancementJobManager](../enhancementjobmanager.html). Enhancement engines do
have full access to the parsed [ContentItem](../contentitem.html)s. They are
expected to modify the state of the content item.
+Enhancement engines are the components that are responsible for enhancing
[content items](../contentitem.html). They are called by the [Enhancement Job
Manager](../enhancementjobmanager.html). Enhancement engines have full
access to the parsed content items. They are expected to modify their state.
-The RESTful interface of an EnhancementEngines can be accessed by
+The RESTful interface of an enhancement engine can be accessed by
http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}
-e.g. an EnhancementEngine with the name "ner" running at a Apache Stanbol
instance on local host with the default configuration will be accessible at
+e.g. an enhancement engine with the name "ner" running in an Apache Stanbol
instance on localhost with the default configuration will be accessible at
http://localhost:8080/enhancer/engine/ner
-When using the Java API enhancement engines can be liked up as OSGI services.
The [EnhanceEngineManager](enhancementenginemanager.html) service is designed
to ease this by providing a API that allows to access enhancement engine by
their name.
+When using the Java API, enhancement engines can be looked up as OSGi services.
The [Enhancement Engine Manager](enhancementenginemanager.html) service is
designed to ease this by providing an API that allows access to enhancement
engines by their name.
## Enhancement Engine Interface
@@ -32,42 +32,40 @@ The interface for enhancement engines co
/** Indicates support for asynchronous enhancement */
ENHANCE_ASYNC : int
-Each enhancement engine has an name assigned. This is typically provided by
the engine configuration and MUST be set as value to the property
"stanbol.enhancer.engine.name" in the service registration of the enhancement
engine. The getter for the name MUST return the same value as the value set to
this property. Enhancement engine implementations will usually get the name by
calling
+Each enhancement engine has a name. This is typically provided by the engine
configuration and MUST be set as value to the property
"stanbol.enhancer.engine.name" in the service registration of the enhancement
engine. The getter for the name MUST return the same value as the value set to
this property. Enhancement engine implementations will usually get the name by
calling:
this.name =
(String) componentContext.getProperties().get(EnhancementEngine.PROPERTY_NAME);
-in the activate method.
+The <code>canEnhance(ContentItem ci)</code> method is used by the [Enhancement
Job Manager](../enhancementjobmanager.html) to check if an engine is able to
process a [Content Item](../contentitem.html). Calling this method MUST NOT
change the state of the content item, and this method also MUST NOT acquire a
write lock on the content item.
-The "canEnahnce(ContentItem ci)" method is used by the
[EnhancementJobManager](../enhancementjobmanager.html) to check if an engine is
able to process a [ContentItem](../contentitem.html). Calling this method MUST
NOT change the state of the ContentItem and this method MUST also NOT acquire a
write lock on the content item.
+The <code>computeEnhancements(ContentItem ci)</code> method starts the processing of
the parsed content item by the engine. It is expected to change the state of
the parsed content item. Engines that support asynchronous processing need to
take care to correctly apply read/write locks when reading/writing information
from/to the content item. Engines that return <code>ENHANCE_SYNCHRONOUS</code>
on calls to <code>canEnhance(..)</code> do not need to use locks. They can
trust that they have exclusive read/write access to the content item.
-The "computeEnhacements(ContentItem ci)" starts the processing of the parsed
ContentItem by the engine. It is expected to change the state of the parsed
ContentItem. Engines that support asynchronous processing need to take care to
correctly apply read/write locks when reading/writing information from/to the
content time. Engines that return ENHANCE_SYNCHRONOUS on calls to
canEnhance(..) do not need to use locks. They can trust that they have
exclusive read/write access to the content item.
+Enhancement engines have full access to the content item. Theoretically,
they would even be allowed to delete all metadata as well as all content parts
from the parsed content item. However, typically they only
-EnhancementEngiens do have full access to the ContentItem. Theoretically they
would be even allowed to delete all metadata as well as all content parts from
the parsed ContentItem. However typically the do only
-
-* read existing ContentParts
-* add new ContentParts
-* add new Enhancements to the metadata
+* read existing content parts
+* add new content parts
+* add new enhancements to the metadata
* some engines might also need to update/delete existing metadata.
-Both the "canEnhance(..)" and "computeEnhancements(..)" methods MUST be called
by the [EnhancementJobManager](../enhancementjobmanager.html) after all the
executions of all enhancement engines this one depends on are completed. This
dependencies are defined by the [ExecutionPlan](../chains/executionplan.html)
used by the EnhancementJobManager to enhance the ContentItem. Implementors of
enhancement engines can therefore trust that all metadata expected to be added
by other enhancement engines are already present within the metadata of the
parsed ContentItems when "canEnhance(..)" or "computeEnhancements(..)" is
called.
+Both the <code>canEnhance(..)</code> and <code>computeEnhancements(..)</code>
methods MUST be called by the [Enhancement Job
Manager](../enhancementjobmanager.html) after all the executions of all
enhancement engines this one depends on are completed. These dependencies are
defined by the [Execution Plan](../chains/executionplan.html) used by the
enhancement job manager to enhance the content item. Implementors of
enhancement engines can therefore trust that all metadata expected to be added
by other enhancement engines are already present within the metadata of the
parsed content items when <code>canEnhance(..)</code> or
<code>computeEnhancements(..)</code> is called.
-### ServicesProperties Interface
+### Services Properties Interface
This interface is implemented by most of the current enhancement engines. It
allows engines to expose additional properties to other components. This
interface defines a single method
/** Getter for the ServiceProperties */
Map<String,Object> getServiceProperties();
-but also predefines the property ENHANCEMENT_ENGINE_ORDERING =
"org.apache.stanbol.enhancer.engine.order" that can be used by enhancement
engine implementations to specify their typical ordering within the enhancement
process.
+but also predefines the property <code>ENHANCEMENT_ENGINE_ORDERING =
"org.apache.stanbol.enhancer.engine.order"</code> that can be used by
enhancement engine implementations to specify their typical ordering within the
enhancement process.
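As a minimal sketch of how an engine might expose this property: the `ServiceProperties` interface below is a local stand-in that only mirrors the method and constant quoted above (the real interface ships with Apache Stanbol), and `ExampleOrderedEngine` and its ordering value are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.Map;

// Local stand-in mirroring the method and constant quoted in the text above.
interface ServiceProperties {
    String ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order";

    /** Getter for the ServiceProperties */
    Map<String, Object> getServiceProperties();
}

// Hypothetical engine fragment that exposes nothing but its ordering value.
class ExampleOrderedEngine implements ServiceProperties {
    // Hypothetical ordering value, picked only for this example.
    private static final Integer ORDER = 150;

    @Override
    public Map<String, Object> getServiceProperties() {
        // An immutable single-entry map is enough for one property.
        return Collections.<String, Object>singletonMap(ENHANCEMENT_ENGINE_ORDERING, ORDER);
    }
}
```

Other components would then read the value through the shared key rather than through an engine-specific getter.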
### Engine Ordering Information
-By implementing the ServicesProperties interface enhancement engines do have
the possibility to expose additional metadata to other components. The
ServicesProperties interface defines only a single method
+By implementing the ServicesProperties interface enhancement engines do have
the possibility to expose additional metadata to other components. The services
properties interface defines only a single method
/** Getter for the ServiceProperties */
Map<String,Object> getServiceProperties();
-and is implemented by most of the current enhancement engines. Its currently
only use is to provide information about the engine ordering within the
enhancement process. This information is exposed by using the key
"org.apache.stanbol.enhancer.engine.order" that is defined as value by the
constant ENHANCEMENT_ENGINE_ORDERING defined directly by the ServicesProperties
interface. Values are expected to be integer within the ranges
+and is implemented by most of the current enhancement engines. Its only
current use is to provide information about the engine ordering within the
enhancement process. This information is exposed by using the key
"org.apache.stanbol.enhancer.engine.order" that is defined as the value of the
constant <code>ENHANCEMENT_ENGINE_ORDERING</code> defined directly by the
services properties interface. Values are expected to be integers within the
ranges
* __ORDERING_PRE_PROCESSING__: All values >= 200 are considered for engines
that do some kind of preprocessing of the content. This includes e.g. the
conversion of media formats such as extracting the plain text from HTML,
keyframes from videos, waveforms from MP3 ...; and extracting metadata directly
encoded within the parsed content such as ID3 tags from MP3 or RDFa and microdata
provided by HTML content.
* __ORDERING_CONTENT_EXTRACTION__: This range includes values < 200 and
>= 100 and shall be used by enhancement engines that need to analyze the parsed
content to extract additional metadata. Examples would be language detection,
natural language processing, named entity recognition, face detection in
images, speech to text ...
@@ -75,18 +73,18 @@ and is implemented by most of the curren
* __ORDERING_DEFAULT__: This represents the value 0 and shall be used as
default value for all enhancement engines that do not provide ordering
information or do not implement the ServicesProperties interface.
* __ORDERING_POST_PROCESSING__: This range includes values < 0 and
>= -100 and is intended to be used by all enhancement engines that do post
processing of enhancement results such as schema translation, filtering of
enhancements ...
-The Engine Ordering information as described here are used by the
[DefaultChain](../chains/defaultchain.html) and the
[WeightedChain](../chains/weightedchain.html) to calculate the
[ExecutionPlan](../chains/executionplan.html).
+The engine ordering information as described here is used by the [Default
Chain](../chains/defaultchain.html) and the [Weighted
Chain](../chains/weightedchain.html) to calculate the [Execution
Plan](../chains/executionplan.html).
-Basically this features allows the implementor of an enhancement engine to
define the correct position of his engine within an typical enhancement chain
and therefore ensure that users that add this engine to a Stanbol Enhancer
installation to immediately use this engine with the
[DefaultChain](../chains/defaultchain.html).
+Basically, this feature allows the implementor of an enhancement engine to
define the correct position of the engine within a typical enhancement chain
and therefore ensures that users who add this engine to an enhancer installation
can immediately use it with the [Default
Chain](../chains/defaultchain.html).
-However the Engine Ordering is not the only possibility for users to control
the execution order. Enhancement chain implementations such as the
[ListChain](../chains/listchain.html) and the
[GraphChain](../chains/graphchain.html) do also allow to directly define the
oder of execution. For this chains the ordering information provided by
EnhancementEngines are ignored.
+However, the engine ordering is not the only possibility for users to control
the execution order. Enhancement chain implementations such as the [List
Chain](../chains/listchain.html) and the [Graph
Chain](../chains/graphchain.html) also allow the order of execution to be
defined directly. For these chains the ordering information provided by enhancement
engines is ignored.
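The effect of the ordering values can be illustrated with a small sketch. It assumes that engines with higher values run earlier (as the range names suggest) and that engines without ordering information fall back to the default value 0. It is a simplification: the actual chains calculate an execution plan, not a flat list, and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Stanbol API.
class EngineOrderingSketch {
    // Default used when an engine provides no ordering information.
    static final int ORDERING_DEFAULT = 0;

    /** Maps a missing ordering value (null) to the default value. */
    static int effectiveOrder(Integer declared) {
        return declared == null ? ORDERING_DEFAULT : declared;
    }

    /**
     * Returns the engine names sorted by ordering value, highest first.
     * The map goes from engine name to declared ordering (null if none).
     */
    static List<String> sortByOrdering(Map<String, Integer> engines) {
        List<String> names = new ArrayList<>(engines.keySet());
        names.sort(Comparator.comparingInt(
                (String name) -> effectiveOrder(engines.get(name))).reversed());
        return names;
    }
}
```

With the hypothetical input `{filter=-100, no-ordering=null, ner=100, html-extractor=200}` this yields the order `html-extractor`, `ner`, `no-ordering`, `filter`.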
## Enhancement Engine Management
-This section describes how enhancement engines are managed by the Stanbol
Enhancer and how they can be selected/accessed by the
[EnhancementJobManager](../enhancementjobmanager.html) execution a
[Chain](../chains/enhancementchain.html).
+This section describes how enhancement engines are managed by the Apache
Stanbol Enhancer and how they can be selected/accessed through the [Enhancement
Job Manager](../enhancementjobmanager.html) and executed in an [Enhancement
Chain](../chains/enhancementchain.html).
-Enhancement engines are registered as OSGI services and managed by using the
following service properties:
+Enhancement engines are registered as OSGi services and managed by using the
following service properties:
* __Name:__ Defined by the value of the property
"stanbol.enhancer.engine.name" it will be used to access Engines on the Stanbol
RESTful interface
* __Service Ranking:__ The service ranking property defined by OSGI will be
used to decide which engine to use in case several active enhancement engines
do use the same name. In such cases only the Engine with the highest ranking
will be used to enhance ContentItems.
@@ -97,18 +95,18 @@ Enhancement engines are registered as OS
_TODO:_ To correctly construct this graph the Engine needs to know this URL.
This could e.g. be provided by some OSGI environment parameter set by the
JerseyApplication. As an alternative we could also parse this URI as an
parameter to the getEngineConfig method.
-->
-Other components such as enhancement Chains do refer to engines by their name.
The actual enhancement engine instance is only looked up shortly before the
execution.
+Other components such as enhancement chains do refer to engines by their name.
The actual enhancement engine instance is only looked up shortly before the
execution.
### Enhancement Engine Name Conflicts
-As enhancement engines are identified by the value of the
"stanbol.enhancer.engine.name" property - the name - there might be cases where
multiple enhancement engine are registered for the same name. In such cases the
normal OSGI procedure to select the default service instance of several
possible matches is used. This means that
+As enhancement engines are identified by the value of the
"stanbol.enhancer.engine.name" property - the name - there might be cases where
multiple enhancement engines are registered for the same name. In such cases the
normal OSGi procedure to select the default service instance of several
possible matches is used. This means that
1. the enhancement engine with the highest "service.ranking" and,
2. if several engines share that ranking, the one with the lowest "service.id"
will be selected on requests for an enhancement engine with a given name.
Requests on the RESTful service API will always answer with the enhancement
engine selected as default. When using the Java API there are also means to
retrieve all enhancement engines for a given name via the [Enhancement Engine
Manager](enhancementenginemanager.html) interface.
-Out of a user perspective there is one major use case for configuring multiple
enhancement engines for the same name. This is to allow the definition of
fallback engines if the main one becomes unavailable. e.g. lets assume that a
user has a local cache of geonames.org loaded into the Entityhub and configures
an [Named Entity Linking](keywordlinkingengine.html) engine to perform semantic
lifting of extracted locations. However Stanbol also provides the [geonames.org
Engine](geonamesengine.html) that provides a similar functionality by directly
accessing [geonames.org](http://geonames.org). By configuring both engines for
the same name, but specifying a higher service ranking for the one using the
local cache one can ensure that the local cache is used for the enhancement
under normal circumstances. However in case the local cache becomes unavailable
the other engine using the remote service will be used for enhancement.
+From a user perspective there is one major use case for configuring multiple
enhancement engines for the same name. This is to allow the definition of
fallback engines if the main one becomes unavailable. For example, let's assume that a
user has a local cache of geonames.org loaded into the [Entity
Hub](../../entityhub/) and configures a [Named Entity
Linking](keywordlinkingengine.html) engine to perform semantic lifting of
extracted locations. However, Apache Stanbol also provides the [geonames.org
Engine](geonamesengine.html) that provides similar functionality by directly
accessing [geonames.org](http://geonames.org). By configuring both engines for
the same name, but specifying a higher service ranking for the one using the
local cache, one can ensure that the local cache is used for the enhancement
under normal circumstances. However, in case the local cache becomes unavailable,
the other engine using the remote service will be used for enhancement.
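The selection rule described in this section (highest "service.ranking" first, then lowest "service.id") can be sketched as follows. The `Registration` class is a simplified, hypothetical stand-in for an OSGi service registration, and the sketch does not use the OSGi API itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the default-service selection rule.
class EngineSelectionSketch {

    /** Simplified stand-in for a registered enhancement engine service. */
    static final class Registration {
        final String name;     // value of "stanbol.enhancer.engine.name"
        final int ranking;     // value of "service.ranking"
        final long serviceId;  // value of "service.id"

        Registration(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    // Highest ranking first; ties are broken by the lowest service id.
    static final Comparator<Registration> DEFAULT_FIRST =
            Comparator.comparingInt((Registration r) -> r.ranking).reversed()
                      .thenComparingLong(r -> r.serviceId);

    /** Returns the registration that would be used for the given engine name. */
    static Optional<Registration> select(List<Registration> all, String name) {
        return all.stream()
                  .filter(r -> r.name.equals(name))
                  .min(DEFAULT_FIRST);
    }
}
```

In the fallback scenario above, the engine backed by the local cache would carry the higher ranking. Once it is unregistered, the same lookup returns the engine that uses the remote service.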
### Enhancement Engine Manager Interface
@@ -119,7 +117,7 @@ The [Enhancement Engine Manager](enhance
A list of enhancement engine implementations maintained directly by the Apache
Stanbol community can be found [here](../../engines.html).
However, the enhancement engine interface is designed in a way that it should
be possible for advanced Apache Stanbol users to implement their own enhancement
engines fulfilling their special needs.
-The Stanbol Community would be very happy if users decide to share thoughts
about possible enhancement engines or even would like to contribute addition
engines to the Apache Stanbol project.
+The Apache Stanbol community would be very happy if users decide to share
thoughts about possible enhancement engines or even would like to contribute
additional engines to the Apache Stanbol project.