Author: buildbot
Date: Fri Feb 10 16:38:40 2012
New Revision: 804437
Log:
Staging update by buildbot for stanbol
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
(original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
Fri Feb 10 16:38:40 2012
@@ -66,10 +66,10 @@
<p>We will shortly describe the components from top to bottom and link to
their detailed descriptions.</p>
<ul>
<li>
-<p>The <a href="enhancer.html">Enhancer</a> component together with its <a
href="engines.html">Enhancement Engines</a> provides you with the ability to
post content to Apache Stanbol and get suggestions for possible entity
annotation in return. The enhancements are provided via natural language
processing, metadata extraction and linking named entities to public or private
entity repositories. Furthermore, Apache Stanbol provides a machinery to
further process this data and add additional knowledge and links via applying
rules and reasoning. Technically, the enhancements are stored in a triple-graph
that is maintained by <a href="http://incubator.apache.org/clerezza">Apache
Clerezza</a>.</p>
+<p>The <a href="enhancer/">Enhancer</a> component together with its <a
href="enhancer/engines">Enhancement Engines</a> provides you with the ability
to post content to Apache Stanbol and get suggestions for possible entity
annotation in return. The enhancements are provided via natural language
processing, metadata extraction and linking named entities to public or private
entity repositories. Furthermore, Apache Stanbol provides a machinery to
further process this data and add additional knowledge and links via applying
rules and reasoning. Technically, the enhancements are stored in a triple-graph
that is maintained by <a href="http://incubator.apache.org/clerezza">Apache
Clerezza</a>.</p>
</li>
<li>
-<p>The 'Sparql endpoint' gives access to the semantic enhancements form the
Apache Stanbol <a href="enhancer.html">Enhancer</a>.</p>
+<p>The 'Sparql endpoint' gives access to the semantic enhancements from the
Apache Stanbol <a href="enhancer/">Enhancer</a>.</p>
</li>
<li>
<p>The 'EnhancerVIE' is a stateful interface to submit content to analyze and
store the results on the server. It is then possible to browse the resulting
enhanced content items.</p>
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
(added)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
Fri Feb 10 16:38:40 2012
@@ -0,0 +1,174 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - Enhancement Engines</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link rel="icon" type="image/png"
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+ <div id="navigation">
+ <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220"
height="101" border="0"
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+ <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue
Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+ </div>
+
+ <div id="content">
+ <h1 class="title">Enhancement Engines</h1>
+ <p>Enhancement engines are the components responsible for enhancing
ContentItems. They are called by the <a
href="../enhancementjobmanager.html">EnhancementJobManager</a>. Enhancement
engines have full access to the passed <a
href="../contentitem.html">ContentItem</a> and are expected to modify its
state.</p>
+<p>The RESTful interface of an EnhancementEngine can be accessed at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span
class="sr">//</span><span class="p">{</span><span class="n">host</span><span
class="p">}:{</span><span class="n">port</span><span class="p">}</span><span
class="sr">/{stanbol-root}/</span><span class="n">enhancer</span><span
class="sr">/engine/</span><span class="p">{</span><span
class="n">engine</span><span class="o">-</span><span class="n">name</span><span
class="p">}</span>
+</pre></div>
+
+
+<p>For example, an EnhancementEngine with the name "ner" running in an Apache
Stanbol instance on localhost with the default configuration will be accessible
at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span
class="sr">//</span><span class="n">localhost:8080</span><span
class="sr">/enhancer/</span><span class="n">engine</span><span
class="o">/</span><span class="n">ner</span>
+</pre></div>
+
+
+<p>When using the Java API, enhancement engines can be looked up as OSGi
services. The <a
href="enhancementenginemanager.html">EnhancementEngineManager</a>
service is designed to ease this by providing an API that allows access to
enhancement engines by their name.</p>
+<h2 id="enhancement_engine_interface">Enhancement Engine Interface</h2>
+<p>The interface for enhancement engines contains the following three
methods:</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the value of the
"stanbol.enhancer.engine.name" property */</span>
+<span class="o">+</span> <span class="n">getName</span><span
class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Checks if this engine can enhance the passed content item
*/</span>
+<span class="o">+</span> <span class="n">canEnhance</span><span
class="p">(</span><span class="n">ContentItem</span> <span
class="n">ci</span><span class="p">)</span> <span class="p">:</span> <span
class="nb">int</span>
+<span class="sr">/** Enhances the passed content item */</span>
+<span class="o">+</span> <span class="n">computeEnhancements</span><span
class="p">(</span><span class="n">ContentItem</span> <span
class="n">ci</span><span class="p">)</span>
+
+<span class="sr">/** The property used for the name of an engine */</span>
+<span class="n">PROPERTY_NAME</span> <span class="p">:</span> <span
class="n">String</span>
+<span class="sr">/** Indicates that this engine cannot enhance a content
item */</span>
+<span class="n">CANNOT_ENHANCE</span> <span class="p">:</span> <span
class="nb">int</span>
+<span class="sr">/** Indicates support for synchronous enhancement */</span>
+<span class="n">ENHANCE_SYNCHRONOUS</span> <span class="p">:</span> <span
class="nb">int</span>
+<span class="sr">/** Indicates support for asynchronous enhancement */</span>
+<span class="n">ENHANCE_ASYNC</span> <span class="p">:</span> <span
class="nb">int</span>
+</pre></div>
+
+
+<p>Each enhancement engine has a name assigned. This is typically provided by
the engine configuration and MUST be set as the value of the property
"stanbol.enhancer.engine.name" in the service registration of the enhancement
engine. The getter for the name MUST return the same value as this property.
Enhancement engine implementations will usually get the name by
calling</p>
+<p>this.name = (String)
componentContext.getProperties().get(EnhancementEngine.PROPERTY_NAME);</p>
+<p>in the activate method, where componentContext is the ComponentContext
instance passed to that method.</p>
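+<p>As a minimal sketch of that lookup, the following self-contained Java
snippet reads the engine name from a property dictionary. The class
NamedEngineSketch and its activate(Dictionary) signature are hypothetical
stand-ins chosen so the example runs without an OSGi container; a real engine
would receive a ComponentContext and read the same key from its
properties.</p>

```java
import java.util.Dictionary;
import java.util.Hashtable;

class NamedEngineSketch {
    // Same key the documentation gives for EnhancementEngine.PROPERTY_NAME.
    static final String PROPERTY_NAME = "stanbol.enhancer.engine.name";

    private String name;

    // Hypothetical stand-in for activate(ComponentContext ctx):
    // a real engine would pass ctx.getProperties() here.
    void activate(Dictionary<String, Object> properties) {
        Object value = properties.get(PROPERTY_NAME);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalStateException("Missing required property " + PROPERTY_NAME);
        }
        this.name = (String) value;
    }

    // Must return exactly the value registered under PROPERTY_NAME.
    String getName() {
        return name;
    }

    public static void main(String[] args) {
        Dictionary<String, Object> props = new Hashtable<>();
        props.put(PROPERTY_NAME, "ner");
        NamedEngineSketch engine = new NamedEngineSketch();
        engine.activate(props);
        System.out.println(engine.getName());
    }
}
```

+<p>Failing fast on a missing name is a design choice of this sketch, not
documented Stanbol behaviour.</p>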
+<p>The "canEnhance(ContentItem ci)" method is used by the <a
href="../enhancementjobmanager.html">EnhancementJobManager</a> to check if an
engine is able to process a <a href="../contentitem.html">ContentItem</a>.
Calling this method MUST NOT change the state of the ContentItem, and this
method MUST NOT acquire a write lock on the content item.</p>
+<p>The "computeEnhancements(ContentItem ci)" method starts the processing of
the passed ContentItem by the engine. It is expected to change the state of the
passed ContentItem. Engines that support asynchronous processing need to take
care to correctly apply read/write locks when reading/writing information
from/to the content item. Engines that return ENHANCE_SYNCHRONOUS on calls to
canEnhance(..) do not need to use locks. They can trust that they have
exclusive read/write access to the content item.</p>
+<p>EnhancementEngines have full access to the ContentItem. Theoretically
they would even be allowed to delete all metadata as well as all content parts
from the passed ContentItem. However, they typically only</p>
+<ul>
+<li>read existing ContentParts</li>
+<li>add new ContentParts</li>
+<li>add new Enhancements to the metadata</li>
+<li>some engines might also need to update/delete existing metadata.</li>
+</ul>
+<p>Both the "canEnhance(..)" and "computeEnhancements(..)" methods MUST be
called by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a>
only after the executions of all enhancement engines this one depends on have
completed. These dependencies are defined by the <a
href="../chains/executionplan.html">ExecutionPlan</a> used by the
EnhancementJobManager to enhance the ContentItem. Implementors of enhancement
engines can therefore trust that all metadata expected to be added by other
enhancement engines is already present within the metadata of the passed
ContentItem when "canEnhance(..)" or "computeEnhancements(..)" is called.</p>
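+<p>The contract described above can be illustrated with a small, self-contained
Java sketch. SimpleContentItem and WordCountEngine are hypothetical,
heavily simplified stand-ins and not the Stanbol API; the numeric values of the
two constants are also assumptions, since this page names the constants but not
their values. The point is only that canEnhance(..) reads the item without
modifying it, while computeEnhancements(..) changes its state.</p>

```java
import java.util.ArrayList;
import java.util.List;

class EngineContractSketch {
    // Assumed illustrative values; only the constant names come from the page.
    static final int CANNOT_ENHANCE = 0;
    static final int ENHANCE_SYNCHRONOUS = 1;

    /** Hypothetical, simplified stand-in for a ContentItem. */
    static class SimpleContentItem {
        final String mimeType;
        final String text;
        final List<String> metadata = new ArrayList<>();

        SimpleContentItem(String mimeType, String text) {
            this.mimeType = mimeType;
            this.text = text;
        }
    }

    /** Toy engine that only handles text/plain content. */
    static class WordCountEngine {
        String getName() {
            return "wordcount";
        }

        // Read-only check: must not change the state of the content item.
        int canEnhance(SimpleContentItem ci) {
            return "text/plain".equals(ci.mimeType) ? ENHANCE_SYNCHRONOUS : CANNOT_ENHANCE;
        }

        // Expected to change the item: adds one entry to its metadata.
        void computeEnhancements(SimpleContentItem ci) {
            String trimmed = ci.text.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            ci.metadata.add("wordCount=" + words);
        }
    }

    public static void main(String[] args) {
        SimpleContentItem ci = new SimpleContentItem("text/plain", "John Smith was born in London.");
        WordCountEngine engine = new WordCountEngine();
        if (engine.canEnhance(ci) != CANNOT_ENHANCE) {
            engine.computeEnhancements(ci);
        }
        System.out.println(ci.metadata);
    }
}
```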
+<h3 id="servicesproperties_interface">ServiceProperties Interface</h3>
+<p>This interface is implemented by most of the current enhancement engines.
It allows engines to expose additional properties to other components. This
interface defines a single method</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the
ServiceProperties */</span>
+<span class="n">Map</span><span class="sr">&lt;String,Object&gt;</span> <span
class="n">getServiceProperties</span><span class="p">();</span>
+</pre></div>
+
+
+<p>but also predefines the property ENHANCEMENT_ENGINE_ORDERING =
"org.apache.stanbol.enhancer.engine.order" that can be used by enhancement
engine implementations to specify their typical ordering within the enhancement
process.</p>
+<h3 id="engine_ordering_information">Engine Ordering Information</h3>
+<p>By implementing the ServiceProperties interface, enhancement engines
can expose additional metadata to other components. The
ServiceProperties interface defines only a single method</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the
ServiceProperties */</span>
+<span class="n">Map</span><span class="sr">&lt;String,Object&gt;</span> <span
class="n">getServiceProperties</span><span class="p">();</span>
+</pre></div>
+
+
+<p>and is implemented by most of the current enhancement engines. Its
only current use is to provide information about the engine ordering within
the enhancement process. This information is exposed under the key
"org.apache.stanbol.enhancer.engine.order", which is the value of the
constant ENHANCEMENT_ENGINE_ORDERING defined directly by the ServiceProperties
interface. Values are expected to be integers within the following ranges:</p>
+<ul>
+<li><strong>ORDERING_PRE_PROCESSING</strong>: All values &gt;= 200 are
reserved for engines that do some kind of preprocessing of the content. This
includes the conversion of media formats, such as extracting the plain
text from HTML, keyframes from videos or the waveform from MP3 files, as well
as extracting metadata directly encoded within the passed content, such as ID3
tags from MP3 or RDFa and microdata provided by HTML content.</li>
+<li><strong>ORDERING_CONTENT_EXTRACTION</strong>: This range includes values
from &lt; 200 and &gt;= 100 and shall be used by enhancement engines that need
to analyze the passed content to extract additional metadata. Examples would be
Language Detection, Natural Language Processing, Named Entity Recognition, Face
Detection in Images, Speech to Text ...</li>
+<li><strong>ORDERING_EXTRACTION_ENHANCEMENT</strong>: This range includes
values from &lt; 100 and &gt;= 1 and shall be used by enhancement engines that
provide semantic lifting of preexisting enhancements, such as linking named
entities extracted by an NER engine with entities defined in a controlled
vocabulary, or lifting artist names, song titles ... extracted from MP3 files
with the corresponding entities defined in a music database.</li>
+<li><strong>ORDERING_DEFAULT</strong>: This represents the value 0 and shall
be used as default value for all enhancement engines that do not provide
ordering information or do not implement the ServiceProperties interface.</li>
+<li><strong>ORDERING_POST_PROCESSING</strong>: This range includes values from
&lt; 0 and &gt;= -100 and is intended to be used by all enhancement engines
that do post processing of enhancement results, such as schema translation,
filtering of Enhancements ...
+</li>
+</ul>
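+<p>To make the boundaries of these ranges explicit, here is a small Java sketch
that maps an integer ordering value to the name of its range. The class
EngineOrderingSketch and the method rangeOf are hypothetical helpers written
for this illustration and are not part of Stanbol. Values below -100 are not
covered by the list above, so the sketch reports them as "UNSPECIFIED".</p>

```java
class EngineOrderingSketch {
    // Maps an ordering value to the range name used in the list above.
    static String rangeOf(int order) {
        if (order >= 200) {
            return "ORDERING_PRE_PROCESSING";
        }
        if (order >= 100) {
            return "ORDERING_CONTENT_EXTRACTION";
        }
        if (order >= 1) {
            return "ORDERING_EXTRACTION_ENHANCEMENT";
        }
        if (order == 0) {
            return "ORDERING_DEFAULT";
        }
        if (order >= -100) {
            return "ORDERING_POST_PROCESSING";
        }
        // Not described by the documentation; label chosen for this sketch.
        return "UNSPECIFIED";
    }

    public static void main(String[] args) {
        int[] samples = {250, 200, 199, 100, 99, 1, 0, -1, -100, -101};
        for (int order : samples) {
            System.out.println(order + " -> " + rangeOf(order));
        }
    }
}
```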
+<p>The Engine Ordering information as described here is used by the <a
href="../chains/defaultchain.html">DefaultChain</a> and the <a
href="../chains/weightedchain.html">WeightedChain</a> to calculate the <a
href="../chains/executionplan.html">ExecutionPlan</a>.</p>
+<p>Basically this feature allows the implementor of an enhancement engine to
define the correct position of the engine within a typical enhancement chain.
It therefore ensures that users who add this engine to a Stanbol Enhancer
installation can immediately use it with the <a
href="../chains/defaultchain.html">DefaultChain</a>.</p>
+<p>However, the Engine Ordering is not the only possibility for users to
control the execution order. Enhancement chain implementations such as the <a
href="../chains/listchain.html">ListChain</a> and the <a
href="../chains/graphchain.html">GraphChain</a> also allow the order of
execution to be defined directly. For these chains the ordering information
provided by EnhancementEngines is ignored.</p>
+<h2 id="enhancement_engine_management">Enhancement Engine Management</h2>
+<p>This section describes how enhancement engines are managed by the Stanbol
Enhancer and how they can be selected/accessed by the <a
href="../enhancementjobmanager.html">EnhancementJobManager</a> when executing
a <a href="../chains/enhancementchain.html">Chain</a>.</p>
+<p>Enhancement engines are registered as OSGI services and managed by using
the following service properties:</p>
+<ul>
+<li><strong>Name:</strong> Defined by the value of the property
"stanbol.enhancer.engine.name" it will be used to access Engines on the Stanbol
RESTful interface</li>
+<li><strong>Service Ranking:</strong> The service ranking property defined by
OSGI will be used to decide which engine to use in case several active
enhancement engines do use the same name. In such cases only the Engine with
the highest ranking will be used to enhance ContentItems.</li>
+</ul>
+<!-- TODO: The Configuration is not yet defined
+* __Configuration:__ Each EnhacementEngien MAY provide an RDF graph with its
configuration. This graph will be returned on GET request on the URL of the
enhancement engine. If no configuration is known for the engine this MUST at
least return a single triple with the name for the engine.
+
+_TODO:_ To correctly construct this graph the Engine needs to know this URL.
This could e.g. be provided by some OSGI environment parameter set by the
JerseyApplication. As an alternative we could also parse this URI as an
parameter to the getEngineConfig method.
+-->
+
+<p>Other components such as enhancement Chains do refer to engines by their
name. The actual enhancement engine instance is only looked up shortly before
the execution.</p>
+<h3 id="enhancement_engine_name_conflicts">Enhancement Engine Name
Conflicts</h3>
+<p>As enhancement engines are identified by the value of the
"stanbol.enhancer.engine.name" property (the name), there might be cases where
multiple enhancement engines are registered for the same name. In such cases
the normal OSGi procedure for selecting the default service instance among
several possible matches is used. This means that, on requests for an
enhancement engine with a given name,</p>
+<ol>
+<li>the enhancement engine with the highest "service.ranking" is selected,
and</li>
+<li>among engines with the same ranking, the one with the lowest "service.id"
is selected.</li>
+</ol>
+<p>Requests on the RESTful service API will always answer with the enhancement
engine selected as default. When using the Java API there are also means to
retrieve all enhancement engines for a given name via the <a
href="enhancementenginemanager.html">Enhancement Engine Manager</a>
interface.</p>
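+<p>The selection rule for engines sharing a name can be sketched in plain Java
as a comparator: highest "service.ranking" first, ties broken by the lowest
"service.id". The Registration class, its label field and the selectDefault
method are hypothetical stand-ins for OSGi service references, used here only
so that the example runs without an OSGi container.</p>

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class DefaultEngineSelectionSketch {
    /** Hypothetical stand-in for an OSGi service reference of an engine. */
    static final class Registration {
        final String engineName;   // value of "stanbol.enhancer.engine.name"
        final int serviceRanking;  // value of "service.ranking"
        final long serviceId;      // value of "service.id"
        final String label;        // free-form label, only for this example

        Registration(String engineName, int serviceRanking, long serviceId, String label) {
            this.engineName = engineName;
            this.serviceRanking = serviceRanking;
            this.serviceId = serviceId;
            this.label = label;
        }
    }

    // Highest ranking sorts first; equal rankings fall back to the lowest id.
    static final Comparator<Registration> DEFAULT_FIRST =
            Comparator.comparingInt((Registration r) -> r.serviceRanking)
                    .reversed()
                    .thenComparingLong(r -> r.serviceId);

    // Returns the registration that would be used for the given engine name.
    static Optional<Registration> selectDefault(List<Registration> all, String name) {
        return all.stream()
                .filter(r -> r.engineName.equals(name))
                .min(DEFAULT_FIRST);
    }

    public static void main(String[] args) {
        List<Registration> registered = Arrays.asList(
                new Registration("geo", 0, 7L, "remote geonames.org service"),
                new Registration("geo", 100, 12L, "local Entityhub cache"));
        selectDefault(registered, "geo")
                .ifPresent(r -> System.out.println("selected: " + r.label));
    }
}
```

+<p>With the two registrations in main, the engine backed by the local cache
is selected because of its higher ranking; if it were unregistered, the remote
one would be returned instead.</p>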
+<p>From a user perspective there is one major use case for configuring
multiple enhancement engines for the same name: the definition of fallback
engines in case the main one becomes unavailable. For example, assume that
a user has a local cache of geonames.org loaded into the Entityhub and
configures a <a href="keywordlinkingengine.html">Named Entity Linking</a>
engine to perform semantic lifting of extracted locations. However, Stanbol
also provides the <a href="geonamesengine.html">geonames.org Engine</a>, which
provides similar functionality by directly accessing <a
href="http://geonames.org">geonames.org</a>. By configuring both engines for
the same name, but specifying a higher service ranking for the one using the
local cache, one can ensure that the local cache is used for the enhancement
under normal circumstances. If the local cache becomes unavailable,
the other engine using the remote service will be used for enhancement.</p>
+<h3 id="enhancement_engine_manager_interface">Enhancement Engine Manager
Interface</h3>
+<p>The <a href="enhancementenginemanager.html">Enhancement Engine Manager</a>
is the management interface for enhancement engines that can be used by
components to look up enhancement engines based on their name. There is also
an OSGi ServiceTracker-like implementation that can be used to track only
enhancement engines registered for a specific set of names. </p>
+<h2 id="enhancement_engine_implementations">Enhancement Engine
Implementations</h2>
+<p>A list of enhancement engine implementations maintained directly by the
Apache Stanbol community can be found <a href="../../engines.html">here</a>.
+However, the enhancement engine interface is designed so that advanced Apache
Stanbol users can implement their own enhancement engines to fulfil their
special needs.</p>
+<p>The Stanbol community would be very happy if users decide to share thoughts
about possible enhancement engines or would like to contribute additional
engines to the Apache Stanbol project.</p>
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
(added)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
Fri Feb 10 16:38:40 2012
@@ -0,0 +1,149 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - Enhancement Engines and their main features</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link rel="icon" type="image/png"
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+ <div id="navigation">
+ <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220"
height="101" border="0"
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+ <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue
Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+ </div>
+
+ <div id="content">
+ <h1 class="title">Enhancement Engines and their main features</h1>
+ <h2 id="preprocessing">Preprocessing</h2>
+<ul>
+<li><strong><a href="enhancer/engines/langidengine.html">Language
Identification Engine</a></strong><ul>
+<li>language detection for textual content utilizing <a
href="http://tika.apache.org/">Apache Tika</a></li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/metaxaengine.html">Metaxa
Engine</a></strong></p>
+<ul>
+<li>text extraction from various document formats</li>
+<li>extraction of metadata from document formats</li>
+</ul>
+</li>
+</ul>
+<h2 id="natural_language_processing">Natural Language Processing</h2>
+<ul>
+<li><strong><a href="enhancer/engines/namedentityextractionengine.html">Named
Entity Extraction Enhancement Engine</a></strong> <ul>
+<li>NLP processing using OpenNLP NER</li>
+<li>detects occurrences of persons, places and organizations only</li>
+</ul>
+</li>
+<li>
+<p><strong><a
href="enhancer/engines/keywordlinkingengine.html">KeywordLinkingEngine</a></strong></p>
+<ul>
+<li>NLP processing using OpenNLP</li>
+<li>supports multiple languages</li>
+<li>detects occurrences of untyped entities as concepts, takes local
taxonomies as linking target</li>
+</ul>
+</li>
+<li>
+<p><em>Taxonomy Linking Engine</em> (deprecated, see KeywordLinkingEngine)</p>
+<ul>
+<li>NLP processing using OpenNLP POS</li>
+<li>detects occurrences of untyped entities as concepts, takes local taxonomies
as linking target</li>
+</ul>
+</li>
+</ul>
+<h2 id="linking_suggestions">Linking Suggestions</h2>
+<ul>
+<li><strong><a href="enhancer/engines/namedentitytaggingengine.html">Named
Entity Tagging Engine</a></strong><ul>
+<li>suggests links to several Linked Data Sources (e.g. DBpedia)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/geonamesengine.html">Geonames Enhancement
Engine</a></strong> </p>
+<ul>
+<li>suggests links to geonames.org</li>
+<li>provides hierarchical links for locations</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/opencalaisengine.html">OpenCalais
Enhancement Engine</a></strong></p>
+<ul>
+<li>integrates the Open Calais service. (Note: You need to provide a key in
order to use this engine)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/zemantaengine.html">Zemanta Enhancement
Engine</a></strong></p>
+<ul>
+<li>integrates the Zemanta services. (Note: You need to provide a key in order
to use this engine)</li>
+</ul>
+</li>
+</ul>
+<h2 id="postprocessing__other">Postprocessing / Other</h2>
+<ul>
+<li><em>CachingDereferencerEngine</em> (deprecated, see dereferencing support
of individual engines as well as <a
href="https://issues.apache.org/jira/browse/STANBOL-336">STANBOL-336</a>)<ul>
+<li>retrieves additional content for presenting the enhancement results.</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="enhancer/engines/refactorengine.html">Refactor
Engine</a></strong></p>
+<ul>
+<li>transforms enhancements according to a target ontology, requires
KRES launcher.</li>
+</ul>
+</li>
+</ul>
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
(added)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
Fri Feb 10 16:38:40 2012
@@ -0,0 +1,124 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - Enhancer</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link rel="icon" type="image/png"
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+ <div id="navigation">
+ <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220"
height="101" border="0"
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+ <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue
Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+ </div>
+
+ <div id="content">
+ <h1 class="title">Enhancer</h1>
+ <p>This stateless interface allows the caller to submit content to the
Apache Stanbol <a href="engines/">enhancer engines</a> and get the resulting
enhancements formatted as RDF at once without storing anything on the
server-side.</p>
+<p>The content to analyze should be sent in a POST request with the mimetype
specified in the Content-type header. The response will hold the RDF
enhancement serialized in the format specified in the Accept header:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span
class="o">-</span><span class="n">H</span> <span class="s">"Accept:
text/turtle"</span> <span class="o">-</span><span class="n">H</span> <span
class="s">"Content-type: text/plain"</span> <span class="o">\</span>
+<span class="o">--</span><span class="n">data</span> <span
class="s">"John Smith was born in London."</span> <span
class="n">http:</span><span class="sr">//</span><span
class="n">localhost:8080</span><span class="o">/</span><span
class="n">engines</span>
+</pre></div>
+
+
+<p>The list of mimetypes accepted as inputs depends on the deployed engines.
By default only text/plain content will be analyzed.</p>
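+<p>For callers that prefer Java over curl, the same request can be sketched
with the JDK's java.net.http client (Java 11 or later). The class
EnhancerRequestSketch and its methods are hypothetical names for this example;
the endpoint URL, headers and sample text are taken from the curl call above.
The sketch only sends the request when started with the argument "send", so it
can be run without a Stanbol instance.</p>

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class EnhancerRequestSketch {
    // Builds the POST request: plain text in, Turtle-serialized RDF out.
    static HttpRequest buildRequest(String endpoint, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Accept", "text/turtle")
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    // Sends the request and returns the response body as a string.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest(
                "http://localhost:8080/engines",
                "John Smith was born in London.");
        System.out.println(request.method() + " " + request.uri());
        // Only contact the server when explicitly asked to.
        if (args.length > 0 && "send".equals(args[0])) {
            System.out.println(send(request));
        }
    }
}
```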
+<h2 id="list_of_available_enhancement_engines">List of Available Enhancement
Engines</h2>
+<p>Apache Stanbol comes with a <a href="engines/list.html">list of predefined
enhancement engines</a>. These engines are supported by the Apache Stanbol
community. If you would like to implement your own enhancement engine, continue
reading this documentation.</p>
+<h2 id="main_interfaces_and_utilities">Main Interfaces and Utilities</h2>
+<p>A <strong><a href="contentitem.html">Content Item</a></strong> is the unit
of content that Stanbol Enhancer can deal with. It gives access to the binary
content that was registered and to the graph that represents its metadata
(provided by the client and/or generated). The <strong><a
href="engines/">Enhancement Engine</a></strong> provides the interface to
internal or external semantic enhancement engines. Usually several engines are
deployed, and the Enhancement Job Manager uses them to enhance content items.
The <strong>Enhancement Job Manager</strong> accepts requests for enhancing
content items and processes them either synchronously or asynchronously (as
decided by the enhancement engines or by configuration). The
<strong>Enhancement Engine Helper</strong> provides classes for building the
resulting metadata according to the defined <strong>Enhancement
Structure</strong>.</p>
+<h2 id="enhancement_structure">Enhancement Structure</h2>
+<p>The enhancement structure for Apache Stanbol is described in full <a
href="http://wiki.iks-project.eu/index.php/EnhancementStructure">here</a>. It
defines the types and properties used for the resulting metadata graph
of Apache Stanbol. <em>Note: There is a proposal and ongoing discussion to
update this structure in the future.</em> Every <strong>Enhancement</strong>
type is a description that contains the following important properties:</p>
+<ul>
+<li>creator: the specific enhancement engine creating this enhancement</li>
+<li>creation time: the local system time at which the annotation was created</li>
+<li>extracted-from: the content item for the enhancement. This links to the ID
of the content item as assigned by Stanbol.</li>
+<li>type: the type of the enhancement (e.g. Person, Location,
Concept ...).</li>
+<li>confidence: The level of confidence in the range from 0 to 1 </li>
+</ul>
+<p>A <strong>Text Annotation</strong> type provides metadata for the selected
text. This is intended to be used in addition to the enhancement type if an
enhancement is based on a part of the content.</p>
+<ul>
+<li>start: the character position of the start of the selection. If start is
not defined, the selection is assumed to start at the beginning of the
document.</li>
+<li>end: the character position of the end of the selection. If end is not
defined, the selection is assumed to end at the end of the document.</li>
+<li>selected-text: the text selected by the enhancement (optional).</li>
+<li>selection-context: The context of the selected text. This adds the
possibility to specify the context used to extract entities such as persons,
organizations, locations ... from natural language documents.</li>
+</ul>
+<p>The <strong>Entity Annotation</strong> refers to named entities that have
been recognized within the content. This type is intended to be used together
with the FISE enhancement type.</p>
+<ul>
+<li>entity-reference: the URI identifying the entity</li>
+<li>entity-label: the label(s) of the referenced entity</li>
+<li>entity-type: the type of the entity (optional)</li>
+<li>The occurrences of the entity within the content (the exact positions
within the text where this entity is referred to) are determined by outgoing
dc:relation links.</li>
+</ul>
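+<p>To make the three types easier to picture, they can be sketched as plain
Python data classes. This is only an illustration under stated assumptions:
the class and field names are invented for this sketch and are not the RDF
property names of the actual enhancement structure, the combination of types
is modelled here by inheritance, and the end position is treated as exclusive,
which the description above does not specify.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Enhancement:
    creator: str                             # engine that created the enhancement
    created: datetime                        # local system time of creation
    extracted_from: str                      # ID of the content item
    enhancement_type: Optional[str] = None   # e.g. "Person", "Location"
    confidence: Optional[float] = None       # 0 to 1 when present

    def __post_init__(self):
        # Reject confidence values outside the documented range.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in the range 0 to 1")


@dataclass
class TextAnnotation(Enhancement):
    start: Optional[int] = None              # None means start of document
    end: Optional[int] = None                # None means end of document
    selected_text: Optional[str] = None
    selection_context: Optional[str] = None

    def resolve(self, document: str) -> str:
        """Return the selected span, applying the documented defaults.

        Assumption: `end` is exclusive, as in a Python slice.
        """
        start = 0 if self.start is None else self.start
        end = len(document) if self.end is None else self.end
        return document[start:end]


@dataclass
class EntityAnnotation(Enhancement):
    entity_reference: str = ""               # URI identifying the entity
    entity_labels: List[str] = field(default_factory=list)
    entity_type: Optional[str] = None
    # Stand-in for the outgoing dc:relation links to text annotations.
    relations: List[TextAnnotation] = field(default_factory=list)
```

+<p>In this sketch, a text annotation with start 23 and end 29 resolves to
"London" in the sentence "John Smith was born in London.", and an entity
annotation would list that text annotation among its relations.</p>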
+<h2 id="response_in_rdf">Response in RDF</h2>
+<p>Apache Stanbol Enhancer is able to serialize the response in the following
RDF formats:</p>
+<div class="codehilite"><pre><span class="n">application</span><span
class="o">/</span><span class="n">json</span> <span class="p">(</span><span
class="n">JSON</span><span class="o">-</span><span class="n">LD</span><span
class="p">)</span>
+<span class="n">application</span><span class="sr">/rdf+xml (RDF/</span><span
class="n">XML</span><span class="p">)</span>
+<span class="n">application</span><span class="sr">/rdf+json (RDF/</span><span
class="n">JSON</span><span class="p">)</span>
+<span class="n">text</span><span class="o">/</span><span
class="n">turtle</span> <span class="p">(</span><span
class="n">Turtle</span><span class="p">)</span>
+<span class="n">text</span><span class="o">/</span><span
class="n">rdf</span><span class="o">+</span><span class="n">nt</span> <span
class="p">(</span><span class="n">N</span><span class="o">-</span><span
class="n">TRIPLES</span><span class="p">)</span>
+</pre></div>
+
+
+<p>By default the URI of the content item being enhanced is a local,
non-dereferenceable URI automatically built from a hash digest of the binary
content. Sometimes it is helpful to provide the URI of the content item
to be used in the enhancement RDF graph. This can be achieved by passing it in
the uri request parameter as follows:</p>
+<div class="codehilite"><pre><span class="n">curl</span> <span
class="o">-</span><span class="n">X</span> <span class="n">POST</span> <span
class="o">-</span><span class="n">H</span> <span class="s">"Accept:
text/turtle"</span> <span class="o">-</span><span class="n">H</span> <span
class="s">"Content-type: text/plain"</span> <span class="o">\</span>
+<span class="o">--</span><span class="n">data</span> <span
class="s">"John Smith was born in London."</span> <span
class="o">\</span>
+<span
class="s">"http://localhost:8080/engines?uri=urn:fise-example-content-item"</span>
+</pre></div>
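+<p>When the request URL is built programmatically, the uri parameter can be
appended with the standard library. The sketch below is a hypothetical helper,
not part of Stanbol. Note that urlencode percent-encodes the colon, which a
server decoding the query string in the usual way should read back as the same
URI.</p>

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example; adjust for your deployment.
ENHANCER_URL = "http://localhost:8080/engines"


def enhancer_url(content_uri=None, base=ENHANCER_URL):
    """Return the enhancer URL, optionally carrying a content item URI.

    Hypothetical helper: without a content_uri the server assigns its
    own hash-based URI, so the base URL is returned unchanged.
    """
    if content_uri is None:
        return base
    return base + "?" + urlencode({"uri": content_uri})


print(enhancer_url("urn:fise-example-content-item"))
# -> http://localhost:8080/engines?uri=urn%3Afise-example-content-item
```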
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>
Modified:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
(original)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
Fri Feb 10 16:38:40 2012
@@ -57,10 +57,8 @@
<div id="content">
<h1 class="title">Factstore</h1>
- <p>The FactStore is a component that let's use store relations between
entities identified by their URIs. A relation between two or more entities is
called a <em>fact</em>. The FactStore let's you store N-ary facts according to
a user defined fact schema. In consequence you can store relations between N
participating entities.</p>
-<p>The FactStore only stores the relation and not the entities itself. It only
uses references to entities by using the entities' URI. The entities itself
should be handled by another component, e.g. the <a
href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema
which is defined over types of entities.</p>
-<p>A fact schema can be defined between an arbitrary number of entities. In
most cases a fact schema is defined between two or three entities. For example,
the fact schema 'works-for' can be defined as a relation between entities of
type 'Person' and 'Organization'. The Fact Store interface allows the creation
of custom fact schemata and to store facts according to these custom
schemata.</p>
-<p>The Fact Store provides a simple way to define and store facts. This
component is meant to be used in scenarios where a simple solution is
sufficient and it is not required to define a complex ontology with reasoning
support.</p>
+ <p>The FactStore is a component that lets you store relations between
entities identified by their URIs. A relation between two or more entities is
called a <em>fact</em>. The FactStore lets you store N-ary facts according to
a user-defined fact schema, so you can store relations between N
participating entities. The FactStore stores only the relations, not the
entities themselves; it refers to entities by their URIs. The entities
themselves should be handled by another component, e.g. the <a
href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema,
which in turn is defined over types of entities.</p>
+<p>A fact schema can be defined between an arbitrary number of entities. In
most cases a fact schema is defined between two or three entities. For example,
the fact schema 'works-for' can be defined as a relation between entities of
type 'Person' and 'Organization'. The FactStore interface lets you create
custom fact schemata and store facts according to them. The FactStore provides
a simple way to define and store facts. It is meant for scenarios where a
simple solution is sufficient and a complex ontology with reasoning support is
not required.</p>
<p>Read on and have a look at a concrete example or go to the <a
href="specification.html">FactStore specification</a> page for more details. If
you need some information about its realization, read the notes about its <a
href="implementation.html">implementation concept</a>.</p>
<h2 id="example">Example</h2>
<p>Imagine you want to store the fact that the person named John Doe works for
the company Winzigweich. John Doe is represented by the URI
http://www.doe.com/john and the company by http://www.winzigweich.de. This fact
is stored as a relation between the entity http://www.doe.com/john and
http://www.winzigweich.de.</p>