Author: buildbot
Date: Fri Feb 10 16:38:40 2012
New Revision: 804437

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
Modified:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/components.html Fri Feb 10 16:38:40 2012
@@ -66,10 +66,10 @@
 <p>We will briefly describe the components from top to bottom and link to their detailed descriptions.</p>
 <ul>
 <li>
-<p>The <a href="enhancer.html">Enhancer</a> component together with its <a 
href="engines.html">Enhancement Engines</a> provides you with the ability to 
post content to Apache Stanbol and get suggestions for possible entity 
annotation in return. The enhancements are provided via natural language 
processing, metadata extraction and linking named entities to public or private 
entity repositories. Furthermore, Apache Stanbol provides a machinery to 
further process this data and add additional knowledge and links via applying 
rules and reasoning. Technically, the enhancements are stored in a triple-graph 
that is maintained by <a href="http://incubator.apache.org/clerezza">Apache 
Clerezza</a>.</p>
+<p>The <a href="enhancer/">Enhancer</a> component together with its <a 
href="enhancer/engines">Enhancement Engines</a> provides you with the ability 
to post content to Apache Stanbol and get suggestions for possible entity 
annotation in return. The enhancements are provided via natural language 
processing, metadata extraction and linking named entities to public or private 
entity repositories. Furthermore, Apache Stanbol provides machinery to 
further process this data and add additional knowledge and links by applying 
rules and reasoning. Technically, the enhancements are stored in a triple-graph 
that is maintained by <a href="http://incubator.apache.org/clerezza">Apache 
Clerezza</a>.</p>
 </li>
 <li>
-<p>The 'Sparql endpoint' gives access to the semantic enhancements form the 
Apache Stanbol <a href="enhancer.html">Enhancer</a>.</p>
+<p>The 'SPARQL endpoint' gives access to the semantic enhancements from the Apache Stanbol <a href="enhancer/">Enhancer</a>.</p>
 </li>
 <li>
 <p>The 'EnhancerVIE' is a stateful interface to submit content to analyze and 
store the results on the server. It is then possible to browse the resulting 
enhanced content items.</p>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/index.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,174 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engines</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" 
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" 
height="101" border="0" 
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engines</h1>
+    <p>Enhancement engines are the components responsible for enhancing ContentItems. They are called by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a>. Enhancement engines have full access to the passed <a href="../contentitem.html">ContentItem</a>s and are expected to modify the state of the content item.</p>
+<p>The RESTful interface of an EnhancementEngine can be accessed at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span 
class="sr">//</span><span class="p">{</span><span class="n">host</span><span 
class="p">}:{</span><span class="n">port</span><span class="p">}</span><span 
class="sr">/{stanbol-root}/</span><span class="n">enhancer</span><span 
class="sr">/engine/</span><span class="p">{</span><span 
class="n">engine</span><span class="o">-</span><span class="n">name</span><span 
class="p">}</span>
+</pre></div>
+
+
+<p>For example, an EnhancementEngine named "ner" running on an Apache Stanbol instance on localhost with the default configuration is accessible at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span 
class="sr">//</span><span class="n">localhost:8080</span><span 
class="sr">/enhancer/</span><span class="n">engine</span><span 
class="o">/</span><span class="n">ner</span>
+</pre></div>
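+<p>The URL template above can also be filled in programmatically. The following Java sketch is illustrative only. The class and method names are hypothetical. It assumes that an empty {stanbol-root} (the default configuration) simply drops that path segment, as in the "ner" example. It also does not URL-encode engine names.</p>

```java
// Hypothetical helper (not part of Stanbol): fills in the documented template
// http://{host}:{port}/{stanbol-root}/enhancer/engine/{engine-name}
final class EngineUrls {
    private EngineUrls() {}

    static String engineUrl(String host, int port, String stanbolRoot, String engineName) {
        // Trim surrounding slashes so that "", "/" and "/stanbol/" are all accepted.
        String root = stanbolRoot.replaceAll("^/+|/+$", "");
        // An empty root (default configuration) drops the segment entirely.
        String prefix = root.isEmpty() ? "" : "/" + root;
        // Assumes the engine name is already URL-safe; no encoding is applied.
        return "http://" + host + ":" + port + prefix + "/enhancer/engine/" + engineName;
    }
}
```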
+
+
+<p>When using the Java API, enhancement engines can be looked up as OSGi services. The <a href="enhancementenginemanager.html">EnhancementEngineManager</a> service is designed to ease this by providing an API for accessing enhancement engines by their name.</p>
+<h2 id="enhancement_engine_interface">Enhancement Engine Interface</h2>
+<p>The interface for enhancement engines contains the following three 
methods:</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the value of the 
&quot;stanbol.enhancer.engine.name&quot; property */</span>
+<span class="o">+</span> <span class="n">getName</span><span 
class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Checks if this engine can enhance the parsed content item 
*/</span>
+<span class="o">+</span> <span class="n">canEnhance</span><span 
class="p">(</span><span class="n">ContentItem</span> <span 
class="n">ci</span><span class="p">)</span> <span class="p">:</span> <span 
class="nb">int</span>
+<span class="sr">/** Enhances the parsed content item */</span>
+<span class="o">+</span> <span class="n">computeEnhancements</span><span class="p">(</span><span class="n">ContentItem</span> <span class="n">ci</span><span class="p">)</span>
+
+<span class="sr">/** The property used for the name of an engine */</span>
+<span class="n">PROPERTY_NAME</span> <span class="p">:</span> <span 
class="n">String</span>
+<span class="sr">/** Indicates that this engine cannot enhance a content item */</span>
+<span class="n">CANNOT_ENHANCE</span> <span class="p">:</span> <span 
class="nb">int</span>
+<span class="sr">/** Indicates support for synchronous enhancement */</span>
+<span class="n">ENHANCE_SYNCHRONOUS</span> <span class="p">:</span> <span 
class="nb">int</span>
+<span class="sr">/** Indicates support for asynchronous enhancement */</span>
+<span class="n">ENHANCE_ASYNC</span> <span class="p">:</span> <span 
class="nb">int</span>
+</pre></div>
+
+
+<p>Each enhancement engine is assigned a name. This is typically provided by the engine configuration and MUST be set as the value of the property "stanbol.enhancer.engine.name" in the service registration of the enhancement engine. The getter for the name MUST return the same value as the one set for this property. Enhancement engine implementations will usually get the name by calling</p>
+<p><code>this.name = (String) context.getProperties().get(EnhancementEngine.PROPERTY_NAME);</code></p>
+<p>in the activate method, where <code>context</code> is the ComponentContext passed to that method.</p>
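+<p>The following minimal Java sketch shows this name handling and runs without an OSGi runtime. The properties are passed in directly as a <code>java.util.Dictionary</code>, which is the type <code>ComponentContext.getProperties()</code> returns. The class name is hypothetical. The choice to fail fast on a missing name is an assumption made for illustration.</p>

```java
import java.util.Dictionary;

// Hypothetical engine skeleton: only the name handling is shown.
class NamedEngineSketch {
    // Same key as documented for EnhancementEngine.PROPERTY_NAME.
    static final String PROPERTY_NAME = "stanbol.enhancer.engine.name";

    private String name;

    // In OSGi the dictionary would come from ComponentContext.getProperties().
    void activate(Dictionary<String, Object> properties) {
        Object value = properties.get(PROPERTY_NAME);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            // Fail fast: an engine without a name cannot be looked up by name.
            throw new IllegalStateException("Missing or empty " + PROPERTY_NAME);
        }
        this.name = (String) value;
    }

    // MUST return the same value as the registered property.
    String getName() {
        return name;
    }
}
```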
+<p>The "canEnhance(ContentItem ci)" method is used by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> to check whether an engine is able to process a <a href="../contentitem.html">ContentItem</a>. Calling this method MUST NOT change the state of the ContentItem, and this method MUST NOT acquire a write lock on the content item.</p>
+<p>The "computeEnhancements(ContentItem ci)" method starts the processing of the passed ContentItem by the engine and is expected to change the state of that ContentItem. Engines that support asynchronous processing need to correctly apply read/write locks when reading information from or writing information to the content item. Engines that return ENHANCE_SYNCHRONOUS on calls to canEnhance(..) do not need to use locks; they can rely on having exclusive read/write access to the content item.</p>
+<p>EnhancementEngines have full access to the ContentItem. In theory, they would even be allowed to delete all metadata as well as all content parts from the passed ContentItem. Typically, however, they only</p>
+<ul>
+<li>read existing ContentParts</li>
+<li>add new ContentParts</li>
+<li>add new Enhancements to the metadata</li>
+<li>some engines might also need to update/delete existing metadata.</li>
+</ul>
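+<p>The contract described above can be illustrated with a toy engine. This is a self-contained Java sketch, not the Stanbol API. <code>SimpleContentItem</code> and <code>SimpleEnhancementEngine</code> are hypothetical stand-ins, the constant values are placeholders, and the "enhancements" are plain strings. The engine reports asynchronous support, so it reads the content under a read lock and adds its results under a write lock. Its <code>canEnhance</code> method only inspects the item.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified stand-in for a ContentItem.
class SimpleContentItem {
    final String mimeType;
    final String text;
    final List<String> enhancements = new ArrayList<>();
    final ReadWriteLock lock = new ReentrantReadWriteLock();

    SimpleContentItem(String mimeType, String text) {
        this.mimeType = mimeType;
        this.text = text;
    }
}

// Mirrors the documented interface; constant values are placeholders.
interface SimpleEnhancementEngine {
    int CANNOT_ENHANCE = 0;
    int ENHANCE_SYNCHRONOUS = 1;
    int ENHANCE_ASYNC = 2;

    String getName();
    int canEnhance(SimpleContentItem ci);
    void computeEnhancements(SimpleContentItem ci);
}

// Toy engine: records every capitalized token of text/plain content.
class CapitalizedWordEngine implements SimpleEnhancementEngine {
    public String getName() {
        return "capitalized-words";
    }

    public int canEnhance(SimpleContentItem ci) {
        // Read-only check: no state change and no write lock.
        return "text/plain".equals(ci.mimeType) ? ENHANCE_ASYNC : CANNOT_ENHANCE;
    }

    public void computeEnhancements(SimpleContentItem ci) {
        List<String> found = new ArrayList<>();
        ci.lock.readLock().lock(); // read existing content under a read lock
        try {
            for (String token : ci.text.split("\\W+")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    found.add("candidate:" + token);
                }
            }
        } finally {
            ci.lock.readLock().unlock();
        }
        ci.lock.writeLock().lock(); // add new enhancements under a write lock
        try {
            ci.enhancements.addAll(found);
        } finally {
            ci.lock.writeLock().unlock();
        }
    }
}
```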
+<p>Both the "canEnhance(..)" and "computeEnhancements(..)" methods MUST be called by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> only after the execution of all enhancement engines this one depends on has completed. These dependencies are defined by the <a href="../chains/executionplan.html">ExecutionPlan</a> used by the EnhancementJobManager to enhance the ContentItem. Implementors of enhancement engines can therefore rely on all metadata expected to be added by other enhancement engines already being present in the metadata of the passed ContentItem when "canEnhance(..)" or "computeEnhancements(..)" is called.</p>
+<h3 id="servicesproperties_interface">ServiceProperties Interface</h3>
+<p>This interface is implemented by most of the current enhancement engines. It allows engines to expose additional properties to other components. This interface defines a single method</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the 
ServiceProperties */</span>
+<span class="n">Map</span><span class="sr">&lt;String,Object&gt;</span> <span 
class="n">getServiceProperties</span><span class="p">();</span>
+</pre></div>
+
+
+<p>but also predefines the property ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order", which enhancement engine implementations can use to specify their typical position within the enhancement process.</p>
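+<p>As an illustration, an engine could expose its ordering through the returned map roughly as follows. Only the key string is taken from this page. The interface is a local stand-in for ServiceProperties, and the engine class and its value of 100 are hypothetical.</p>

```java
import java.util.Collections;
import java.util.Map;

// Local stand-in for the ServiceProperties interface; only the key is from the docs.
interface ServicePropertiesSketch {
    String ENHANCEMENT_ENGINE_ORDERING = "org.apache.stanbol.enhancer.engine.order";

    Map<String, Object> getServiceProperties();
}

// Hypothetical engine placing itself in the content-extraction range [100, 200).
class LanguageDetectionEngineSketch implements ServicePropertiesSketch {
    private static final Map<String, Object> PROPERTIES =
            Collections.singletonMap(ENHANCEMENT_ENGINE_ORDERING, 100);

    @Override
    public Map<String, Object> getServiceProperties() {
        return PROPERTIES; // immutable, so it can be shared safely
    }
}
```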
+<h3 id="engine_ordering_information">Engine Ordering Information</h3>
+<p>By implementing the ServiceProperties interface, enhancement engines can expose additional metadata to other components. Currently, its only use is to provide information about the engine ordering within the enhancement process. This information is exposed under the key "org.apache.stanbol.enhancer.engine.order", which is the value of the constant ENHANCEMENT_ENGINE_ORDERING defined by the ServiceProperties interface. Values are expected to be integers within the following ranges:</p>
+<ul>
+<li><strong>ORDERING_PRE_PROCESSING</strong>: All values &gt;= 200 are intended for engines that perform some kind of preprocessing of the content. This includes, e.g., the conversion of media formats, such as extracting the plain text from HTML, keyframes from videos or the waveform from MP3 files, as well as extracting metadata directly encoded within the passed content, such as ID3 tags from MP3 files or RDFa and microdata embedded in HTML content.</li>
+<li><strong>ORDERING_CONTENT_EXTRACTION</strong>: This range includes values &lt; 200 and &gt;= 100 and shall be used by enhancement engines that need to analyze the passed content to extract additional metadata. Examples are language detection, natural language processing, named entity recognition, face detection in images, speech to text, ...</li>
+<li><strong>ORDERING_EXTRACTION_ENHANCEMENT</strong>: This range includes values &lt; 100 and &gt;= 1 and shall be used by enhancement engines that provide semantic lifting of preexisting enhancements, such as linking named entities extracted by an NER engine with entities defined in a controlled vocabulary, or linking artist names, song titles, ... extracted from MP3 files with the corresponding entities defined in a music database.</li>
+<li><strong>ORDERING_DEFAULT</strong>: This represents the value 0 and shall be used as the default value for all enhancement engines that do not provide ordering information or do not implement the ServiceProperties interface.</li>
+<li><strong>ORDERING_POST_PROCESSING</strong>: This range includes values &lt; 0 and &gt;= -100 and is intended to be used by all enhancement engines that post-process enhancement results, e.g. by schema translation or filtering of enhancements, ...</li>
+</ul>
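+<p>The ranges above can be summarized in a small Java helper that maps an ordering value to the name of its range. This sketch is for illustration only and is not part of the Stanbol API. Values below -100 fall outside every documented range and are reported here as "UNDEFINED".</p>

```java
// Illustrative helper (not part of Stanbol) following the documented ranges.
final class EngineOrdering {
    private EngineOrdering() {}

    static String rangeOf(int ordering) {
        if (ordering >= 200) return "ORDERING_PRE_PROCESSING";           // [200, +inf)
        if (ordering >= 100) return "ORDERING_CONTENT_EXTRACTION";       // [100, 200)
        if (ordering >= 1) return "ORDERING_EXTRACTION_ENHANCEMENT";     // [1, 100)
        if (ordering == 0) return "ORDERING_DEFAULT";                    // exactly 0
        if (ordering >= -100) return "ORDERING_POST_PROCESSING";         // [-100, 0)
        return "UNDEFINED";                                              // below -100
    }
}
```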
+<p>The engine ordering information described here is used by the <a href="../chains/defaultchain.html">DefaultChain</a> and the <a href="../chains/weightedchain.html">WeightedChain</a> to calculate the <a href="../chains/executionplan.html">ExecutionPlan</a>.</p>
+<p>This feature allows the implementor of an enhancement engine to define the correct position of the engine within a typical enhancement chain. It therefore ensures that users who add this engine to a Stanbol Enhancer installation can immediately use it with the <a href="../chains/defaultchain.html">DefaultChain</a>.</p>
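+<p>The following simplified Java sketch shows how ordering values could translate into an execution order. Engines are sorted by descending ordering, so preprocessing engines run first and post-processing engines run last. Engines without a value are treated as ORDERING_DEFAULT (0). The engine names and values in the usage example are hypothetical. The actual DefaultChain and WeightedChain implementations may do more than sorting when they build the ExecutionPlan.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Simplified sketch, not the Stanbol chain implementation.
final class OrderingSorter {
    private OrderingSorter() {}

    static List<String> executionOrder(List<String> engineNames, Map<String, Integer> orderings) {
        List<String> sorted = new ArrayList<>(engineNames);
        // Higher ordering runs first; a missing value counts as ORDERING_DEFAULT (0).
        // List.sort is stable, so engines with equal values keep their input order.
        sorted.sort(Comparator.comparingInt(
                (String name) -> orderings.getOrDefault(name, 0)).reversed());
        return sorted;
    }
}
```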
+<p>However, the engine ordering is not the only way for users to control the execution order. Enhancement chain implementations such as the <a href="../chains/listchain.html">ListChain</a> and the <a href="../chains/graphchain.html">GraphChain</a> also allow the order of execution to be defined directly. For these chains, the ordering information provided by EnhancementEngines is ignored.</p>
+<h2 id="enhancement_engine_management">Enhancement Engine Management</h2>
+<p>This section describes how enhancement engines are managed by the Stanbol Enhancer and how they are selected/accessed by the <a href="../enhancementjobmanager.html">EnhancementJobManager</a> when executing a <a href="../chains/enhancementchain.html">Chain</a>.</p>
+<p>Enhancement engines are registered as OSGi services and managed by using the following service properties:</p>
+<ul>
+<li><strong>Name:</strong> Defined by the value of the property "stanbol.enhancer.engine.name"; it is used to access engines via the Stanbol RESTful interface.</li>
+<li><strong>Service Ranking:</strong> The service ranking property defined by OSGi is used to decide which engine to use when several active enhancement engines use the same name. In such cases, only the engine with the highest ranking is used to enhance ContentItems.</li>
+</ul>
+<!-- TODO: The Configuration is not yet defined 
+* __Configuration:__ Each EnhacementEngien MAY provide an RDF graph with its 
configuration. This graph will be returned on GET request on the URL of the 
enhancement engine. If no configuration is known for the engine this MUST at 
least return a single triple with the name for the engine.
+
+_TODO:_ To correctly construct this graph the Engine needs to know this URL. 
This could e.g. be provided by some OSGI environment parameter set by the 
JerseyApplication. As an alternative we could also parse this URI as an 
parameter to the getEngineConfig method.
+-->
+
+<p>Other components, such as enhancement chains, refer to engines by their name. The actual enhancement engine instance is only looked up shortly before execution.</p>
+<h3 id="enhancement_engine_name_conflicts">Enhancement Engine Name 
Conflicts</h3>
+<p>As enhancement engines are identified by the value of the "stanbol.enhancer.engine.name" property (the name), multiple enhancement engines may be registered for the same name. In such cases the normal OSGi procedure for selecting the default service among several matches is used. This means that on requests for an enhancement engine with a given name</p>
+<ol>
+<li>the enhancement engine with the highest "service.ranking" is selected, and</li>
+<li>if several engines share that ranking, the one with the lowest "service.id" is selected.</li>
+</ol>
+<p>Requests on the RESTful service API will always be answered by the enhancement engine selected as default. When using the Java API, the <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> interface also provides means to retrieve all enhancement engines registered for a given name.</p>
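+<p>The selection rule can be expressed as a Java comparator. <code>EngineRegistration</code> is a hypothetical holder for the three relevant service properties and is not an OSGi or Stanbol type. In a real OSGi framework, the framework itself makes this choice.</p>

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical holder for the relevant service properties.
final class EngineRegistration {
    final String name;        // stanbol.enhancer.engine.name
    final long serviceId;     // service.id
    final int serviceRanking; // service.ranking

    EngineRegistration(String name, long serviceId, int serviceRanking) {
        this.name = name;
        this.serviceId = serviceId;
        this.serviceRanking = serviceRanking;
    }
}

final class DefaultEngineSelector {
    private DefaultEngineSelector() {}

    // Highest service.ranking first; on a tie, the lowest service.id first.
    static final Comparator<EngineRegistration> PREFERENCE =
            Comparator.comparingInt((EngineRegistration r) -> r.serviceRanking).reversed()
                    .thenComparingLong(r -> r.serviceId);

    // Returns the default engine for a name, or empty if none is registered.
    static Optional<EngineRegistration> select(List<EngineRegistration> registrations, String name) {
        return registrations.stream()
                .filter(r -> r.name.equals(name))
                .min(PREFERENCE);
    }
}
```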
+<p>From a user perspective, there is one major use case for configuring multiple enhancement engines for the same name: defining fallback engines in case the main one becomes unavailable. For example, let's assume that a user has a local cache of geonames.org loaded into the Entityhub and configures a <a href="keywordlinkingengine.html">Named Entity Linking</a> engine to perform semantic lifting of extracted locations. However, Stanbol also provides the <a href="geonamesengine.html">geonames.org Engine</a>, which offers similar functionality by directly accessing <a href="http://geonames.org">geonames.org</a>. By configuring both engines for the same name, but specifying a higher service ranking for the one using the local cache, one can ensure that the local cache is used for the enhancement under normal circumstances. If the local cache becomes unavailable, the other engine using the remote service will be used for enhancement.</p>
+<h3 id="enhancement_engine_manager_interface">Enhancement Engine Manager 
Interface</h3>
+<p>The <a href="enhancementenginemanager.html">Enhancement Engine Manager</a> is the management interface for enhancement engines. Components can use it to look up enhancement engines based on their name. There is also an OSGi ServiceTracker-like implementation that can be used to track only the enhancement engines registered for a specific set of names.</p>
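+<p>The following Java sketch tracks only the engines registered for a specific set of names and combines this with the fallback scenario described above. The class is hypothetical and simplified. Registration events are fed in by hand instead of by the OSGi framework. It is not the tracker implementation shipped with Stanbol.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical ServiceTracker-like sketch filtering engines by name.
final class NameFilteredEngineTracker {
    static final class Engine {
        final String name;
        final long serviceId;
        final int ranking;

        Engine(String name, long serviceId, int ranking) {
            this.name = name;
            this.serviceId = serviceId;
            this.ranking = ranking;
        }
    }

    // OSGi default rule: highest ranking, then lowest service id.
    private static final Comparator<Engine> PREFERENCE =
            Comparator.comparingInt((Engine e) -> e.ranking).reversed()
                    .thenComparingLong(e -> e.serviceId);

    private final Set<String> trackedNames;
    private final List<Engine> tracked = new ArrayList<>();

    NameFilteredEngineTracker(Set<String> names) {
        this.trackedNames = new HashSet<>(names);
    }

    // Called when an engine is registered; engines with other names are ignored.
    synchronized void registered(Engine engine) {
        if (trackedNames.contains(engine.name)) {
            tracked.add(engine);
        }
    }

    // Called when an engine goes away; the next-best engine becomes the default.
    synchronized void unregistered(long serviceId) {
        tracked.removeIf(e -> e.serviceId == serviceId);
    }

    synchronized Optional<Engine> getEngine(String name) {
        return tracked.stream().filter(e -> e.name.equals(name)).min(PREFERENCE);
    }

    synchronized int trackedCount() {
        return tracked.size();
    }
}
```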
+<h2 id="enhancement_engine_implementations">Enhancement Engine 
Implementations</h2>
+<p>A list of enhancement engine implementations maintained directly by the Apache Stanbol community can be found <a href="../../engines.html">here</a>.
+However, the enhancement engine interface is designed so that advanced Apache Stanbol users can implement their own enhancement engines to fulfill their special needs.</p>
+<p>The Stanbol community would be very happy if users decided to share thoughts about possible enhancement engines or even to contribute additional engines to the Apache Stanbol project.</p>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/engines/list.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,149 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancement Engines and their main features</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" 
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" 
height="101" border="0" 
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancement Engines and their main features</h1>
+    <h2 id="preprocessing">Preprocessing</h2>
+<ul>
+<li><strong><a href="langidengine.html">Language Identification Engine</a></strong><ul>
+<li>language detection for textual content utilizing <a 
href="http://tika.apache.org/";>Apache Tika</a></li>
+</ul>
+</li>
+<li>
+<p><strong><a href="metaxaengine.html">Metaxa Engine</a></strong></p>
+<ul>
+<li>text extraction from various document formats</li>
+<li>extraction of metadata from document formats</li>
+</ul>
+</li>
+</ul>
+<h2 id="natural_language_processing">Natural Language Processing</h2>
+<ul>
+<li><strong><a href="namedentityextractionengine.html">Named Entity Extraction Enhancement Engine</a></strong> <ul>
+<li>NLP processing using OpenNLP NER</li>
+<li>detects occurrences of persons, places and organizations only</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="keywordlinkingengine.html">KeywordLinkingEngine</a></strong></p>
+<ul>
+<li>NLP processing using OpenNLP</li>
+<li>supports multiple languages</li>
+<li>detects occurrences of untyped entities as concepts, takes local 
taxonomies as linking target</li>
+</ul>
+</li>
+<li>
+<p><em>Taxonomy Linking Engine</em> (deprecated, see KeywordLinkingEngine)</p>
+<ul>
+<li>NLP processing using OpenNLP POS</li>
+<li>detects occurrences of untyped entities as concepts, takes local taxonomies as linking target</li>
+</ul>
+</li>
+</ul>
+<h2 id="linking_suggestions">Linking Suggestions</h2>
+<ul>
+<li><strong><a href="namedentitytaggingengine.html">Named Entity Tagging Engine</a></strong><ul>
+<li>suggests links to several Linked Data sources (e.g. DBpedia)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="geonamesengine.html">Geonames Enhancement Engine</a></strong></p>
+<ul>
+<li>suggests links to geonames.org</li>
+<li>provides hierarchical links for locations</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="opencalaisengine.html">OpenCalais Enhancement Engine</a></strong></p>
+<ul>
+<li>integrates the OpenCalais service (Note: you need to provide a key in order to use this engine)</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="zemantaengine.html">Zemanta Enhancement Engine</a></strong></p>
+<ul>
+<li>integrates the Zemanta services (Note: you need to provide a key in order to use this engine)</li>
+</ul>
+</li>
+</ul>
+<h2 id="postprocessing__other">Postprocessing / Other</h2>
+<ul>
+<li><em>CachingDereferencerEngine</em> (deprecated, see the dereferencing support of individual engines as well as <a href="https://issues.apache.org/jira/browse/STANBOL-336">STANBOL-336</a>)<ul>
+<li>retrieves additional content for presenting the enhancement results.</li>
+</ul>
+</li>
+<li>
+<p><strong><a href="refactorengine.html">Refactor Engine</a></strong>
+        - transforms enhancements according to a target ontology, requires 
KRES launcher.</p>
+</li>
+</ul>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/index.html Fri Feb 10 16:38:40 2012
@@ -0,0 +1,124 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Enhancer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" 
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" 
height="101" border="0" 
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Enhancer</h1>
+    <p>This stateless interface allows the caller to submit content to the 
Apache Stanbol <a href="engines/">enhancer engines</a> and get the resulting 
enhancements formatted as RDF at once without storing anything on the 
server-side.</p>
+<p>The content to analyze should be sent in a POST request, with its mimetype specified in the Content-Type header. The response will hold the RDF enhancements, serialized in the format specified in the Accept header:</p>
+<div class="codehilite"><pre>curl -X POST -H &quot;Accept: text/turtle&quot; -H &quot;Content-type: text/plain&quot; \
+--data &quot;John Smith was born in London.&quot; http://localhost:8080/engines
+</pre></div>
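+<p>For clients written in Python rather than shell, the same request can be assembled with the standard library's urllib.request. This is a minimal sketch assuming the local launcher URL used in the curl example; it only builds the request, and actually sending it requires a running Stanbol instance.</p>

```python
import urllib.request

# Assumption: a local Stanbol launcher on port 8080, as in the curl example.
ENHANCER_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, accept="text/turtle"):
    """Build (but do not send) a POST request equivalent to the curl call."""
    return urllib.request.Request(
        ENHANCER_URL,
        data=text.encode("utf-8"),          # raw content goes in the body
        headers={
            "Content-Type": "text/plain",   # MIME type of the submitted content
            "Accept": accept,               # requested RDF serialization
        },
        method="POST",
    )


req = build_enhancement_request("John Smith was born in London.")
# Against a running server, the RDF response could be read with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```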
+
+
+<p>The MIME types accepted as input depend on the deployed engines. By default, only text/plain content is analyzed.</p>
+<h2 id="list_of_available_enhancement_engines">List of Available Enhancement 
Engines</h2>
+<p>Apache Stanbol comes with a <a href="engines/list.html">list of predefined enhancement engines</a>, which are supported by the Apache Stanbol community. If you would like to implement your own enhancement engine, read on.</p>
+<h2 id="main_interfaces_and_utilities">Main Interfaces and Utilities</h2>
+<ul>
+<li>A <strong><a href="contentitem.html">Content Item</a></strong> is the unit of content that the Stanbol Enhancer processes. It gives access to the registered binary content and to the graph that represents its metadata (provided by the client and/or generated during enhancement).</li>
+<li>An <strong><a href="engines/">Enhancement Engine</a></strong> provides the interface to an internal or external semantic enhancement engine. Usually several engines are deployed, and the Enhancement Job Manager uses them to enhance content items.</li>
+<li>The <strong>Enhancement Job Manager</strong> accepts requests for enhancing Content Items and processes them either synchronously or asynchronously (as decided by the enhancement engines or by configuration).</li>
+<li>The <strong>Enhancement Engine Helper</strong> provides the classes for building the resulting enhancement structure according to the defined <strong>Enhancement Structure</strong>.</li>
+</ul>
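+<p>The interplay between content items, engines and the job manager can be pictured with a toy model. The following is a hypothetical, simplified Python sketch: the class and method names are invented for illustration and are not the Stanbol Java API, the metadata graph is reduced to a list of triples, and only synchronous processing is shown.</p>

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

# Hypothetical model for illustration only; not the Stanbol Java API.
Triple = Tuple[str, str, str]


@dataclass
class ContentItem:
    uri: str
    content: bytes
    mimetype: str
    metadata: List[Triple] = field(default_factory=list)  # stands in for the RDF graph


class EnhancementEngine(Protocol):
    def can_enhance(self, ci: ContentItem) -> bool: ...
    def compute_enhancements(self, ci: ContentItem) -> None: ...


class SynchronousJobManager:
    """Runs every applicable engine on a content item, one after another."""

    def __init__(self, engines: List[EnhancementEngine]):
        self.engines = engines

    def enhance_content(self, ci: ContentItem) -> ContentItem:
        for engine in self.engines:
            if engine.can_enhance(ci):
                engine.compute_enhancements(ci)
        return ci


class KeywordEngine:
    """Toy engine: records a fixed keyword if it occurs in plain text."""

    def __init__(self, keyword: str):
        self.keyword = keyword

    def can_enhance(self, ci: ContentItem) -> bool:
        return ci.mimetype == "text/plain"

    def compute_enhancements(self, ci: ContentItem) -> None:
        if self.keyword in ci.content.decode("utf-8"):
            ci.metadata.append(("urn:enhancement-1", "selected-text", self.keyword))


ci = ContentItem("urn:example", b"John Smith was born in London.", "text/plain")
SynchronousJobManager([KeywordEngine("London")]).enhance_content(ci)
print(ci.metadata)  # [('urn:enhancement-1', 'selected-text', 'London')]
```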
+<h2 id="enhancement_structure">Enhancement Structure</h2>
+<p>The enhancement structure of Apache Stanbol is described in full <a href="http://wiki.iks-project.eu/index.php/EnhancementStructure">here</a>. It defines the types and properties used in the metadata graph that Apache Stanbol produces. <em>Note: There is a proposal and ongoing discussion to update this structure in the future.</em> Every <strong>Enhancement</strong> carries the following important properties:</p>
+<ul>
+<li>creator: the enhancement engine that created this enhancement</li>
+<li>creation time: the local system time when the enhancement was created</li>
+<li>extracted-from: the content item the enhancement refers to. This links to the ID of the content item as assigned by Stanbol.</li>
+<li>type: the type of the enhancement (e.g. Person, Location, Concept, ...)</li>
+<li>confidence: the level of confidence, in the range from 0 to 1</li>
+</ul>
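+<p>The properties above can be mirrored in a small record type, which also makes the 0 to 1 confidence range explicit. This is a hypothetical sketch: Stanbol stores these properties as RDF triples, not Python objects, and the field names and example engine name are illustrative.</p>

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record mirroring the Enhancement properties; not a Stanbol API.
@dataclass(frozen=True)
class Enhancement:
    creator: str          # the engine that created the enhancement
    created: datetime     # local system time at creation
    extracted_from: str   # ID of the enhanced content item
    type: str             # e.g. "Person", "Location", "Concept"
    confidence: float     # must lie in the range 0..1

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


e = Enhancement("example-ner-engine", datetime.now(), "urn:content-item-1", "Person", 0.8)
```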
+<p>A <strong>Text Annotation</strong> type provides metadata for the selected text. It is intended to be used in addition to the Enhancement type when an enhancement is based on part of the content.</p>
+<ul>
+<li>start: the character position of the start of the selection. If start is not defined, the selection is assumed to start at the beginning of the document.</li>
+<li>end: the character position of the end of the selection. If end is not defined, the selection is assumed to end at the end of the document.</li>
+<li>selected-text: the text selected by the enhancement (optional).</li>
+<li>selection-context: the context of the selected text. This makes it possible to specify the context used to extract entities such as persons, organizations or locations from natural language documents.</li>
+</ul>
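+<p>The defaulting rules for start and end translate directly into slicing. The sketch below is hypothetical: it treats end as an exclusive character offset (an assumption, not stated above), and the selection_context helper with its fixed window size is only one illustrative way to derive a context.</p>

```python
from typing import Optional


def selected_text(document: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Text covered by a Text Annotation selection.

    A missing start defaults to the beginning of the document and a missing
    end to its end. End is treated as exclusive (assumption of this sketch).
    """
    begin = 0 if start is None else start
    stop = len(document) if end is None else end
    return document[begin:stop]


def selection_context(document: str, start: int, end: int, window: int = 10) -> str:
    """Hypothetical helper: the selection plus up to `window` chars on each side."""
    return document[max(0, start - window):min(len(document), end + window)]


doc = "John Smith was born in London."
print(selected_text(doc, 23, 29))  # London
```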
+<p>An <strong>Entity Annotation</strong> refers to a named entity that has been recognized within the content. This type is intended to be used together with the FISE Enhancement type.</p>
+<ul>
+<li>entity-reference: the URI identifying the entity</li>
+<li>entity-label: the label(s) of the referenced entity</li>
+<li>entity-type: the type of the entity (optional)</li>
+<li>The occurrences of the entity within the content (the exact positions in the text where the entity is mentioned) are given by its outgoing dc:relation links.</li>
+</ul>
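+<p>Following the dc:relation links from an Entity Annotation to the Text Annotations that carry start, end and selected-text yields the entity's occurrences. This is a hypothetical in-memory sketch: the dictionary layout and annotation IDs are invented, and the DBpedia URI is only an example entity reference.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of an enhancement graph, keyed by annotation ID.
text_annotations: Dict[str, dict] = {
    "urn:ta-1": {"start": 23, "end": 29, "selected-text": "London"},
}
entity_annotations: Dict[str, dict] = {
    "urn:ea-1": {
        "entity-reference": "http://dbpedia.org/resource/London",  # example URI
        "entity-label": ["London"],
        "dc:relation": ["urn:ta-1"],  # outgoing links to Text Annotations
    },
}


def occurrences(entity_annotation_id: str) -> List[Tuple[int, int, str]]:
    """Collect (start, end, selected-text) for each linked Text Annotation."""
    ea = entity_annotations[entity_annotation_id]
    result = []
    for ta_id in ea["dc:relation"]:
        ta = text_annotations[ta_id]
        result.append((ta["start"], ta["end"], ta["selected-text"]))
    return result


print(occurrences("urn:ea-1"))  # [(23, 29, 'London')]
```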
+<h2 id="response_in_rdf">Response in RDF</h2>
+<p>The Apache Stanbol Enhancer can serialize the response in the following RDF formats:</p>
+<div class="codehilite"><pre>application/json (JSON-LD)
+application/rdf+xml (RDF/XML)
+application/rdf+json (RDF/JSON)
+text/turtle (Turtle)
+text/rdf+nt (N-TRIPLES)
+</pre></div>
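+<p>A client picks one of these serializations simply by sending the matching MIME type in the Accept header. The hypothetical helper below copies the mapping from the list above; the function name is illustrative and not part of any Stanbol client library.</p>

```python
# MIME types from the list above, mapped to the serialization they select.
RDF_FORMATS = {
    "application/json": "JSON-LD",
    "application/rdf+xml": "RDF/XML",
    "application/rdf+json": "RDF/JSON",
    "text/turtle": "Turtle",
    "text/rdf+nt": "N-Triples",
}


def accept_header_for(format_name: str) -> str:
    """Return the Accept header value requesting the given serialization."""
    for mimetype, name in RDF_FORMATS.items():
        if name.lower() == format_name.lower():
            return mimetype
    raise ValueError(f"unsupported RDF format: {format_name}")


print(accept_header_for("turtle"))  # text/turtle
```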
+
+
+<p>By default, the URI of the content item being enhanced is a local, non-dereferenceable URI built automatically from a hash digest of the binary content. Sometimes it is useful to choose the URI that the enhancement RDF graph uses for the content item instead. This can be done by passing it in the uri request parameter:</p>
+<div class="codehilite"><pre>curl -X POST -H &quot;Accept: text/turtle&quot; -H &quot;Content-type: text/plain&quot; \
+--data &quot;John Smith was born in London.&quot; \
+&quot;http://localhost:8080/engines?uri=urn:fise-example-content-item&quot;
+</pre></div>
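+<p>The sketch below shows how a client might build the request URL with a custom content item URI, and what a digest-based default URI could look like. The SHA-1 digest and the urn:content-item-sha1- prefix are assumptions for illustration; the text above only says that a hash digest of the content is used. Note that urlencode percent-encodes the colons, which the server decodes back.</p>

```python
import hashlib
from urllib.parse import urlencode

# Assumption: same local launcher URL as in the curl examples.
ENHANCER_URL = "http://localhost:8080/engines"


def enhancer_url(content_item_uri=None):
    """Enhancer URL, optionally carrying the content item URI as ?uri=..."""
    if content_item_uri is None:
        return ENHANCER_URL
    return ENHANCER_URL + "?" + urlencode({"uri": content_item_uri})


def default_content_item_uri(content: bytes) -> str:
    """Hypothetical digest-based URI; the real hash and prefix may differ."""
    return "urn:content-item-sha1-" + hashlib.sha1(content).hexdigest()


print(enhancer_url("urn:fise-example-content-item"))
# http://localhost:8080/engines?uri=urn%3Afise-example-content-item
```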
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>

Modified: 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html 
(original)
+++ 
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/factstore/index.html 
Fri Feb 10 16:38:40 2012
@@ -57,10 +57,8 @@
   
   <div id="content">
     <h1 class="title">Factstore</h1>
-    <p>The FactStore is a component that let's use store relations between 
entities identified by their URIs. A relation between two or more entities is 
called a <em>fact</em>. The FactStore let's you store N-ary facts according to 
a user defined fact schema. In consequence you can store relations between N 
participating entities.</p>
-<p>The FactStore only stores the relation and not the entities itself. It only 
uses references to entities by using the entities' URI. The entities itself 
should be handled by another component, e.g. the <a 
href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema 
which is defined over types of entities.</p>
-<p>A fact schema can be defined between an arbitrary number of entities. In 
most cases a fact schema is defined between two or three entities. For example, 
the fact schema 'works-for' can be defined as a relation between entities of 
type 'Person' and 'Organization'. The Fact Store interface allows the creation 
of custom fact schemata and to store facts according to these custom 
schemata.</p>
-<p>The Fact Store provides a simple way to define and store facts. This 
component is meant to be used in scenarios where a simple solution is 
sufficient and it is not required to define a complex ontology with reasoning 
support.</p>
+    <p>The FactStore is a component that lets you store relations between entities identified by their URIs. A relation between two or more entities is called a <em>fact</em>. The FactStore lets you store N-ary facts according to a user-defined fact schema, so a single fact can relate N participating entities. The FactStore stores only the relation, not the entities themselves: it refers to entities solely by their URIs. The entities themselves should be managed by another component, e.g. the <a href="../entityhub.html">EntityHub</a>. A fact is defined by a fact schema, which is defined over types of entities.</p>
+<p>A fact schema can be defined over an arbitrary number of entities, although in most cases it involves two or three. For example, the fact schema 'works-for' can be defined as a relation between entities of type 'Person' and 'Organization'. The FactStore interface allows you to create custom fact schemata and to store facts according to them. The FactStore provides a simple way to define and store facts. It is meant for scenarios where a simple solution is sufficient and a complex ontology with reasoning support is not required.</p>
 <p>Read on and have a look at a concrete example or go to the <a 
href="specification.html">FactStore specification</a> page for more details. If 
you need some information about its realization, read the notes about its <a 
href="implementation.html">implementation concept</a>.</p>
 <h2 id="example">Example</h2>
 <p>Imagine you want to store the fact that the person named John Doe works for 
the company Winzigweich. John Doe is represented by the URI 
http://www.doe.com/john and the company by http://www.winzigweich.de. This fact 
is stored as a relation between the entity http://www.doe.com/john and 
http://www.winzigweich.de.</p>
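+<p>The distinction between a fact schema and a fact can be sketched with a small, hypothetical in-memory model of this example. It does not reflect the FactStore's actual interface; the role names employee and employer are invented. Because the FactStore stores only URIs, the sketch checks that every role of the schema is filled but cannot verify the entity types.</p>

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical model for illustration; not the FactStore API.


@dataclass
class FactSchema:
    name: str
    roles: Dict[str, str]  # role name -> entity type the role expects


@dataclass
class Fact:
    schema: FactSchema
    participants: Dict[str, str]  # role name -> entity URI (entities not stored)

    def __post_init__(self):
        # Only role coverage is checked; entity types would need e.g. the EntityHub.
        if set(self.participants) != set(self.schema.roles):
            raise ValueError("a fact must fill exactly the roles of its schema")


works_for = FactSchema("works-for", {"employee": "Person", "employer": "Organization"})
fact = Fact(works_for, {
    "employee": "http://www.doe.com/john",
    "employer": "http://www.winzigweich.de",
})
```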