svn commit: r1180832 - in /incubator/stanbol/site/trunk/content/stanbol/docs/trunk: ./ enhancer/engines/

agruber Mon, 10 Oct 2011 02:05:11 -0700

Author: agruber
Date: Mon Oct 10 09:04:43 2011
New Revision: 1180832

URL: http://svn.apache.org/viewvc?rev=1180832&view=rev
Log:
Added documentation for further engines, moved files into the engines directory,
updated engines overview


Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/langidengine.mdtext
      - copied unchanged from r1180805, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/langidengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/metaxaengine.mdtext
      - copied unchanged from r1180805, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/metaxaengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.mdtext
      - copied unchanged from r1180805, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentityextractionengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.mdtext
      - copied unchanged from r1180805, incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentitytaggingengine.mdtext
Removed:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/langidengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/metaxaengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentityextractionengine.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentitytaggingengine.mdtext
Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext?rev=1180832&r1=1180831&r2=1180832&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext Mon Oct 10 09:04:43 2011
@@ -15,13 +15,13 @@ Title: Enhancement Engines and their mai
 
 - __[Named Entity Extraction Enhancement Engine](enhancer/engines/namedentityextractionengine.html)__ 
        - NLP processing using OpenNLP NER
-       - detect occurrences of persons, places and organizations only
+       - detects occurrences of persons, places and organizations only
        
        
 - __[KeywordLinkingEngine](enhancer/engines/keywordlinkingengine.html)__
        - NLP processing using OpenNLP
        - supports multiple languages
-       - dedect occurences of untyped entities as concepts, takes local taxonomies as linking target
+       - detects occurrences of untyped entities as concepts, takes local taxonomies as linking target
 
        
 - _Taxonomy Linking Engine_ (deprecated, see KeywordLinkingEngine)
@@ -32,15 +32,16 @@ Title: Enhancement Engines and their mai
 ## Linking Suggestions
 
 - __[Named Entity Tagging Engine](enhancer/engines/namedentitytaggingengine.html)__
-       - suggest links to several Linked Data Sources (e.g. dbpedia)
+       - suggests links to several Linked Data Sources (e.g. DBpedia)
 
-- __Location Enhancement Engine__ 
+- __[Geonames Enhancement Engine](enhancer/engines/geonamesengine.html)__ 
        - suggests links to geonames.org
+       - provides hierarchical links for locations
 
-- __OpenCalais Enhancement Engine__
+- __[OpenCalais Enhancement Engine](enhancer/engines/opencalaisengine.html)__
        - integrates service from Open Calais. (Note: You need to provide a key in order to use this engine)
 
-- __Zemanta Enhancement Engine__
+- __[Zemanta Enhancement Engine](zemantaengine.html)__
        - integrates the Zemanta services. (Note: You need to provide a key in order to use this engine)
 
 
@@ -50,5 +51,5 @@ Title: Enhancement Engines and their mai
 - _CachingDereferencerEngine_ (deprecated, see dereferencing support of individual engines as well as  
[STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
        - retrieves additional content for presenting the enhancement results.
        
-- __Refactor Engine__
+- __[Refactor Engine](enhancer/engines/refactorengine.html)__
                - transforms enhancements according to a target ontology, requires KRES launcher.

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext?rev=1180832&r1=1180831&r2=1180832&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext Mon Oct 10 09:04:43 2011
@@ -14,14 +14,14 @@ The following languages are supported -
 ##Configuration steps
 
 - Have language labels in your target data and install the index
-- Activate the LangIdEnhancementEngine and the KeywordLinkingEngine
 - Add language models to your Stanbol instance
+- Activate the LangIdEnhancementEngine and the KeywordLinkingEngine
 - Configure the KeywordLinkingEngine
 
 
 ###Install your index
 
-In case you want to use an index of your custom vocabulary, first [create an index](customvocabulary.html) out of it and then add the index to your stanbol instance. Simply paste the <code>{yourindex}.solr.zip</code> into your <code>{stanbol-root}/sling/datafiles</code> directory and install the respective OSGI bundle at your OSGI admin console.
+DBpedia already provides language labels for many entities. If you want to use an index of your custom vocabulary, first [create the index](customvocabulary.html) from it and then add the index to your Stanbol instance: copy the <code>{yourindex}.solr.zip</code> into your <code>{stanbol-root}/sling/datafiles</code> directory and install the respective OSGI bundle via your OSGI admin console.
 
 Make sure that this index contains language labels in all languages you want to work with and that they are properly indexed.
 
@@ -39,26 +39,37 @@ After this the bundles are available in 
 
 The naming of the bundles is "org.apache.stanbol.data.opennlp.lang.{language}-*.jar".
 
-Add the bundle via the OSGI admin console in the bundles tab. The language bundles will fetch and install the according [OpenNLP](http://dev.iks-project.eu/downloads/opennlp/models-1.5/) models for the languages you want to use.
-
-OpenNLP provides language support
+Add the bundles via the OSGI admin console in the bundles tab. The language bundles will fetch and install the corresponding [OpenNLP](http://dev.iks-project.eu/downloads/opennlp/models-1.5/) models for the languages you want to use.
 
 
 
-###Activate the LangID engine and the KeywordLinkingEngine
+###Activate LangID engine and KeywordLinkingEngine
 
 Go to the admin console and deactivate some of the available engines. In particular, the standard NER engine and the Entity Linking Engines should be deactivated, as they do not support multiple languages. At least two engines need to be activated:
 
 - The [Language Identification Engine](enhancer/engines/langidengine.html) provides you with the language of the text you want to enhance; it creates a dc:terms language property.
-- The [Keyword Linking Engine](enhancer/engines/keywordlinkingengine.html) 
+- The [Keyword Linking Engine](enhancer/engines/keywordlinkingengine.html) provides you with TextAnnotations (which select potential parts of your text) as well as EntityAnnotations (which provide suggestions for links). Be aware that the result (especially the recall) heavily depends on the number of entities you have specified in your target data source.
 
 
 
 ###Configure the KeywordLinkingEngine
 
-(TODO)
+In the OSGI admin console you can set the most relevant configuration options of the Keyword Linking Engine:
+
+- **Referenced Site:** The ID of the Entityhub Referenced Site holding the Controlled Vocabulary (e.g. a taxonomy or just a set of named entities).
+- **Label Field:** The field used to match Entities with mentions within the parsed text.
+- **Type Field:** The field used to retrieve the types of matched Entities. Values of that field are expected to be URIs.
+- **Redirect Field:** Entities may define redirects to other Entities (e.g. [USA](http://dbpedia.org/resource/USA) -> [United States](http://dbpedia.org/resource/United_States)). Values of this field are expected to link to other Entities that are part of the controlled vocabulary.
+- **Redirect Mode:** Defines how to process redirects of Entities mentioned in the parsed content. Three modes are supported: ignore redirects; add the values of redirected Entities to the extracted Entity; follow redirects and suggest the redirected Entity instead of the extracted one.
+- **Min Token Length:** The minimum length of Tokens used to look up Entities within the Controlled Vocabulary. This parameter is ignored if a POS (Part of Speech) tagger is available for the language of the parsed content.
+- **Suggestions:** The maximum number of suggestions returned for a single mention (org.apache.stanbol.enhancer.engines.keywordextraction.maxSuggestions).
+- **Languages to process:** An empty value indicates that all languages are processed. Use ',' to separate languages (e.g. 'en,de' to enhance only English and German texts).
+- **Default Matching Language:** The language used, in addition to the language detected for the analysed text, to search for Entities. Typically this is an empty string, so that labels without any language are searched, but for data sets (such as DBpedia.org) that add a language to every label it might improve results to change this configuration (e.g. to 'en' in the case of DBpedia.org).
+
+Read the technical description of this [Enhancement Engine](enhancer/engines/keywordlinkingengine.html) to learn about more configuration options.
 
 
-##Examples
+##Results
 
-(TODO)
+Depending on your linking target dataset, the engine provides you with enhancement suggestions using labels in your chosen language(s). Note: in the current version of the DBpedia index, the link points to the English version of the resource.
\ No newline at end of file
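
To make the interplay of the **Languages to process** and **Default Matching Language** options described above more concrete, here is a minimal Java sketch of how those documented rules could be interpreted: an empty value means all languages are processed, ',' separates language codes, and labels are searched in the detected language plus the default matching language (an empty string standing for labels without any language). The class `LanguageConfigSketch` and its methods are hypothetical illustrations, not the KeywordLinkingEngine's actual implementation or API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the "Languages to process" and
 * "Default Matching Language" semantics; not Stanbol code.
 */
public class LanguageConfigSketch {

    // Empty set means: process texts in all languages.
    private final Set<String> languages;
    // Empty string means: search labels that have no language.
    private final String defaultMatchingLanguage;

    public LanguageConfigSketch(String languagesToProcess, String defaultMatchingLanguage) {
        Set<String> parsed = new LinkedHashSet<>();
        if (languagesToProcess != null) {
            // ',' separates the configured language codes, e.g. "en,de"
            for (String lang : languagesToProcess.split(",")) {
                String trimmed = lang.trim().toLowerCase(Locale.ROOT);
                if (!trimmed.isEmpty()) {
                    parsed.add(trimmed);
                }
            }
        }
        this.languages = Collections.unmodifiableSet(parsed);
        this.defaultMatchingLanguage =
            defaultMatchingLanguage == null ? "" : defaultMatchingLanguage.trim();
    }

    /** Returns true if a text in the detected language should be enhanced. */
    public boolean isProcessed(String detectedLanguage) {
        return languages.isEmpty()
            || (detectedLanguage != null
                && languages.contains(detectedLanguage.toLowerCase(Locale.ROOT)));
    }

    /** Label languages to search: the detected language plus the default matching language. */
    public List<String> labelLanguages(String detectedLanguage) {
        List<String> result = new ArrayList<>();
        if (detectedLanguage != null && !detectedLanguage.isEmpty()) {
            result.add(detectedLanguage);
        }
        if (!result.contains(defaultMatchingLanguage)) {
            result.add(defaultMatchingLanguage);
        }
        return result;
    }

    public static void main(String[] args) {
        LanguageConfigSketch enDe = new LanguageConfigSketch("en,de", "");
        System.out.println(enDe.isProcessed("de")); // true
        System.out.println(enDe.isProcessed("fr")); // false

        // DBpedia-style setup: all languages, English labels as fallback
        LanguageConfigSketch all = new LanguageConfigSketch("", "en");
        System.out.println(all.isProcessed("fr")); // true
        System.out.println(all.labelLanguages("de")); // [de, en]
    }
}
```

Keeping the parsed languages in a set makes the "empty means all" rule a single check, and searching the default matching language in addition to (not instead of) the detected language mirrors the DBpedia case described above.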
