Author: agruber
Date: Mon Oct 10 09:04:43 2011
New Revision: 1180832
URL: http://svn.apache.org/viewvc?rev=1180832&view=rev
Log:
Added documentation for further engines, moved files in engines directory,
updated engines overview
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/langidengine.mdtext
- copied unchanged from r1180805,
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/langidengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/metaxaengine.mdtext
- copied unchanged from r1180805,
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/metaxaengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/namedentityextractionengine.mdtext
- copied unchanged from r1180805,
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentityextractionengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/namedentitytaggingengine.mdtext
- copied unchanged from r1180805,
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentitytaggingengine.mdtext
Removed:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/langidengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/metaxaengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentityextractionengine.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/namedentitytaggingengine.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext
Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext?rev=1180832&r1=1180831&r2=1180832&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext
(original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/engines.mdtext Mon
Oct 10 09:04:43 2011
@@ -15,13 +15,13 @@ Title: Enhancement Engines and their mai
- __[Named Entity Extraction Enhancement
Engine](enhancer/engines/namedentityextractionengine.html)__
- NLP processing using OpenNLP NER
- - detect occurrences of persons, places and organizations only
+ - detects occurrences of persons, places and organizations only
- __[KeywordLinkingEngine](enhancer/engines/keywordlinkingengine.html)__
- NLP processing using OpenNLP
- supports multiple languages
- - dedect occurences of untyped entities as concepts, takes local
taxonomies as linking target
+ - detects occurrences of untyped entities as concepts, takes local
taxonomies as linking target
- _Taxonomy Linking Engine_ (deprecated, see KeywordLinkingEngine)
@@ -32,15 +32,16 @@ Title: Enhancement Engines and their mai
## Linking Suggestions
- __[Named Entity Tagging
Engine](enhancer/engines/namedentitytaggingengine.html)__
- - suggest links to several Linked Data Sources (e.g. dbpedia)
+ - suggest links to several Linked Data Sources (e.g. DBpedia)
-- __Location Enhancement Engine__
+- __[Geonames Enhancement Engine](enhancer/engines/geonamesengine.html)__
- suggests links to geonames.org
+ - provides hierarchical links for locations
-- __OpenCalais Enhancement Engine__
+- __[OpenCalais Enhancement Engine](enhancer/engines/opencalaisengine.html)__
- integrates service from Open Calais. (Note: You need to provide a key
in order to use this engine)
-- __Zemanta Enhancement Engine__
+- __[Zemanta Enhancement Engine](zemantaengine.html)__
- integrates the Zemanta services. (Note: You need to provide a key in
order to use this engine)
@@ -50,5 +51,5 @@ Title: Enhancement Engines and their mai
- _CachingDereferencerEngine_ (deprecated, see dereferencing support of
individual engines as well as
[STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
- retrieves additional content for presenting the enhancement results.
-- __Refactor Engine__
+- __[Refactor Engine](enhancer/engines/refactorengine.html)__
- transforms enhancements according to a target ontology,
requires KRES launcher.
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext?rev=1180832&r1=1180831&r2=1180832&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext
(original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/multilingual.mdtext
Mon Oct 10 09:04:43 2011
@@ -14,14 +14,14 @@ The following languages are supported -
##Configuration steps
- Have language labels in your target data and install the index
-- Activate the LangIdEnhancementEngine and the KeywordLinkingEngine
- Add language models to your Stanbol instance
+- Activate the LangIdEnhancementEngine and the KeywordLinkingEngine
- Configure the KeywordLinkingEngine
###Install your index
-In case you want to use an index of your custom vocabulary, first [create an
index](customvocabulary.html) out of it and then add the index to your stanbol
instance. Simply paste the <code>{yourindex}.solr.zip</code> into your
<code>{stanbol-root}/sling/datafiles</code> directory and install the
respective OSGI bundle at your OSGI admin console.
+In DBpedia, there exist language labels for many entities. In case you want to
use an index of your custom vocabulary, first [create the
index](customvocabulary.html) from it and add the index to your stanbol
instance. Simply paste the <code>{yourindex}.solr.zip</code> into your
<code>{stanbol-root}/sling/datafiles</code> directory and install the
respective OSGI bundle at your OSGI admin console.
Make sure that this index contains language labels in all languages you want
to work with and that they are properly indexed.
@@ -39,26 +39,37 @@ After this the bundles are available in
The naming of the bundles is
"org.apache.stanbol.data.opennlp.lang.{language}-*.jar".
-Add the bundle via the OSGI admin console in the bundles tab. The language
bundles will fetch and install the according
[OpenNLP](http://dev.iks-project.eu/downloads/opennlp/models-1.5/) models for
the languages you want to use.
-
-OpenNLP provides language support
+Add the bundles via the OSGI admin console in the bundles tab. The language
bundles will fetch and install the corresponding
[OpenNLP](http://dev.iks-project.eu/downloads/opennlp/models-1.5/) models for
the languages you want to use.
-###Activate the LangID engine and the KeywordLinkingEngine
+###Activate LangID engine and KeywordLinkingEngine
Go to the admin console and deactivate some of the available engines.
Especially the standard NER engine and the Entity Linking Engines should be
deactivated, as they do not support multiple languages. At least two engines
need to be activated:
- The [Language Identification Engine](enhancer/engines/langidengine.html)
provides you with the language of the text you want to enhance, it creates a
dc:terms language property. The
-- The [Keyword Linking Engine](enhancer/engines/keywordlinkingengine.html)
+- The [Keyword Linking Engine](enhancer/engines/keywordlinkingengine.html)
provides you with the TextAnnotations (selects potential parts of your text) as
well as with EntityAnnotations (provides suggestions for links). Be aware,
that the result (especially the recall) heavily depends on the amount of
entities you have specified in your target data source.
###Configure the KeywordLinkingEngine
-(TODO)
+In the OSGI admin console, you can find the most relevant configuration options
of the Keyword Linking Engine.
+
+- **Referenced Site:** The ID of the Entityhub Referenced Site holding the
Controlled Vocabulary (e.g. a taxonomy or just a set of named entities)
+- **Label Field:** The field used to match Entities with mentions within the
parsed text.
+- **Type Field:** The field used to retrieve the types of matched Entities.
Values of that field are expected to be URIs
+- **Redirect Field:** Entities may define redirects to other Entities (e.g.
"USA"(http://dbpedia.org/resource/USA) -> "United
States"(http://dbpedia.org/resource/United_States)). Values of this field are
expected to link to other entities that are part of the controlled vocabulary
+- **Redirect Mode:** Defines how to process redirects of Entities mentioned in
the parsed content. Three modes to deal with such links are supported: Ignore
redirects; Add values from redirected Entities to the extracted Entity; Follow
redirects and suggest the redirected Entity instead of the extracted one.
+- **Min Token Length:** The minimum length of Tokens used to look up
Entities within the Controlled Vocabulary. This parameter is ignored in case a
POS (Part of Speech) tagger is available for the language of the parsed content.
+- **Suggestions:** The maximal number of suggestions returned for a single
mention. (org.apache.stanbol.enhancer.engines.keywordextraction.maxSuggestions)
+Languages
+- **Languages to process:** An empty text indicates that all languages are
processed. Use ',' as separator for languages (e.g. 'en,de' to enhance only
English and German texts).
+- **Default Matching Language:** The language used in addition to the language
detected for the analysed text to search for Entities. Typically this
configuration is an empty string to search for labels without any language
defined, but for some data sets (such as DBpedia.org) that add languages to any
labels it might improve results to change this configuration (e.g. to 'en' in
the case of DBpedia.org).
+
+Read the technical description of this [Enhancement
Engine](enhancer/engines/keywordlinkingengine.html) to learn about more
configuration options.
-##Examples
+##Results
-(TODO)
+Depending on your linking target dataset, the engine provides you with
enhancement suggestions using labels in your chosen language(s). Note: In the
current version of the DBpedia index, the link directs to the English version of
the resource.
\ No newline at end of file
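
Note on the "Languages to process" option documented above: its semantics (an empty text means all languages, ',' separates language codes) can be illustrated with a minimal Python sketch. This is not Stanbol code. The function names are hypothetical, and trimming whitespace and ignoring case are assumptions that the documentation does not state.

```python
def parse_languages(config_value):
    """Split a 'Languages to process' value on ','.

    Assumption (not from the docs): surrounding whitespace is trimmed
    and language codes are compared case-insensitively.
    """
    return {part.strip().lower() for part in config_value.split(",") if part.strip()}


def is_processed(config_value, detected_language):
    """Return True if text in detected_language would be enhanced."""
    allowed = parse_languages(config_value)
    # An empty configuration indicates that all languages are processed.
    return not allowed or detected_language.lower() in allowed


if __name__ == "__main__":
    for value, lang in [("", "fr"), ("en,de", "de"), ("en,de", "fr")]:
        print(repr(value), lang, is_processed(value, lang))
```

With the value 'en,de' from the documentation, a text detected as German passes the filter and a French text does not. With an empty value, every detected language passes.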