Author: rwesten
Date: Thu Jun 14 05:46:11 2012
New Revision: 1350093

URL: http://svn.apache.org/viewvc?rev=1350093&view=rev
Log:
first version of EntityTagging

Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext?rev=1350093&r1=1350092&r2=1350093&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext Thu Jun 14 05:46:11 2012
@@ -137,7 +137,55 @@ TopicAnnotation are used to categorize/c
 
 ## Entity Tagging
 
-TODO: Work in progress
+Entity Tagging is about suggesting Entities instead of plain Strings to users who tag their Documents. The difference is easy to explain. Let's assume a Blogger uses the tag "Bob Marley" for a blog entry. Tagging is all about structuring content, so by tagging the entry with "Bob Marley" he can now easily find all Documents that use that tag. However, he would most likely also want a category of Documents about Reggae music, and he would most likely want Documents tagged with "Bob Marley" to be part of that group.
+
+But while the knowledge that "Bob Marley" is related to "Reggae music" might be obvious to the Blogger, it cannot be known by the Blogging Tool he uses. So typically the only way to achieve this is for the Blogger to tag the document with both tags.
+
+Entity Tagging tries to work around that by linking Documents with Entities defined by a knowledge base. The fact that Bob Marley is related to Reggae music is nothing novel. [DBpedia](http://dbpedia.org), the Wikipedia database, knows that and a lot more about the Entity [dbpedia:Bob_Marley](http://dbpedia.org/resource/Bob_Marley). So if the Blogger tags his Document with "dbpedia:Bob_Marley", he does not only tag it with "Bob Marley" but also with all the other contextual information provided by DBpedia, including the fact that Bob Marley was a Reggae artist.
+
+This does not only work for famous people or big cities: nowadays the Web [links data](http://linked-data.org) from many different domains. Nor is it only about the Web; it works even better if you can also use Entities relevant to yourself and/or your working environment (Products, CRM information, …).
+
+### Suggest Entities with the Stanbol Enhancer
+
+To ask the Stanbol Enhancer to analyze a text, send a POST request to the [RESTful API](enhancerrest.html) of the Stanbol Enhancer.
+
+    curl -X POST -H "Accept: application/rdf+xml" -H "Content-type: text/plain" \
+     --data "The Stanbol enhancer can detect famous cities such as \
+             Paris and people such as Bob Marley." http://{host}:{port}/enhancer
+
+As the response you will receive the enhancement results formatted as an RDF graph in the serialization specified by the "Accept" header ('application/rdf+xml' in the above example request). This RDF graph contains the information about the Entities extracted from the analyzed content.
+
+The following figure shows how extracted entities are described in the enhancement results.
+!['fise:EntityAnnotation' example](es_entityannotation.png "This example shows an EntityAnnotation that suggests the Entity 'dbpedia:Bob_Marley' for the TextAnnotation")
+
+In principle there are two kinds of Resources that are of interest for the Entity Tagging use case:
+
+1. EntityAnnotations: Resources with the 'rdf:type' 'fise:EntityAnnotation' represent the entity suggestions of the Stanbol Enhancer. These resources provide the label, the type and, most importantly, the URI of the extracted Entity. In addition, the value of 'fise:confidence' [0..1] indicates how certain the Stanbol Enhancer is about this Entity.
+2. Entities: This refers to all resources with an incoming 'fise:entity-reference' relation (such as 'dbpedia:Bob_Marley' in the above example). Enhancement Engines can be configured to "dereference" suggested entities, meaning that the URI of the entity is used to retrieve additional information. In this case additional information about suggested Entities will be available in the enhancement results. If this is not the case, users will need to dereference suggested entities themselves.
+
+The following steps are typically needed to acquire the information required to implement an entity tagging user interface:
+
+1. Iterate over all suggested Entities: These are all resources matching "{entity-annotation} rdf:type fise:EntityAnnotation".
+2. Basic information: These values are available directly on the {entity-annotation}, which ensures their availability even if the {entity} itself is not included (dereferenced) in the enhancement results.
+    * URI of the suggested Entity: {entity-annotation} fise:entity-reference {entity}
+    * Label: The value of fise:entity-label is typically the label by which the Entity was recognized in the analyzed content. Additional labels are typically available via the {entity}.
+    * Types: The values of the fise:entity-type property of the {entity-annotation} are the same as the rdf:type values of the {entity}.
+    * Confidence: The 'fise:confidence' value represents how confident the Stanbol Enhancer is about this suggestion. Values are in the range [0..1], where 0 means very uncertain and 1 represents high certainty.
+3. Dereferenced {entity}: Some EnhancementEngines can also add information about suggested Entities to the enhancement results, in other words dereference them. In this case additional information about the {entity} can be retrieved directly from the enhancement results. Most importantly, this information includes all available labels (in all languages) of the Entity.
+4. Dereferencing suggested Entities: If the suggested Entity is available via the Stanbol Entityhub, the {entity-annotation} has the 'entityhub:site' property. The value of this property is the name of the ReferencedSite of the Entityhub. To dereference the Entity, send a GET request to "{stanbol-root-URL}/entityhub/site/{site-name}/entity?id={entity}". The "Accept" header of the request needs to be set to the desired RDF serialization (e.g. "application/rdf+json").
+
+### Content Categorizations
+
+'fise:TopicAnnotation' instances are used to formally represent categories assigned to the analyzed Content. The main difference between extracted Entities and assigned Categories is that extracted Entities have one or more explicit mentions within the text, while assigned Categories are suggested based on the document as a whole; typically they are not explicitly mentioned in the text.
+
+Typically an entity tagging UI will want to distinguish between Categories and Entities because:
+
+* Categories are used to group Content (e.g. Blog posts about work and about private things)
+* Entities are used to search/suggest Blog posts about specific topics (e.g. a blog post about some feature implemented with "Apache Solr", or about a nice event in the "Sternbräu" in "Salzburg")
+
+The usage of 'fise:TopicAnnotation' is similar to that of 'fise:EntityAnnotation'. Both use the exact same properties ('fise:entity-reference', 'fise:entity-label', 'fise:entity-type', 'fise:confidence', 'entityhub:site'). The only difference is that one needs to iterate over '{topic-annotation} rdf:type fise:TopicAnnotation'. So typically clients will want to use the exact same code to process {entity-annotation} and {topic-annotation} instances.
+
+The next section, "Entity Disambiguation", describes an improved version of Entity Tagging that allows users to (1) accept/decline a spotted Entity and then (2) select one of several suggested Entities.
 
 ## Entity Disambiguation
 


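The client-side steps listed under "Suggest Entities with the Stanbol Enhancer" in the diff above (iterate over the annotations, then read URI, label, types, confidence and site) can be sketched in Python. This is only an illustration and not part of the commit. It assumes the results were requested with "Accept: application/rdf+json" and arrive in the RDF/JSON layout (subject, then predicate, then a list of value objects). The namespace URIs, the function name `suggestions` and the `SAMPLE` graph are assumptions made for the example; `SAMPLE` is hand-written and is not real Enhancer output.

```python
from typing import List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# ASSUMPTION: namespace URIs for the 'fise' and 'entityhub' prefixes.
# Verify them against the graph your own Stanbol instance returns.
FISE = "http://fise.iks-project.eu/ontology/"
ENTITYHUB = "http://stanbol.apache.org/ontology/entityhub/entityhub#"


def _values(resource: dict, predicate: str) -> list:
    """Return the plain values of one predicate of an RDF/JSON resource."""
    return [obj["value"] for obj in resource.get(predicate, [])]


def _first(resource: dict, predicate: str) -> Optional[str]:
    """Return the first value of a predicate, or None if it is absent."""
    values = _values(resource, predicate)
    return values[0] if values else None


def suggestions(graph: dict,
                annotation_type: str = FISE + "EntityAnnotation") -> List[dict]:
    """Collect the basic information of every annotation of the given type."""
    found = []
    for subject, resource in graph.items():
        if annotation_type not in _values(resource, RDF_TYPE):
            continue  # e.g. a TextAnnotation or a dereferenced entity
        confidence = _first(resource, FISE + "confidence")
        found.append({
            "annotation": subject,
            "entity": _first(resource, FISE + "entity-reference"),
            "label": _first(resource, FISE + "entity-label"),
            "types": _values(resource, FISE + "entity-type"),
            "confidence": float(confidence) if confidence is not None else None,
            "site": _first(resource, ENTITYHUB + "site"),
        })
    # Most confident suggestion first; a missing confidence sorts last.
    found.sort(key=lambda s: -1.0 if s["confidence"] is None else s["confidence"],
               reverse=True)
    return found


# Hand-written sample graph (hypothetical subjects and values).
SAMPLE = {
    "urn:example:entity-annotation-1": {
        RDF_TYPE: [{"type": "uri", "value": FISE + "EntityAnnotation"}],
        FISE + "entity-reference": [
            {"type": "uri", "value": "http://dbpedia.org/resource/Bob_Marley"}],
        FISE + "entity-label": [
            {"type": "literal", "value": "Bob Marley", "lang": "en"}],
        FISE + "entity-type": [
            {"type": "uri", "value": "http://dbpedia.org/ontology/Person"}],
        FISE + "confidence": [{"type": "literal", "value": "0.9"}],
        ENTITYHUB + "site": [{"type": "literal", "value": "dbpedia"}],
    },
    "urn:example:text-annotation-1": {
        RDF_TYPE: [{"type": "uri", "value": FISE + "TextAnnotation"}],
    },
}

for s in suggestions(SAMPLE):
    print(s["label"], s["entity"], s["confidence"], s["site"])
```

Because 'fise:TopicAnnotation' uses the same properties, the same function should also cover categories when called as `suggestions(graph, FISE + "TopicAnnotation")`.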
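Step 4 of the list in the diff above (dereferencing a suggested Entity through the Entityhub) amounts to filling in the URL template "{stanbol-root-URL}/entityhub/site/{site-name}/entity?id={entity}" and setting the "Accept" header. A minimal Python sketch follows; the helper names `entity_url` and `dereference_request` and the root URL `http://localhost:8080` are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def entity_url(stanbol_root: str, site: str, entity_uri: str) -> str:
    """Fill in the Entityhub URL template from the documentation."""
    base = stanbol_root.rstrip("/")
    # The entity URI is itself a URL, so it must be percent-encoded
    # before it can be used as the value of the 'id' query parameter.
    query = urlencode({"id": entity_uri})
    return f"{base}/entityhub/site/{quote(site, safe='')}/entity?{query}"


def dereference_request(stanbol_root: str, site: str, entity_uri: str,
                        accept: str = "application/rdf+json") -> Request:
    """Build (but do not send) the GET request for one suggested Entity."""
    return Request(entity_url(stanbol_root, site, entity_uri),
                   headers={"Accept": accept}, method="GET")


# Hypothetical root URL; 'dbpedia' stands for the 'entityhub:site' value.
req = dereference_request("http://localhost:8080", "dbpedia",
                          "http://dbpedia.org/resource/Bob_Marley")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send the request. It is kept separate here so that the URL construction can be checked without a running Stanbol instance.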