Author: rwesten
Date: Mon Nov 28 05:41:23 2011
New Revision: 1206981

URL: http://svn.apache.org/viewvc?rev=1206981&view=rev
Log:
first version of a proposal for the Stanbol Enhancement Structure based on the 
[Annotation-Ontology](http://code.google.com/p/annotation-ontology/)

Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext?rev=1206981&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.mdtext
 Mon Nov 28 05:41:23 2011
@@ -0,0 +1,157 @@
+Title: The Stanbol Enhancement Structure (PROPOSAL)
+
+Please NOTE: This is a proposal for the future version of the Enhancement 
Structure used by the Stanbol Enhancer. 
+
+**NOTES:** 
+
+* This **DOES NOT** describe the Enhancement Structure used by the current 
version of the Stanbol Enhancer! 
+* There is also an [older Proposal](stanbolenhancementstructure.html) that 
might still contain some information that is not yet contained in this version.
+
+## Background
+
+This proposal aims to define the "Stanbol Enhancement Structure" intended 
to be used by future versions of the Stanbol Enhancer to encode knowledge 
extracted from analyzed documents.
+
+Currently the Stanbol Enhancer still uses the [FISE Enhancement 
Structure](http://wiki.iks-project.eu/index.php/EnhancementStructure) that 
dates back to before Stanbol entered incubation at Apache. This proposal 
suggests basing the "Stanbol Enhancement Structure" on the existing 
[Annotation-Ontology](http://code.google.com/p/annotation-ontology/wiki/Homepage).
+
+The following two sections provide a short overview of the currently used 
FISE Enhancement Structure as well as the Annotation-Ontology, as this 
information is critical for understanding the suggestions made in the later 
parts of this document.
+
+### FISE Enhancement Structure
+
+The FISE Enhancement Structure defines three main Concepts:
+
+1. **FISE Enhancement**: Defines Metadata about the creation process, type of 
the Enhancement as well as relations to other Enhancements.
+2. **FISE Text Annotation**: Defines a selection within enhanced plain text. 
Annotations about other content types are not defined.
+3. **FISE Entity Annotation**: Defines an annotation about an Entity.
+
+Each Annotation created by an Enhancement Engine MUST have the FISE 
Enhancement type as well as one of FISE Text Annotation or FISE Entity 
Annotation.
+
+The typical use is as follows:
+
+* A Text Annotation is used to define the annotated part of the document. Text 
Annotations use the dc:type property to define the type of the extracted 
entity (e.g. as provided by Named Entity Recognition). 
+* An Entity Annotation is used to suggest Entities for a Text Annotation. 
+* Properties of the Enhancement are used to link the Text Annotation with the 
suggested Entity Annotations.
+* Enhancement Engines may also add knowledge about suggested entities 
(dereferencing of entities).
+
+Annotations like Keywords, Categories ... were discussed but never formally 
defined for the FISE Enhancement Structure.
+
+### Annotation-Ontology
+
+This Proposal describes how Stanbol can use the 
[Annotation-Ontology](http://code.google.com/p/annotation-ontology/wiki/Homepage)
 for encoding Enhancements. 
+
+From the Annotation-Ontology homepage:
+
+> Annotation Ontology (AO) is a vocabulary designed to extensively reuse 
existing domain ontologies (entities annotations or semantic tags) and to 
provide several other kind of annotations - comments, textual annotation 
(classic tags), notes, examples, erratum... - on potentially any kind of 
document (text, images, audio...) and document fragments.
+
+The following Figure gives an overview of the Annotation-Ontology by 
showing a simple, tagging-like annotation of a whole document.
+
+> ![Example of annotation on a whole document with 
AO](http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png
 "Example of annotation on a whole document with AO")
+
+> Image Credit: Annotation-Ontology 
[Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+## Stanbol Enhancement Structure
+
+The following sections describe how the Stanbol Enhancement Structure can 
utilize the Annotation-Ontology to encode knowledge extracted from analyzed 
Content Items.
+
+### ContentItems
+
+Within the FISE Enhancement Structure the enhanced ContentItems were only 
referenced by the **fise:extracted-from** property. There was no specification 
on how to further define properties of the ContentItem. The Annotation-Ontology 
defines a much richer vocabulary for that.
+
+First and most important, the Annotation-Ontology distinguishes between the:
+
+* **Annotated Document**: This is the Document that is annotated
+* **Source Document**: This is the Document version that was used for the 
annotation process.
+
+> ![Source 
Documents](http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png
 "Document Annotations")
+
+> Image Credit: Annotation Ontology 
[Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+As an example: If a Web-Crawler crawls a site on the Web and stores a local 
copy for indexing, then the **Annotated Document** would use the URL of the 
document on the Web. The **Source Document** would be the ID of the locally 
cached version used for the enhancement process.
+
+#### Content Adapter and Source Documents:
+
+The Content Adapter pattern was suggested to be used to convert parsed 
documents to different Content Formats such as extracting the Plain Text of 
parsed HTML or PDF documents.
+
+The possibility to distinguish between the *Annotated Document* and the 
*Source Document* nicely supports this, because while Enhancement Engines can 
state that an Annotation is about the *Annotated Document* they can still state 
the exact *Source Document* that was used for processing. This allows e.g. to 
clearly state that the indexes of a text selection are based on the plain text 
version of the *Annotated Document*. 
+
+### Content Selectors
+
+The FISE Enhancement Structure defined a single "Content Selector": the *FISE 
Text Annotation*. The Annotation-Ontology uses a much richer structure that 
even allows extensions that define specific selections for different content 
types.
+
+With the Annotation-Ontology each Selector can link to both the *Annotated 
Document* and the *Source Document*. The following shows an example of an 
Image Selection:
+
+> ![Image 
Selector](http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png
 "Image Selector Example")
+
+> Image Credits: Annotation-Ontology 
[Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png).
+
+#### Text Selectors
+
+The "PrefixPostfixSelector" as defined by the Annotation-Ontology differs 
from the currently used FISE Text Annotation. It does not define the character 
indexes and uses prefix and postfix instead of the surrounding context.
+
+Regarding backward compatibility, the suggestion is to adopt the 
"PrefixPostfixSelector" but keep the start and end positions of the current 
Text Annotation. The prefix/postfix model of the "PrefixPostfixSelector" is 
definitely better than the context used by the FISE Text Annotation, because it 
allows the selected text to be clearly identified even if it occurs several 
times in a given context.
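+
+As a rough illustration of this suggestion, the following Python sketch 
combines prefix/postfix with start and end offsets. The class `TextSelector`, 
its field names and the helper functions are hypothetical and are not part of 
the Annotation-Ontology or of Stanbol; they only show why prefix and postfix 
identify one specific occurrence of a repeated word.
+

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextSelector:
    """Hypothetical selector: prefix/exact/postfix plus character offsets."""
    prefix: str
    exact: str
    postfix: str
    start: int  # offset of the first selected character (inclusive)
    end: int    # offset just after the last selected character (exclusive)


def make_selector(text: str, start: int, end: int, window: int = 10) -> TextSelector:
    """Build a selector for text[start:end] with `window` characters of context."""
    return TextSelector(
        prefix=text[max(0, start - window):start],
        exact=text[start:end],
        postfix=text[end:end + window],
        start=start,
        end=end,
    )


def resolve(text: str, sel: TextSelector) -> List[int]:
    """Return every start offset where prefix + exact + postfix occurs in text."""
    needle = sel.prefix + sel.exact + sel.postfix
    hits = []
    pos = text.find(needle)
    while pos != -1:
        hits.append(pos + len(sel.prefix))
        pos = text.find(needle, pos + 1)
    return hits


text = "Paris is the capital of France. Paris Hilton was born in New York."
second = text.index("Paris", 1)  # offset of the second occurrence of "Paris"
sel = make_selector(text, second, second + len("Paris"))
matches = resolve(text, sel)
```

+
+Here `matches` contains only the offset of the second "Paris", although the 
word occurs twice. Keeping `start` and `end` alongside prefix and postfix lets 
consumers that rely on character offsets keep working.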
+
+#### Multi Media Selectors and the Media Fragments Standard
+
+The [Media Fragments Working 
Group](http://www.w3.org/2008/WebVideo/Fragments/) of the W3C is currently 
working on a Recommendation on how to encode Fragments of Resources within so 
called [Media Fragments 
URIs](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/).
+
+This specification defines how to encode the 
[Temporal](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-time),
 
[Spatial](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-space),
 
[Track](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-track)
 and 
[ID](http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-id)
 dimensions within Document URIs but also defines processing rules (e.g. for 
Browsers) and the semantics.
+
+The proposal here is to use this specification for encoding selections within 
multimedia files within the Annotation-Ontology. This will most likely require 
the definition of a MediaFragmentSelector as an extension.
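+
+To give an impression of what such a selector would need to carry, the 
following Python sketch builds Media Fragment URIs for the temporal, spatial 
and track dimensions using the `t=`, `xywh=` and `track=` keys of the 
specification. The function name `media_fragment_uri` and the example URLs are 
assumptions made for illustration only; a MediaFragmentSelector does not exist 
yet.
+

```python
from typing import Optional, Tuple
from urllib.parse import quote


def media_fragment_uri(
    base: str,
    time: Optional[Tuple[float, float]] = None,
    xywh: Optional[Tuple[int, int, int, int]] = None,
    track: Optional[str] = None,
) -> str:
    """Append Media Fragment dimensions to `base` (hypothetical helper)."""
    parts = []
    if time is not None:
        # Temporal dimension: start and end in seconds (normal play time).
        parts.append("t={:g},{:g}".format(*time))
    if xywh is not None:
        # Spatial dimension: x, y, width, height in pixels.
        parts.append("xywh={},{},{},{}".format(*xywh))
    if track is not None:
        # Track dimension: the track name is percent-encoded.
        parts.append("track=" + quote(track, safe=""))
    if not parts:
        return base
    return base + "#" + "&".join(parts)


clip = media_fragment_uri("http://example.org/video.ogv", time=(10, 20))
region = media_fragment_uri(
    "http://example.org/video.ogv", time=(12.5, 20), xywh=(160, 120, 320, 240)
)
```

+
+A selector built this way would identify the selected fragment by a single 
URI, which could then be linked from an Annotation like any other Selector.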
+
+### Annotations
+
+The FISE Enhancement Structure uses both properties of FISE Enhancements and 
FISE TextAnnotation/EntityAnnotation to describe Annotations as defined by the 
Annotation-Ontology. On the other hand, some properties of the FISE 
TextAnnotation are part of the Selectors within the Annotation-Ontology. 
Because of that, the switch to the Annotation-Ontology will not only mean a 
change in the vocabulary used, but also bring some structural changes. 
+
+Annotations as defined by the Annotation-Ontology are structured as follows:
+
+* An Annotation is represented by a Resource (called Annotation-Resource in 
the remaining document) with the rdf:type ao:Annotation. Special types of 
Annotations can be introduced by subclasses of ao:Annotation.
+* The Annotation-Resource may be linked to a Selector with the **ao:context** 
property. If no such link is present the Annotation-Resource is about the whole 
Document. It is also possible to link multiple Selectors with an annotation.
+* Each Annotation-Resource MUST be linked to the *Annotated Document* by using 
the **ao:annotatesResource** property. The *Source Document* can be referenced 
by using the **ao:onSourceDocument** property. It is also possible to link 
multiple Documents with an annotation.
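+
+The structure listed above can be sketched as a small set of triples. The 
following Python example uses plain tuples with prefixed names instead of an 
RDF library; the function `annotation_triples` and all `urn:example:` and 
`example.org` identifiers are hypothetical. Attaching **ao:onSourceDocument** 
directly to the Annotation-Resource is this sketch's reading of the list 
above, not a statement about the ontology itself.
+

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def annotation_triples(
    annotation: str,
    annotated_doc: str,
    source_doc: Optional[str] = None,
    selectors: Iterable[str] = (),
) -> List[Triple]:
    """Return the triples describing one Annotation-Resource (illustrative)."""
    triples = [
        (annotation, "rdf:type", "ao:Annotation"),
        # Mandatory link to the Annotated Document.
        (annotation, "ao:annotatesResource", annotated_doc),
    ]
    if source_doc is not None:
        # Optional link to the document version that was actually processed.
        triples.append((annotation, "ao:onSourceDocument", source_doc))
    for selector in selectors:
        # Zero selectors means the annotation is about the whole document.
        triples.append((annotation, "ao:context", selector))
    return triples


triples = annotation_triples(
    "urn:example:annotation-1",
    "http://example.org/page.html",
    source_doc="urn:example:cache/page.txt",
    selectors=["urn:example:selector-1"],
)
whole_doc = annotation_triples(
    "urn:example:annotation-2", "http://example.org/page.html"
)
```

+
+The second call produces no **ao:context** triple, so `whole_doc` describes an 
annotation about the whole document.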
+
+The following sub-sections provide an overview of how Text Annotations, 
Entity Annotations and Category Annotations as used by Stanbol can be expressed 
using the Annotation-Ontology.
+
+#### Text Annotations
+
+Text Annotations are Annotations as typically created by NER (Named Entity 
Recognition) engines. Such Annotations select a part of a Text and assign a 
type (Person, Organization, Place ...) to it.
+
+The text selection can be expressed by using a "PrefixPostfixSelector". The 
type and the confidence of the detected named entity need to be properties of 
the Annotation class.
+
+#### Entity Annotations
+
+Entity Annotations are similar to "Qualifier" annotations as defined by the 
Annotation-Ontology. The *ao:hasTopic* relation is used to link the annotation 
with the related topic.
+
+#### Category Annotations
+
+Category Annotations are typically about the whole Document or a specific 
section of it. Normal Selectors can be used for defining the categorized 
section. If no Selector is present the categorization applies to the whole 
document. The "Qualifier" annotation could also be used as a base class for 
categorizations.
+
+### Annotation Sets
+
+Within the Annotation-Ontology, Annotation Sets can be used to group several 
Annotations together. Although the FISE Enhancement Structure does not 
explicitly define a similar feature, the Stanbol Enhancer uses relations 
between FISE Enhancements for a similar purpose. Therefore the suggestion is 
to use this feature of the Annotation-Ontology to express sets of possible 
Categories or suggestions of Entities.
+
+The following figure shows an example of an Annotation Set with a single 
Annotation:
+
+> ![Annotation 
sets](http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png
 "A simple Annotation Set with a single Annotation")
+
+> Image Credits: Annotation-Ontology 
[Link](http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png)
+
+This suggests the use of Annotation Sets to formally describe situations where 
the Stanbol Enhancer needs to group several Annotations in order to let users 
select from a predefined set of options. Assigning a unique ID (the URI of the 
AnnotationSet instance) to such a collection of Annotations also makes it 
possible for the consumer to provide explicit feedback to the Stanbol Enhancer 
(e.g. by accepting/rejecting Annotations that are part of the AnnotationSet, 
adding an additional Annotation to a set, ...).
+
+Note that a single Annotation might be part of several Annotation Sets. As an 
example, take a Text Annotation for which two sets of Entity suggestions are 
generated.
+
+The suggestion is to create subclasses for common types of Annotation Sets 
used by the Stanbol Enhancer.
+
+#### Entity Suggestions
+
+With the FISE Enhancement Structure this is expressed by a 
*fise:TextAnnotation* that is linked to several *fise:EntityAnnotation*s by the 
*dc:relation* property.
+
+Expressing the same based on the Annotation-Ontology would be possible by:
+
+* An Annotation Set that links to the following Annotations (by the *ao:item* 
property):
+    * A TextAnnotation including the PrefixPostfixSelector defining the 
actual position of the selected text within the document
+    * One EntityAnnotation (extends ao:Qualifier) per suggested Entity.
+* In addition the Annotation Set also includes metadata such as the Engine 
that created the suggestions
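+
+A minimal sketch of such a set, again using plain Python tuples as triples. 
The class name `ao:AnnotationSet`, the use of `dc:creator` for the engine 
metadata, the helper functions and all `urn:example:` identifiers are 
assumptions made for this example; only the *ao:item* property is taken from 
the description above.
+

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def entity_suggestion_set(
    set_uri: str,
    text_annotation: str,
    entity_annotations: Iterable[str],
    engine: str,
) -> List[Triple]:
    """Group one text annotation and its entity suggestions (illustrative)."""
    triples = [
        (set_uri, "rdf:type", "ao:AnnotationSet"),  # class name is assumed
        (set_uri, "dc:creator", engine),            # metadata property is assumed
        (set_uri, "ao:item", text_annotation),
    ]
    # One ao:item link per suggested entity.
    triples.extend((set_uri, "ao:item", ea) for ea in entity_annotations)
    return triples


def items_of(triples: Iterable[Triple], set_uri: str) -> List[str]:
    """List the annotations that belong to the given Annotation Set."""
    return [o for s, p, o in triples if s == set_uri and p == "ao:item"]


graph = entity_suggestion_set(
    "urn:example:set-1",
    "urn:example:text-annotation-1",
    ["urn:example:entity-annotation-1", "urn:example:entity-annotation-2"],
    "urn:example:engine/entity-linking",
)
# The same text annotation may also be an item of a second set.
graph += entity_suggestion_set(
    "urn:example:set-2",
    "urn:example:text-annotation-1",
    ["urn:example:entity-annotation-3"],
    "urn:example:engine/other-linking",
)
```

+
+Because each set has its own URI, a consumer could refer to 
`urn:example:set-1` when accepting or rejecting one of its items.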
+
+#### Category Suggestions
+
+Typically categorizations can provide more than a single Category. Grouping 
such suggestions within an AnnotationSet gives users the possibility to 
accept/reject one or more of them. In addition it would also make it possible 
to distinguish sets of categorizations calculated based on disjoint sets of 
categories (e.g. a categorization based on a UserProfile from a categorization 
based on general topics or a spatial categorization).
+
+

