Author: buildbot
Date: Mon Nov 28 05:41:33 2011
New Revision: 799378
Log:
Staging update by buildbot
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.html
(added)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/ses_annotationontology.html
Mon Nov 28 05:41:33 2011
@@ -0,0 +1,168 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - The Stanbol Enhancement Structure (PROPOSAL)</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link rel="icon" type="image/png"
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+ <div id="navigation">
+ <img alt="Apache Stanbol" width="220" height="101"
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/>
+ <h1 id="stanbol_links">Stanbol links</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+</ul>
+<h1 id="asf_links">ASF links</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+ </div>
+
+ <div id="content">
+ <h1 class="title">The Stanbol Enhancement Structure (PROPOSAL)</h1>
+ <p>Please NOTE: This is a proposal for the future version of the
Enhancement Structure used by the Stanbol Enhancer. </p>
+<p><strong>NOTES:</strong> </p>
+<ul>
+<li>This <strong>DOES NOT</strong> describe the Enhancement Structure used by
the current version of the Stanbol Enhancer! </li>
+<li>There is also an <a href="stanbolenhancementstructure.html">older
Proposal</a> that may still contain information not yet included in this
version.</li>
+</ul>
+<h2 id="background">Background</h2>
+<p>This proposal aims to define the "Stanbol Enhancement Structure", which is
intended to be used by future versions of the Stanbol Enhancer to encode
knowledge extracted from analyzed documents.</p>
+<p>Currently the Stanbol Enhancer still uses the <a
href="http://wiki.iks-project.eu/index.php/EnhancementStructure">FISE
Enhancement Structure</a>, which dates back to before Stanbol entered
incubation at Apache. This proposal suggests basing the "Stanbol Enhancement
Structure" on the existing <a
href="http://code.google.com/p/annotation-ontology/wiki/Homepage">Annotation-Ontology</a>.</p>
+<p>The following two sections give a short overview of the currently used
FISE Enhancement Structure and of the Annotation-Ontology, because this
information is critical for understanding the suggestions made in the later
parts of this document.</p>
+<h3 id="fise_enhancement_structure">FISE Enhancement Structure</h3>
+<p>The FISE Enhancement Structure defines three main Concepts:</p>
+<ol>
+<li><strong>FISE Enhancement</strong>: Defines Metadata about the creation
process, type of the Enhancement as well as relations to other
Enhancements.</li>
+<li><strong>FISE Text Annotation</strong>: Defines a selection within
enhanced plain text. Annotations about other content types are not defined.</li>
+<li><strong>FISE Entity Annotation</strong>: Defines an annotation about an
Entity.</li>
+</ol>
+<p>Each Annotation created by an Enhancement Engine MUST have the FISE
Enhancement type as well as one of FISE Text Annotation or FISE Entity
Annotation.</p>
+<p>The typical use is as follows:</p>
+<ul>
+<li>A Text Annotation is used to define the annotated part of the document.
Text Annotations use the dc:type property to define the type of the
extracted entity (e.g. as provided by Named Entity Recognition). </li>
+<li>An Entity Annotation is used to suggest Entities for a Text Annotation.
</li>
+<li>Properties of the Enhancement are used to link the Text Annotation with
the suggested Entity Annotations.</li>
+<li>Enhancement Engines may also add knowledge about suggested entities
(dereferencing of entities).</li>
+</ul>
+<p>Annotations such as Keywords or Categories were discussed but never
formally defined for the FISE Enhancement Structure.</p>
+<h3 id="annotation-ontology">Annotation-Ontology</h3>
+<p>This Proposal describes how Stanbol can use the <a
href="http://code.google.com/p/annotation-ontology/wiki/Homepage">Annotation-Ontology</a>
for encoding Enhancements. </p>
+<p>From the Annotation-Ontology homepage:</p>
+<blockquote>
+<p>Annotation Ontology (AO) is a vocabulary designed to extensively reuse
existing domain ontologies (entities annotations or semantic tags) and to
provide several other kind of annotations - comments, textual annotation
(classic tags), notes, examples, erratum... - on potentially any kind of
document (text, images, audio...) and document fragments.</p>
+</blockquote>
+<p>The following figure gives an overview of the Annotation-Ontology by
showing a simple, tagging-like annotation of a whole document.</p>
+<blockquote>
+<p><img alt="Example of annotation on a whole document with AO"
src="http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png"
title="Example of annotation on a whole document with AO" /></p>
+<p>Image Credit: Annotation-Ontology <a
href="http://annotation-ontology.googlecode.com/svn/trunk/images/Document%20Annotation%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<h2 id="stanbol_enhancement_strucutre">Stanbol Enhancement Structure</h2>
+<p>The following sections describe how the Stanbol Enhancement Structure can
utilize the Annotation-Ontology to encode knowledge extracted from analyzed
Content Items.</p>
+<h3 id="contentitems">ContentItems</h3>
+<p>Within the FISE Enhancement Structure the enhanced ContentItems were only
referenced by the <strong>fise:extracted-from</strong> property. There was no
specification of how to further define properties of the ContentItem. The
Annotation-Ontology defines a much richer vocabulary for that.</p>
+<p>First and most important, the Annotation-Ontology distinguishes between
the:</p>
+<ul>
+<li><strong>Annotated Document</strong>: This is the Document that is
annotated</li>
+<li><strong>Source Document</strong>: This is the Document version that was
used for the annotation process.</li>
+</ul>
+<blockquote>
+<p><img alt="Source Documents"
src="http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png"
title="Document Annotations" /></p>
+<p>Image Credit: Annotation Ontology <a
href="http://annotation-ontology.googlecode.com/svn/trunk/images/Source%20Document%202%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<p>As an example: If a Web-Crawler crawls a site on the Web and stores a local
copy for indexing, then the <strong>Annotated Document</strong> would use the
URL of the document on the Web. The <strong>Source Document</strong> would be
the ID of the locally cached version used for the enhancement process.</p>
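+<p>The crawler example above can be sketched in a few lines of Python. This
is only an illustration: the class name <em>DocumentReference</em>, the helper
function and both identifiers are hypothetical and are not part of the
Annotation-Ontology or of Stanbol.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentReference:
    """Pairs the document an annotation is about with the version that was processed."""
    annotated_document: str  # public identifier, e.g. the URL on the Web
    source_document: str     # identifier of the copy that was actually analyzed


def reference_for_crawled_page(url: str, cache_id: str) -> DocumentReference:
    # The annotation stays attached to the public URL, while the cached copy
    # is recorded as the version the enhancement process actually read.
    return DocumentReference(annotated_document=url, source_document=cache_id)


ref = reference_for_crawled_page(
    "http://example.org/news/article.html",  # hypothetical crawled URL
    "urn:example:cache:4711",                # hypothetical local cache ID
)
print(ref.annotated_document)
print(ref.source_document)
```

+<p>Keeping both values side by side lets a consumer look up annotations by the
public URL while still knowing which cached copy produced them.</p>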
+<h4 id="content_adapter_and_source_documents">Content Adapter and Source
Documents:</h4>
+<p>The Content Adapter pattern was suggested to be used to convert parsed
documents to different Content Formats such as extracting the Plain Text of
parsed HTML or PDF documents.</p>
+<p>The possibility to distinguish between the <em>Annotated Document</em> and
the <em>Source Document</em> nicely supports this: Enhancement Engines can
state that an Annotation is about the <em>Annotated Document</em> and still
state the exact <em>Source Document</em> that was used for processing. This
makes it possible, for example, to state clearly that the indexes of a text
selection are based on the plain text version of the <em>Annotated
Document</em>. </p>
+<h3 id="content_selectors">Content Selectors</h3>
+<p>The FISE Enhancement Structure defined a single "Content Selector": the
<em>FISE Text Annotation</em>. The Annotation-Ontology uses a much richer
structure that can also be extended to define specific selections for
different content types.</p>
+<p>With the Annotation-Ontology each Selector can link to both the
<em>Annotated Document</em> and the <em>Source Document</em>. The following
figure shows an example of an image selection.</p>
+<blockquote>
+<p><img alt="Image Selector"
src="http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png"
title="Image Selector Example" /></p>
+<p>Image Credits: Annotation-Ontology <a
href="http://annotation-ontology.googlecode.com/svn/trunk/images/Image%20InitEndCorner%20Selector%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a>.</p>
+</blockquote>
+<h4 id="text_selectors">Text Selectors</h4>
+<p>The "PrefixPostfixSelector" as defined by the Text-Annotation Ontology
differs from the currently used FISE Text Annotation. It does not define the
character indexes and uses prefix and postfix instead of the surrounding
context.</p>
+<p>For backward compatibility, the suggestion is to adopt the
"PrefixPostfixSelector" but keep the start and end positions of the current
Text Annotation. The prefix/postfix model of the "PrefixPostfixSelector" is
definitely better than the context used by the FISE Text Annotation, because it
makes it possible to clearly identify the selected text even if it occurs
several times in a given context.</p>
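+<p>The following Python sketch illustrates why a prefix/postfix pair can
identify one occurrence of a repeated phrase, and how stored start and end
positions could be kept alongside it. The class <em>TextSelection</em>, its
field names and both functions are assumptions made for this illustration;
they are not defined by the Annotation-Ontology or by Stanbol.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextSelection:
    prefix: str   # text immediately before the selection
    exact: str    # the selected text itself
    postfix: str  # text immediately after the selection
    start: Optional[int] = None  # character offsets as kept from the FISE Text Annotation
    end: Optional[int] = None


def find_matches(text: str, sel: TextSelection) -> List[int]:
    """Return every start offset at which prefix + exact + postfix occurs."""
    needle = sel.prefix + sel.exact + sel.postfix
    matches = []
    pos = text.find(needle)
    while pos != -1:
        matches.append(pos + len(sel.prefix))
        pos = text.find(needle, pos + 1)
    return matches


def resolve(text: str, sel: TextSelection) -> Optional[int]:
    """Resolve a selection to a start offset, or None if it is ambiguous or missing."""
    matches = find_matches(text, sel)
    # Prefer the stored offset when the surrounding text confirms it.
    if sel.start in matches:
        return sel.start
    # Otherwise accept the prefix/postfix match only if it is unique.
    return matches[0] if len(matches) == 1 else None


text = "Paris is the capital of France. Paris Hilton was born in New York."
second_paris = TextSelection(prefix="France. ", exact="Paris", postfix=" Hilton")
offset = resolve(text, second_paris)
print(offset, text[offset:offset + len(second_paris.exact)])
```

+<p>In this sketch the stored offsets are only trusted when the prefix and
postfix agree with them, so a selection can still be located after the offsets
have become stale.</p>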
+<h4 id="multi_media_selectors_and_the_media_fragments_standard">Multi Media
Selectors and the Media Fragments Standard</h4>
+<p>The <a href="http://www.w3.org/2008/WebVideo/Fragments/">Media Fragments
Working Group</a> of the W3C is currently working on a Recommendation on how to
encode Fragments of Resources within so-called <a
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media
Fragments URIs</a>.</p>
+<p>This specification defines how to encode the <a
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-time">Temporal</a>,
<a
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-space">Spatial</a>,
<a
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-track">Track</a>
and <a
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-id">ID</a>
dimensions within Document URIs but also defines processing rules (e.g. for
Browsers) and the semantics.</p>
+<p>The proposal here is to use this specification for encoding selections
within multimedia files in the Annotation-Ontology. This will most likely
require the definition of a MediaFragmentSelector as an extension.</p>
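+<p>As a rough illustration of what a consumer of such a selector would have to
do, the following Python sketch splits the fragment part of a URI into the four
dimensions named above. It is a simplified reading of the draft, not a
conforming parser: it assumes temporal values given in plain seconds (with an
optional "npt:" prefix) and integer spatial values, and the function name and
the returned dictionary layout are hypothetical.</p>

```python
from urllib.parse import urlsplit, parse_qsl


def parse_media_fragment(uri: str) -> dict:
    """Split the fragment of a URI into temporal, spatial, track and id dimensions."""
    fragment = urlsplit(uri).fragment
    dimensions = {}
    for name, value in parse_qsl(fragment):
        if name == "t":
            # Temporal dimension: "start,end"; either side may be omitted.
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start, _, end = value.partition(",")
            dimensions["t"] = (
                float(start) if start else 0.0,
                float(end) if end else None,
            )
        elif name == "xywh":
            # Spatial dimension: optional "pixel:" or "percent:" unit, then x,y,w,h.
            unit, _, numbers = value.rpartition(":")
            x, y, w, h = (int(n) for n in numbers.split(","))
            dimensions["xywh"] = (unit or "pixel", x, y, w, h)
        elif name in ("track", "id"):
            dimensions[name] = value
    return dimensions


print(parse_media_fragment("http://example.org/video.ogv#t=10,20"))
print(parse_media_fragment("http://example.org/video.ogv#xywh=160,120,320,240&track=audio"))
```

+<p>A MediaFragmentSelector could then carry the fragment URI itself and leave
this kind of interpretation to the consumer.</p>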
+<h3 id="annotations">Annotations</h3>
+<p>The FISE Enhancement Structure uses properties of both FISE Enhancements
and FISE TextAnnotation/EntityAnnotation to describe what the
Annotation-Ontology defines as Annotations. On the other hand, some properties
of the FISE TextAnnotation belong to the Selectors within the
Annotation-Ontology. Because of that, the switch to the Annotation-Ontology
will not only mean a change in the vocabulary used, but also bring some
structural changes. </p>
+<p>Annotations as defined by the Annotation-Ontology are structured as
follows:</p>
+<ul>
+<li>An Annotation is represented by a Resource (called Annotation-Resource in
the remaining document) with the rdf:type ao:Annotation. Special types of
Annotations can be introduced by subclasses of ao:Annotation.</li>
+<li>The Annotation-Resource may be linked to a Selector with the
<strong>ao:context</strong> property. If no such link is present the
Annotation-Resource is about the whole Document. It is also possible to link
multiple Selectors with an annotation.</li>
+<li>Each Annotation-Resource MUST BE linked to the <em>Annotated Document</em>
by using the <strong>ao:annotatesResource</strong> property. The <em>Source
Document</em> can be referenced by using the
<strong>ao:onSourceDocument</strong>. It is also possible to link multiple
Documents with an annotation.</li>
+</ul>
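+<p>The three rules above can be sketched with plain subject/predicate/object
tuples in Python. The property names are the ones used in this section, written
as prefixed names rather than full URIs; the function names and the example
identifiers are hypothetical, and no RDF library is assumed.</p>

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def annotation_triples(
    annotation: str,
    annotated_doc: str,
    source_doc: Optional[str] = None,
    selectors: Iterable[str] = (),
) -> List[Triple]:
    """Build the triples for one Annotation-Resource."""
    triples = [
        (annotation, "rdf:type", "ao:Annotation"),
        (annotation, "ao:annotatesResource", annotated_doc),  # required link
    ]
    if source_doc is not None:
        triples.append((annotation, "ao:onSourceDocument", source_doc))
    for selector in selectors:
        triples.append((annotation, "ao:context", selector))
    return triples


def is_about_whole_document(annotation: str, triples: Iterable[Triple]) -> bool:
    # Without any ao:context link the annotation applies to the whole document.
    return not any(s == annotation and p == "ao:context" for s, p, _ in triples)


def missing_annotated_document(triples: Iterable[Triple]) -> Set[str]:
    """Return annotations that lack the mandatory ao:annotatesResource link."""
    triples = list(triples)
    annotations = {s for s, p, o in triples if p == "rdf:type" and o == "ao:Annotation"}
    linked = {s for s, p, _ in triples if p == "ao:annotatesResource"}
    return annotations - linked


example = annotation_triples(
    "urn:example:annotation:1",            # hypothetical identifiers
    "http://example.org/news/article.html",
    source_doc="urn:example:cache:4711",
    selectors=["urn:example:selector:1"],
)
print(is_about_whole_document("urn:example:annotation:1", example))
print(missing_annotated_document(example))
```

+<p>A check like <em>missing_annotated_document</em> is one way an
implementation could verify the MUST rule before publishing its results.</p>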
+<p>The following sub-sections give an overview of how Text Annotations,
Entity Annotations and Category Annotations as used by Stanbol can be expressed
using the Annotation-Ontology.</p>
+<h4 id="text_annotations">Text Annotations</h4>
+<p>Text Annotations are Annotations as typically created by NER (Named Entity
Recognition) engines. Such Annotations select a part of a text and assign a
type (Person, Organization, Place ...) to it.</p>
+<p>The text selection can be expressed by using a "PrefixPostfixSelector".
The type and the confidence of the detected named entity need to be properties
of the Annotation class.</p>
+<h4 id="entity_annotations">Entity Annotations</h4>
+<p>Entity Annotations are similar to "Qualifier" annotations as defined by the
Annotation-Ontology. The <em>ao:hasTopic</em> relation is used to link the
annotation with the related topic.</p>
+<h4 id="category_anotations">Category Annotations</h4>
+<p>Category Annotations are typically about the whole Document or a specific
section of it. Normal Selectors can be used for defining the categorized
section. If no Selector is present the categorization applies to the whole
document. The "Qualifier" annotation could also be used as a base class for
categorizations.</p>
+<h3 id="annotation_sets">Annotation Sets</h3>
+<p>Within the Annotation-Ontology, Annotation Sets can be used to group
several Annotations together. Although the FISE Enhancement Structure does not
explicitly define a similar construct, the Stanbol Enhancer uses the relations
between FISE Enhancements for a similar purpose. Therefore the suggestion is to
use this feature of the Annotation-Ontology to express sets of possible
Categories or sets of Entity suggestions.</p>
+<p>The following figure shows an example of an Annotation Set with a single
Annotation.</p>
+<blockquote>
+<p><img alt="Annotation sets"
src="http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png"
title="A simple Annotation Set with a single Annotation" /></p>
+<p>Image Credits: Annotation-Ontology <a
href="http://annotation-ontology.googlecode.com/svn/trunk/images/Annotation%20Set%20-%20AO%20Annotation%20Ontology%20-%20by%20Paolo%20Ciccarese.png">Link</a></p>
+</blockquote>
+<p>This suggests the use of Annotation Sets to formally describe situations
where the Stanbol Enhancer needs to group several Annotations in order to give
users the possibility to select from a predefined set of options. Assigning a
unique ID - the URI of the AnnotationSet instance - to such a collection of
Annotations also makes it possible for the consumer to provide explicit
feedback to the Stanbol Enhancer (e.g. by accepting/rejecting Annotations that
are part of the AnnotationSet, adding an additional Annotation to a set, ...).</p>
+<p>Note that a single Annotation might be part of several Annotation Sets. As
an example, take a Text Annotation for which two sets of Entity suggestions are
generated.</p>
+<p>The suggestion is to create subclasses for common types of Annotation Sets
used by the Stanbol Enhancer.</p>
+<h4 id="entity_suggestions">Entity Suggestions</h4>
+<p>With the FISE Enhancement Structure this is expressed by a
<em>fise:TextAnnotation</em> that is linked to several
<em>fise:EntityAnnotation</em>s by the <em>dc:relation</em> property.</p>
+<p>The same can be expressed based on the Annotation-Ontology by:</p>
+<ul>
+<li>An Annotation Set that links to the following Annotations (by the
<em>ao:item</em> property):
+<ul>
+<li>A TextAnnotation including the PrefixPostfixSelector that defines the
actual position of the selected text within the document</li>
+<li>One EntityAnnotation (extends ao:Qualifier) per suggested Entity.</li>
+</ul>
+</li>
+<li>In addition, the Annotation Set also includes metadata such as the Engine
that created the suggestions.</li>
+</ul>
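+<p>A minimal in-memory sketch of such an Annotation Set, written in Python,
might look as follows. The classes, field names, engine name and example URIs
are hypothetical and only illustrate the grouping and the feedback idea
described in the previous section; they do not describe an existing Stanbol
API.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    uri: str
    rdf_type: str  # "TextAnnotation" or "EntityAnnotation", as named in this proposal
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationSet:
    uri: str         # unique ID a consumer can refer to when giving feedback
    created_by: str  # metadata: the engine that created the suggestions
    items: List[Annotation] = field(default_factory=list)  # the ao:item links

    def suggestions(self) -> List[Annotation]:
        return [a for a in self.items if a.rdf_type == "EntityAnnotation"]

    def reject(self, annotation_uri: str) -> None:
        """Drop one item, e.g. after explicit feedback from a user."""
        self.items = [a for a in self.items if a.uri != annotation_uri]


suggestion_set = AnnotationSet(
    uri="urn:example:set:1",
    created_by="urn:example:engine:entity-linking",  # hypothetical engine ID
    items=[
        Annotation("urn:example:text:1", "TextAnnotation", {"selected-text": "Paris"}),
        Annotation("urn:example:entity:1", "EntityAnnotation",
                   {"ao:hasTopic": "http://dbpedia.org/resource/Paris"}),
        Annotation("urn:example:entity:2", "EntityAnnotation",
                   {"ao:hasTopic": "http://dbpedia.org/resource/Paris_Hilton"}),
    ],
)
suggestion_set.reject("urn:example:entity:2")
print([a.properties["ao:hasTopic"] for a in suggestion_set.suggestions()])
```

+<p>Because the set has its own URI, feedback such as the rejection above can
be addressed to the set as a whole rather than to loosely related
annotations.</p>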
+<h4 id="category_suggestions">Category Suggestions</h4>
+<p>Typically a categorization provides more than a single Category. Grouping
such suggestions within an AnnotationSet gives users the possibility to
accept/reject one or more of them. In addition it would also make it possible
to distinguish sets of categorizations calculated based on disjoint sets of
categories (e.g. a categorization based on a UserProfile versus a
categorization based on general topics or a spatial categorization).</p>
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>