Author: buildbot
Date: Fri Jan 27 12:49:39 2012
New Revision: 803419
Log:
Staging update by buildbot for stanbol
Added:
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html
==============================================================================
---
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html
(added)
+++
websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html
Fri Jan 27 12:49:39 2012
@@ -0,0 +1,147 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+ <title>Apache Stanbol - ContentItem</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <link rel="icon" type="image/png"
href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+ <div id="navigation">
+ <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220"
height="101" border="0"
src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+ <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue
Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="http://dev.iks-project.eu/downloads/stanbol-launchers/">Pre-built
Launchers</a></li>
+</ul>
+<h1 id="the_asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+ </div>
+
+ <div id="content">
+ <h1 class="title">ContentItem</h1>
+ <p>The ContentItem is the object that represents the content that is
enhanced by the Stanbol Enhancer. The ContentItem is created from the data
provided by the enhancement request and is used throughout the enhancement
process to store results. After the enhancement process finishes, the
ContentItem therefore represents the result of the Stanbol enhancement
process.</p>
+<p>The following sections describe the interface of the ContentItem in more
detail.</p>
+<h3 id="contentparts">ContentParts</h3>
+<p>ContentParts are used to represent the original content as well as
transformations of the original content (typically created by a pre-processing <a
href="engines/enhancementengine.html">EnhancementEngine</a> such as <a
href="engines/metaxaengine.html">Metaxa</a>).</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the ContentPart
based on the index */</span>
+<span class="n">getPart</span><span class="p">(</span><span
class="nb">int</span> <span class="nb">index</span><span class="p">,</span>
<span class="n">Class</span><span class="sr">&lt;T&gt;</span> <span
class="n">type</span><span class="p">)</span> <span class="p">:</span> <span
class="n">T</span>
+<span class="sr">/** Getter for the ContentPart based on its ID */</span>
+<span class="n">getPart</span><span class="p">(</span><span
class="n">UriRef</span> <span class="n">uri</span><span class="p">,</span>
<span class="n">Class</span><span class="sr">&lt;T&gt;</span> <span
class="n">type</span><span class="p">)</span> <span class="p">:</span> <span
class="n">T</span>
+<span class="sr">/** Getter for the ID based on the index */</span>
+<span class="n">getPartUri</span><span class="p">(</span><span
class="nb">int</span> <span class="nb">index</span><span class="p">)</span>
<span class="p">:</span> <span class="n">UriRef</span>
+<span class="sr">/** Adds a new ContentPart to the ContentItem */</span>
+<span class="n">addPart</span><span class="p">(</span><span
class="n">UriRef</span> <span class="n">uri</span><span class="p">,</span>
<span class="n">Object</span> <span class="n">part</span><span
class="p">)</span> <span class="p">:</span> <span class="n">Object</span>
+</pre></div>
+
+
+<p>ContentParts are accessible by their index and by their URI-formatted ID.
Re-adding a ContentPart replaces the old one. The index is not changed by this
operation.</p>
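+<p>The index and URI access described above can be illustrated with a minimal
sketch. It is not the Stanbol implementation: the class name
<code>ContentPartStore</code> is hypothetical, plain strings stand in for
UriRef, and returning the replaced part from <code>addPart</code> is an
assumption.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for ContentPart storage; not the Stanbol implementation. */
class ContentPartStore {
    // A LinkedHashMap iterates in insertion order, and put() on an existing
    // key replaces the value without changing the key's position.
    private final Map<String, Object> parts = new LinkedHashMap<>();

    /** Adds or replaces a part. Returning the replaced part is an assumption. */
    Object addPart(String uri, Object part) {
        if (uri == null || part == null) {
            throw new IllegalArgumentException("uri and part must not be null");
        }
        return parts.put(uri, part);
    }

    /** Returns the URI registered at the given index. */
    String getPartUri(int index) {
        List<String> uris = new ArrayList<>(parts.keySet());
        if (index < 0 || index >= uris.size()) {
            throw new NoSuchElementException("no part at index " + index);
        }
        return uris.get(index);
    }

    /** Returns the part registered under the URI, cast to the requested type. */
    <T> T getPart(String uri, Class<T> type) {
        Object part = parts.get(uri);
        if (part == null) {
            throw new NoSuchElementException("no part with URI " + uri);
        }
        return type.cast(part); // throws ClassCastException on a type mismatch
    }

    /** Returns the part at the given index, cast to the requested type. */
    <T> T getPart(int index, Class<T> type) {
        return getPart(getPartUri(index), type);
    }
}
```

+<p>In this sketch, adding a part under a URI that is already registered swaps
the stored object but keeps the position of that URI, so lookups by index stay
stable.</p>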
+<p>There are two types of ContentParts:</p>
+<ol>
+<li>ContentParts for which additional metadata is provided within the metadata
of the ContentItem. Such ContentParts are typically used to store transformed
versions of the original content. This allows, for example, engines that can
only process plain text to query for the content part containing the plain-text
version of the document.</li>
+<li>ContentParts that are registered under a predefined URI. Such ContentParts
are typically not mentioned within the metadata of the ContentItem. Typically
this is used to share intermediate enhancement results between enhancement
engines. An example would be Tokens, Sentences, POS tags and Chunks as extracted
by some NLP engine. Engines that want to consume such data need to know the
predefined URI. They will typically check within the "canEnhance(..)" method if
a ContentPart with this URI is present and if it has the correct type.</li>
+</ol>
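+<p>The presence and type check for a ContentPart registered under a predefined
URI might look like the following sketch. The map of parts, the class
<code>TokenPartCheck</code> and the URI <code>urn:example:nlp.tokens</code> are
hypothetical stand-ins, not part of the Stanbol API.</p>

```java
import java.util.Map;

/** Hypothetical helper illustrating the check an engine could do in canEnhance(..). */
class TokenPartCheck {
    /** Assumed predefined URI under which an NLP engine shares its tokens. */
    static final String TOKENS_URI = "urn:example:nlp.tokens";

    /**
     * Returns true only if a part is registered under the URI
     * and that part is an instance of the expected type.
     */
    static boolean hasPart(Map<String, Object> parts, String uri, Class<?> type) {
        Object part = parts.get(uri);
        // Class.isInstance(null) is false, so a missing part is rejected as well.
        return type.isInstance(part);
    }
}
```

+<p>An engine that finds no suitable part would report that it cannot enhance
the ContentItem instead of failing later during processing.</p>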
+<h3 id="accessing_the_main_content_of_the_contentitem">Accessing the Main
Content of the ContentItem</h3>
+<p>The main content of the ContentItem refers to the content passed in by the
enhancement request (or downloaded from the URL provided by a request). The
following methods are available for accessing this content:</p>
+<div class="codehilite"><pre><span class="o">/**</span> <span
class="n">Getter</span> <span class="k">for</span> <span class="n">the</span>
<span class="n">InputStream</span> <span class="n">of</span> <span
class="n">the</span> <span class="n">content</span> <span class="n">as</span>
<span class="n">parsed</span>
+ <span class="k">for</span> <span class="n">the</span> <span
class="n">ContentItem</span> <span class="o">*/</span>
+<span class="o">+</span> <span class="n">getStream</span><span
class="p">()</span> <span class="p">:</span> <span class="n">InputStream</span>
+<span class="sr">/** Getter for the mime type of the content */</span>
+<span class="o">+</span> <span class="n">getMimeType</span><span
class="p">()</span> <span class="p">:</span> <span class="n">String</span>
+<span class="sr">/** Getter for the Content as Blob */</span>
+<span class="o">+</span> <span class="n">getBlob</span><span
class="p">()</span> <span class="p">:</span> <span class="n">Blob</span>
+</pre></div>
+
+
+<p>The "getStream()" and "getMimeType()" methods are shortcuts for the
corresponding methods of Blob. Calling "contentItem.getBlob().getStream()" will
return an InputStream over the exact same content as directly calling
"getStream()" on the ContentItem. Note that the Blob interface also provides a
"getParameter()" method for retrieving mime type parameters such as the charset
of textual content.</p>
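+<p>As a rough illustration of how the charset parameter could be used when
reading textual content, consider the sketch below. The nested
<code>Blob</code> interface is a simplified stand-in whose signatures are
assumptions (in particular that "getParameter()" returns a map of parameter
names to values), and falling back to UTF-8 when no charset is given is also an
assumption.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

class BlobText {
    /** Simplified stand-in for the Blob interface; signatures are assumptions. */
    interface Blob {
        InputStream getStream();
        String getMimeType();
        Map<String, String> getParameter();
    }

    /** Reads the whole stream and decodes it with the declared charset. */
    static String readText(Blob blob) {
        String name = blob.getParameter().get("charset");
        // Assumed fallback: UTF-8 when the mime type carries no charset parameter.
        Charset charset = (name != null) ? Charset.forName(name) : StandardCharsets.UTF_8;
        try (InputStream in = blob.getStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an in-memory text Blob, used here only to exercise readText(..). */
    static Blob textBlob(final String text, final Charset charset) {
        final byte[] bytes = text.getBytes(charset);
        return new Blob() {
            @Override public InputStream getStream() { return new ByteArrayInputStream(bytes); }
            @Override public String getMimeType() { return "text/plain"; }
            @Override public Map<String, String> getParameter() {
                return Collections.singletonMap("charset", charset.name());
            }
        };
    }
}
```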
+<p>The content passed in by the user is stored as a ContentPart at index '0'
under the URI of the ContentItem, in the form of a Blob. Therefore calling</p>
+<div class="codehilite"><pre>contentItem.getPart(0, Blob.class)
+contentItem.getPart(contentItem.getUri(), Blob.class)
+contentItem.getBlob()
+</pre></div>
+<p>MUST all return the exact same Blob instance.</p>
+<h3 id="metadata_of_the_contentitem">Metadata of the ContentItem</h3>
+<p>The metadata of the ContentItem is managed by a LockableMGraph. This is
basically a normal java.util.Collection of Triples. The only RDF-specific
feature is support for filtered iterators that accept wildcards for subjects,
predicates and objects.</p>
+<p>This graph is used to store all enhancement results as well as metadata
about the content item (such as content parts) and the enhancement process (see
<a href="executionmetadata.html">Executionmetadata</a>).</p>
+<h3 id="readwrite_locks">Read/Write locks</h3>
+<p>During the Stanbol enhancement process as executed by the <a
href="enhancementjobmanager.html">EnhancementJobManager</a> components running
in multiple threads need to access the state of the ContentItem. Because of
that the ContentItem provides the possibility to acquire locks.</p>
+<div class="codehilite"><pre><span class="sr">/** Getter for the ReadWriteLock
of a ContentItem */</span>
+<span class="o">+</span> <span class="n">getLock</span><span
class="p">()</span> <span class="p">:</span> <span class="n">java</span><span
class="o">.</span><span class="n">util</span><span class="o">.</span><span
class="n">concurrent</span><span class="o">.</span><span
class="n">ReadWriteLock</span>
+</pre></div>
+
+
+<p>Note also that</p>
+<div class="codehilite"><pre><span class="n">contentItem</span><span
class="o">.</span><span class="n">getLock</span><span class="p">()</span>
+<span class="n">contentItem</span><span class="o">.</span><span
class="n">getMetadata</span><span class="p">()</span><span
class="o">.</span><span class="n">getLock</span><span class="p">()</span>
+</pre></div>
+
+
+<p>will return the same ReadWriteLock instance.</p>
+<p>This lock can be used to request read/write locks on the ContentItem.
Implementations of the ContentItem, and of the MGraph holding the metadata, need
to protect all of their methods with this lock. That means that users who do not
need to protect whole sections of code do not need to bother with locks. Typical
examples are working with ContentParts, using final classes like Blob, or
adding/removing a single triple from the metadata.</p>
+<p>However, whenever components need to ensure that the data is not changed by
other threads while they perform some calculation, read/write locks MUST be
used. A typical example is iterating over data returned by the MGraph. In this
case the code iterating over the results should be protected against concurrent
changes as follows:</p>
+<div class="codehilite"><pre><span class="n">contentItem</span><span
class="o">.</span><span class="n">getLock</span><span class="p">()</span><span
class="o">.</span><span class="n">readLock</span><span class="p">()</span><span
class="o">.</span><span class="n">lock</span><span class="p">();</span>
+<span class="n">try</span> <span class="p">{</span>
+ <span class="n">Iterator</span><span class="sr">&lt;Triple&gt;</span>
<span class="n">it</span> <span class="o">=</span> <span
class="n">contentItem</span><span class="o">.</span><span
class="n">getMetadata</span><span class="p">()</span><span class="o">.</span>
+ <span class="n">filter</span><span class="p">(</span><span
class="n">null</span><span class="p">,</span><span class="n">RDF</span><span
class="o">.</span><span class="n">TYPE</span><span class="p">,</span><span
class="n">TechnicalClasses</span><span class="o">.</span><span
class="n">ENHANCER_TEXTANNOTATION</span><span class="p">);</span>
+ <span class="k">while</span><span class="p">(</span><span
class="n">it</span><span class="o">.</span><span class="n">hasNext</span><span
class="p">()){</span>
+ <span class="nb">log</span><span class="o">.</span><span
class="n">debug</span><span class="p">(</span><span
class="s">"Process TextAnnotation: {}"</span><span class="p">,</span> <span
class="n">it</span><span class="o">.</span><span class="k">next</span><span
class="p">()</span><span class="o">.</span><span
class="n">getSubject</span><span class="p">());</span>
+ <span class="sr">//</span><span class="nb">read</span> <span
class="n">the</span> <span class="n">needed</span> <span
class="n">information</span>
+ <span class="p">}</span>
+<span class="p">}</span> <span class="n">finally</span> <span
class="p">{</span>
+ <span class="n">contentItem</span><span class="o">.</span><span
class="n">getLock</span><span class="p">()</span><span class="o">.</span><span
class="n">readLock</span><span class="p">()</span><span class="o">.</span><span
class="n">unlock</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+
+
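+<p>The write side follows the same lock, try, finally pattern. The sketch below
is a self-contained illustration using only java.util.concurrent: the class
<code>LockedMetadata</code> and its string-based "triples" are hypothetical
stand-ins for the ContentItem metadata, not the Clerezza or Stanbol API. It
shows a multi-step change that has to appear atomic to concurrent readers.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for lock-protected metadata; triples are plain strings. */
class LockedMetadata {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> triples = new HashSet<>();

    ReadWriteLock getLock() { return lock; }

    /** Single operations are protected internally, so callers need no lock. */
    boolean add(String triple) {
        lock.writeLock().lock();
        try { return triples.add(triple); } finally { lock.writeLock().unlock(); }
    }

    boolean contains(String triple) {
        lock.readLock().lock();
        try { return triples.contains(triple); } finally { lock.readLock().unlock(); }
    }

    int size() {
        lock.readLock().lock();
        try { return triples.size(); } finally { lock.readLock().unlock(); }
    }

    /**
     * Replaces the subject of all matching triples as one atomic step.
     * The write lock spans the iteration and the re-insertion, so readers
     * never observe a state where only some triples were renamed.
     */
    int renameSubject(String oldSubject, String newSubject) {
        lock.writeLock().lock();
        try {
            List<String> renamed = new ArrayList<>();
            for (Iterator<String> it = triples.iterator(); it.hasNext();) {
                String triple = it.next();
                if (triple.startsWith(oldSubject + " ")) {
                    it.remove();
                    renamed.add(newSubject + triple.substring(oldSubject.length()));
                }
            }
            triples.addAll(renamed);
            return renamed.size();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```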
+<p>There is one exception to this rule when accessing ContentItems within an <a
href="engines/enhancementengine.html">EnhancementEngine</a>. If an engine
declares that it only supports the SYNCHRONOUS enhancement mode, the <a
href="enhancementjobmanager.html">EnhancementJobManager</a> needs to ensure
that the engine has exclusive access to the ContentItem. In that case
implementors of EnhancementEngines do not need to care about using read/write
locks.</p>
+ </div>
+
+ <div id="footer">
+ <div class="copyright">
+ <p>
+ Copyright © 2010 The Apache Software Foundation, Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ <br />
+ Apache, Stanbol and the Apache feather and Stanbol logos are
trademarks of The Apache Software Foundation.
+ </p>
+ </div>
+ </div>
+
+</body>
+</html>