Author: fchrist
Date: Tue Feb 14 09:13:01 2012
New Revision: 1243838
URL: http://svn.apache.org/viewvc?rev=1243838&view=rev
Log:
Content item review, fixed some typos and formatting
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
Modified:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext?rev=1243838&r1=1243837&r2=1243838&view=diff
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
(original)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
Tue Feb 14 09:13:01 2012
@@ -1,12 +1,12 @@
-Title: ContentItem
+Title: Content Item
-The ContentItem is the Object that represents the Content that is enhanced by
the Stanbol Enhancer. The ContentItem is create based on the data provided by
the enhancement request and used throughout the enhancement process to store
results. After the enhancement process finishes the ContentItem represents
therefore the result of the Stanbol enhancement process.
+The content item is the object that represents the content that is enhanced by
the Apache Stanbol enhancer. The content item is created based on the data
provided by the enhancement request and used throughout the enhancement process
to store results. Therefore, after the enhancement process has finished, the
content item represents the result of the Apache Stanbol enhancement process.
-The following section describe the interface of the ContentItem in more
details.
+The following section describes the interface of the content item in more
detail.
-### ContentParts
+### Content Parts
-ContentParts are used to represent the original content as well as
transformations of the original content (typically created by pre-processing
[EnhancementEngine](engines/enhancementengine.html) such as
[Metaxa](engines/metaxaengine.html))
+Content parts are used to represent the original content as well as
transformations of the original content (typically created by pre-processing
[enhancement engines](engines/list.html) such as the [Metaxa
engine](engines/metaxaengine.html)).
/** Getter for the ContentPart based on the index */
getPart(int index, Class<T> type) : T
@@ -14,19 +14,19 @@ ContentParts are used to represent the o
getPart(UriRef uri, Class<T> type) : T
/** Getter for the ID based on the index */
getPartUri(int index) : UriRef
- /** Adds a new ContentPart to the ContentItem */
+ /** Adds a new ContentPart to the content item */
addPart(UriRef uri, Object part) : Object
-ContentParts are accessible by the index AND by there URI formatted id.
Re-adding an ContentPart will replace the old one. The index will not be
changed by this operation.
+Content parts are accessible by the index _and_ by their URI-formatted ID.
Re-adding a content part will replace the old one. The index will not be
changed by this operation.
-There are two types of ContentParts:
+There are two types of content parts:
-1. ContentParts for that additional metadata are provided within the metadata
of the ContentItem. Such ContentParts are typically used to store transformed
versions of the original content. This allows e.g. engines that can only
process plain text version to query for the content part containing this
version of the parsed document.
-2. ContentParts that are registered under a predefined URI. Such ContentParts
are typically not mentioned within the metadata of the ContentItem. Typically
this is used to share intermediate enhancement results in-between enhancement
engines. An example would be Tokens, Sentens, POS tags and Chunks as extracted
by some NLP engine. Engines that want to consume such data need to know the
predefined URI. They will typically check within the "canEnhance(..)" method if
a ContentPart with this URI is present and if it has the correct type.
+1. Content parts that have additional metadata provided within the metadata of
the content item. Such content parts are typically used to store transformed
versions of the original content. This allows, for example, engines that can
only process plain text to query for the content part containing the plain text
version of the parsed document.
+2. Content parts that are registered under a predefined URI. Such content
parts are typically not mentioned within the metadata of the content item. This
is used to share intermediate enhancement results between enhancement engines.
An example would be tokens, sentences, POS tags and chunks that are extracted
by some NLP engine. Engines that want to consume such data need to know the
predefined URI of the content part holding this data. They will check within
the <code>canEnhance(..)</code> method if a content part with an expected URI
is present and if it has the correct type.
-### Accessing the Main Content of the ContentItem
+### Accessing the Main Content of the Content Item
-The main content of the ContentItem refers to the content parsed by the
enhancement request (or downloaded from the URL provided by an request). For
accessing this content the following methods are available
+The main content of the content item refers to the content parsed by the
enhancement request (or downloaded from the URL provided by a request). The
following methods are available for accessing this content:
/** Getter for the InputStream of the content as parsed
for the ContentItem */
@@ -36,25 +36,25 @@ The main content of the ContentItem refe
/** Getter for the Content as Blob */
+ getBlob() : Blob
-The "getStream()" and "getMimeType()" methods are shortcuts for the according
methods of Blob. Calling "contentItem.getBlob.getStream()" will return an
InputStream over the exact same content as directly calling "getStream()" on
the ContentItem. Note that the Blob interface also provides a "getParameter()"
method that allows to retrieve mime type parameters such as the charset of
textual content.
+The <code>getStream()</code> and <code>getMimeType()</code> methods are
shortcuts for the corresponding methods of the content item's blob object.
Calling <code>contentItem.getBlob().getStream()</code> will return an
InputStream over the exact same content as directly calling
<code>getStream()</code> on the content item. Note that the blob interface also
provides a <code>getParameter()</code> method that allows retrieving MIME type
parameters such as the charset of textual content.
-The content parsed by the user is stored as ContentPart at the index '0' with
the URI of the ContentItem in the form of a Blob. Therefore calling
+The content parsed by the user is stored as the content part at index '0' with
the URI of the content item in the form of a blob. Therefore, calling
contentItem.getPart(0,Blob.class)
contentItem.getPart(contentItem.getUri(),Blob.class)
contentItem.getBlob()
-MUST return all the exact same Blob instance.
+returns the same blob instance.
-### Metadata of the ContentItem
+### Metadata of the Content Item
-The metadata of the ContentItem are managed by an LockableMGraph. This is
basically a normal java.util.Collections for Triples. The only RDF specific
method is support for filtered iterators that support wildcards for subjects,
predicates and objects.
+The metadata of the content item are managed by a lockable MGraph. This is
basically a normal <code>java.util.Collection</code> of triples. The only RDF
specific addition is support for filtered iterators that accept wildcards for
subjects, predicates and objects.
-This graph is used to store all enhancement results as well as metadata about
the content item (such as content parts) and the enhancement process (see
[Executionmetadata](executionmetadata.html).
+This graph is used to store all enhancement results as well as metadata about
the content item (such as content parts) and the enhancement process (see
[execution metadata](executionmetadata.html)).
### Read/Write locks
-During the Stanbol enhancement process as executed by the
[EnhancementJobManager](enhancementjobmanager.html) components running in
multiple threads need to access the state of the ContentItem. Because of that
the ContentItem provides the possibility to acquire locks.
+During the Apache Stanbol enhancement process, as executed by the [enhancement
job manager](enhancementjobmanager.html), components running in multiple
threads need to access the state of the content item. Because of that, the
content item provides the possibility to acquire locks.
/** Getter for the ReadWriteLock of a ContentItem */
+ getLock() : java.util.concurrent.locks.ReadWriteLock
@@ -64,11 +64,11 @@ Note also that
contentItem.getLock()
contentItem.getMetadata().getLock()
-will return the same ReadWriteLock instance.
+will return the same <code>ReadWriteLock</code> instance.
-This lock can be used request read/write locks on the ContentItem. All methods
of the ContentItem and also the MGrpah holding the metadata need to be
protected by using the lock. That means that users that do not need to product
whole sections of code do not need to brother with the usage of locks. Typical
examples are working with ContentParts, final Classes like Blob or
adding/removing a triple from the metadata.
+This lock can be used to request read/write locks on the content item. All
methods of the content item, and of the <code>MGraph</code> holding the
metadata, need to be protected by this lock in their implementations. That
means that users who do not need to protect whole sections of code do not need
to bother with the usage of locks. Typical examples are working with content
parts, final classes like <code>Blob</code> or adding/removing a triple from
the metadata.
-However whenever components need to ensure that the data are not changed by
other threads while performing some calculations read/write locks MUST BE used.
A typical example are iterations over data returned by the MGraph. In this case
code iterating over the results should be protected against concurrent changes
by
+However, whenever components need to ensure that the data are not changed by
other threads while performing some calculations, read/write locks _must be_
used. A typical example is iterating over data returned by the MGraph. In this
case, code iterating over the results should be protected against concurrent
changes by
contentItem.getLock().readLock().lock();
try {
@@ -82,6 +82,4 @@ However whenever components need to ensu
contentItem.getLock().readLock().unlock();
}
-While accessing ContentItems within an
[EnhancementEngine](engines/enhancementengine.html) there is an exception to
this rule. If an engine declares that is only supports the SYNCHRONOUS
enhancement mode the [EnhancementJobManager](enhancementjobmanager.html) needs
to take care the an engine has exclusive access to the ContentItem. In that
case implementors of EnhancementEngines need not to care about using read/write
locks.
-
-
+While accessing content items within an [enhancement
engine](engines/enhancementengine.html) there is an exception to this rule. If
an engine declares that it only supports the <code>SYNCHRONOUS</code>
enhancement mode, the [enhancement job manager](enhancementjobmanager.html)
needs to ensure that the engine has exclusive access to the content item. In
that case implementors of enhancement engines need not care about using
read/write locks.
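
Reviewer note: the read-lock pattern described in the locking section can be
illustrated with a small, self-contained sketch. It does not use the Stanbol
or Clerezza APIs. `SketchItem`, `addTriple` and `countBySubject` are
hypothetical stand-ins introduced only for this illustration, and the metadata
graph is simplified to a list of strings. Only the use of
`java.util.concurrent.locks.ReadWriteLock` mirrors what the page describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedIterationSketch {

    /** Hypothetical stand-in for a content item; NOT the Stanbol interface. */
    static final class SketchItem {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        // Simplification: triples are plain "subject predicate object" strings.
        private final List<String> metadata = new ArrayList<>();

        ReadWriteLock getLock() {
            return lock;
        }

        /** Single modifications take the write lock internally. */
        void addTriple(String triple) {
            lock.writeLock().lock();
            try {
                metadata.add(triple);
            } finally {
                lock.writeLock().unlock();
            }
        }

        List<String> getMetadata() {
            return metadata;
        }
    }

    /** Iterates over the metadata while holding the read lock. */
    static int countBySubject(SketchItem item, String subject) {
        item.getLock().readLock().lock();
        try {
            int count = 0;
            for (String triple : item.getMetadata()) {
                if (triple.startsWith(subject + " ")) {
                    count++;
                }
            }
            return count;
        } finally {
            // Always release the lock, even if the loop throws.
            item.getLock().readLock().unlock();
        }
    }

    public static void main(String[] args) {
        SketchItem item = new SketchItem();
        item.addTriple("urn:item-1 dc:title Example");
        item.addTriple("urn:item-1 dc:language en");
        item.addTriple("urn:item-2 dc:title Other");
        System.out.println(countBySubject(item, "urn:item-1")); // prints 2
    }
}
```

The single `addTriple` call needs no locking by the caller, while the loop in
`countBySubject` holds the read lock for its whole duration, which matches the
distinction the page draws between single operations and protected sections.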