contentitem.mdtext

rwesten Fri, 27 Jan 2012 04:50:00 -0800

Author: rwesten
Date: Fri Jan 27 12:49:31 2012
New Revision: 1236662

URL: http://svn.apache.org/viewvc?rev=1236662&view=rev
Log:
Added documentation for ContentItem


Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext?rev=1236662&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
 Fri Jan 27 12:49:31 2012
@@ -0,0 +1,87 @@
+Title: ContentItem
+
+The ContentItem is the object that represents the content enhanced by the 
Stanbol Enhancer. It is created from the data provided by the enhancement 
request and is used throughout the enhancement process to store results. Once 
the enhancement process finishes, the ContentItem therefore represents the 
result of the Stanbol enhancement process.
+
+The following sections describe the interface of the ContentItem in more 
detail.
+
+### ContentParts
+
+ContentParts represent the original content as well as transformations of 
the original content (typically created by a pre-processing 
[EnhancementEngine](engines/enhancementengine.html) such as 
[Metaxa](engines/metaxaengine.html)).
+
+    /** Getter for the ContentPart based on the index */
+    getPart(int index, Class<T> type) : T
+    /** Getter for the ContentPart based on its ID */
+    getPart(UriRef uri, Class<T> type) : T
+    /** Getter for the ID based on the index */
+    getPartUri(int index) : UriRef
+    /** Adds a new ContentPart to the ContentItem */
+    addPart(UriRef uri, Object part) : Object
+
+ContentParts are accessible by their index AND by their URI-formatted ID. 
Re-adding a ContentPart under an existing URI replaces the old one. The index 
is not changed by this operation.
+
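+The index and URI behaviour described above can be illustrated with a minimal 
sketch. It is not the Stanbol implementation: the class name 
`ContentPartsSketch` is hypothetical, plain strings stand in for `UriRef`, and 
it assumes that `addPart` returns the replaced part (or null).
+

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the part store of a ContentItem (illustration only). */
final class ContentPartsSketch {
    // Insertion-ordered map: the position of a URI in the key set is the index of its part.
    private final Map<String, Object> parts = new LinkedHashMap<>();

    /** Adds a part or replaces an existing one; returns the replaced part or null (assumed). */
    Object addPart(String uri, Object part) {
        // Re-inserting an existing key keeps its original position, so the index is stable.
        return parts.put(uri, part);
    }

    /** Returns the URI of the part at the given index. */
    String getPartUri(int index) {
        List<String> uris = new ArrayList<>(parts.keySet());
        if (index < 0 || index >= uris.size()) {
            throw new NoSuchElementException("No part at index " + index);
        }
        return uris.get(index);
    }

    /** Returns the part registered under the URI, cast to the requested type. */
    <T> T getPart(String uri, Class<T> type) {
        Object part = parts.get(uri);
        if (part == null) {
            throw new NoSuchElementException("No part with URI " + uri);
        }
        return type.cast(part); // throws ClassCastException if the type does not match
    }

    /** Returns the part at the given index, cast to the requested type. */
    <T> T getPart(int index, Class<T> type) {
        return getPart(getPartUri(index), type);
    }
}
```

+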
+There are two types of ContentParts:
+
+1. ContentParts for which additional metadata is provided within the metadata 
of the ContentItem. Such ContentParts are typically used to store transformed 
versions of the original content. This allows, for example, engines that can 
only process plain text to query for the content part containing the plain 
text version of the parsed document.
+2. ContentParts that are registered under a predefined URI. Such ContentParts 
are typically not mentioned within the metadata of the ContentItem. They are 
typically used to share intermediate enhancement results between enhancement 
engines. Examples are tokens, sentences, POS tags and chunks as extracted by 
some NLP engine. Engines that want to consume such data need to know the 
predefined URI. They will typically check within the "canEnhance(..)" method 
whether a ContentPart with this URI is present and whether it has the correct 
type.
+
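+The presence and type check for the second kind of ContentPart might look 
roughly like the sketch below. It is self-contained and does not use the 
Stanbol API: the class `PredefinedPartCheck`, the URI `urn:example:nlp:tokens` 
and the use of a plain `Map` in place of a ContentItem are all assumptions 
made for illustration.
+

```java
import java.util.Map;

/** Hypothetical helper showing the kind of check an engine could run in canEnhance(..). */
final class PredefinedPartCheck {
    /** Made-up URI under which a producing NLP engine would register its tokens. */
    static final String TOKENS_URI = "urn:example:nlp:tokens";

    private PredefinedPartCheck() {
    }

    /** True only if a part is registered under the URI and is an instance of the expected type. */
    static boolean hasPart(Map<String, Object> parts, String uri, Class<?> type) {
        // isInstance(null) is false, so a missing part and a part of the wrong type both fail.
        return type.isInstance(parts.get(uri));
    }
}
```

+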
+### Accessing the Main Content of the ContentItem
+
+The main content of the ContentItem is the content passed with the 
enhancement request (or downloaded from the URL provided by a request). The 
following methods are available for accessing this content:
+     
+    /** Getter for the InputStream of the content as parsed
+        for the ContentItem */
+    + getStream() : InputStream
+    /** Getter for the mime type of the content */
+    + getMimeType() : String
+    /** Getter for the Content as Blob */
+    + getBlob() : Blob
+
+The "getStream()" and "getMimeType()" methods are shortcuts for the 
corresponding methods of Blob. Calling "contentItem.getBlob().getStream()" 
returns an InputStream over exactly the same content as calling "getStream()" 
directly on the ContentItem. Note that the Blob interface also provides a 
"getParameter()" method for retrieving mime type parameters such as the 
charset of textual content.
+
+The content passed by the user is stored as a Blob in the ContentPart at 
index '0', under the URI of the ContentItem. Therefore calling
+
+    contentItem.getPart(0,Blob.class)
+    contentItem.getPart(contentItem.getUri(),Blob.class)
+    contentItem.getBlob()
+
+MUST all return the exact same Blob instance.
+
+### Metadata of the ContentItem
+
+The metadata of the ContentItem is managed by a LockableMGraph. This is 
basically a normal java.util.Collection of Triples. The only RDF-specific 
addition is support for filtered iterators that accept wildcards for 
subjects, predicates and objects.
+
+This graph is used to store all enhancement results as well as metadata about 
the content item (such as content parts) and the enhancement process (see 
[Executionmetadata](executionmetadata.html)).
+
+### Read/Write locks
+
+During the Stanbol enhancement process, as executed by the 
[EnhancementJobManager](enhancementjobmanager.html), components running in 
multiple threads need to access the state of the ContentItem. For that reason 
the ContentItem provides the possibility to acquire locks.
+    
+    /** Getter for the ReadWriteLock of a ContentItem */
+    + getLock() : java.util.concurrent.locks.ReadWriteLock
+
+Note also that
+
+    contentItem.getLock()
+    contentItem.getMetadata().getLock()
+
+will return the same ReadWriteLock instance.
+
+This lock can be used to request read/write locks on the ContentItem. All 
methods of the ContentItem, and of the MGraph holding the metadata, are 
themselves required to be protected by this lock. That means that users who do 
not need to protect whole sections of code do not need to bother with locks. 
Typical examples are working with ContentParts, final classes like Blob, or 
adding/removing a single triple from the metadata.
+
+However, whenever components need to ensure that the data is not changed by 
other threads while they perform some calculation, read/write locks MUST BE 
used. A typical example is iterating over data returned by the MGraph. In this 
case the code iterating over the results should be protected against 
concurrent changes as follows:
+
+    contentItem.getLock().readLock().lock();
+    try {
+        Iterator<Triple> it = contentItem.getMetadata().
+            filter(null,RDF.TYPE,TechnicalClasses.ENHANCER_TEXTANNOTATION);
+        while(it.hasNext()){
+            log.debug("Process TextAnnotation: {}", it.next().getSubject());
+            //read the needed information
+        }
+    } finally {
+        contentItem.getLock().readLock().unlock();
+    }
+
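+The example above covers only the read case. A write lock would follow the 
same lock/try/finally pattern. The sketch below illustrates this with the 
standard `java.util.concurrent.locks` classes only; the class 
`WriteLockSketch` and its list of strings are hypothetical stand-ins for a 
ContentItem and its metadata graph, not Stanbol code.
+

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical holder of shared state guarded by a single ReadWriteLock. */
final class WriteLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> metadata = new ArrayList<>();

    /** Adds several entries as one unit, so readers never observe a partial update. */
    void addAll(List<String> entries) {
        lock.writeLock().lock();
        try {
            metadata.addAll(entries);
        } finally {
            // Always release in finally, even if the update throws.
            lock.writeLock().unlock();
        }
    }

    /** Reads the current number of entries under the read lock. */
    int size() {
        lock.readLock().lock();
        try {
            return metadata.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

+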
+There is one exception to this rule when accessing ContentItems within an 
[EnhancementEngine](engines/enhancementengine.html). If an engine declares 
that it only supports the SYNCHRONOUS enhancement mode, the 
[EnhancementJobManager](enhancementjobmanager.html) needs to ensure that the 
engine has exclusive access to the ContentItem. In that case implementors of 
EnhancementEngines do not need to care about using read/write locks.
+
+

svn commit: r1236662 - /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/contentitem.mdtext
