Allow streaming of transformed content to Blobs
-----------------------------------------------

                 Key: STANBOL-579
                 URL: https://issues.apache.org/jira/browse/STANBOL-579
             Project: Stanbol
          Issue Type: Sub-task
          Components: Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


While adapting the TikaEngine and the MetaxaEngine to the new 
ContentItemFactory pattern, I realized that it is important to support 
streaming of content to a Blob. Otherwise, engines of this kind would need to 
temporarily hold the whole transformed version of the content (e.g. the 
extracted text/plain, xhtml, ...) in memory before they could create a new 
Blob via one of the ContentItemFactory#createBlob(...) methods.

The following extension to the ContentItemFactory avoids this issue and 
makes it possible to "stream" content to a Blob.

Method added to the ContentItemFactory:

    /** Creates a new ContentSink */
    + createContentSink(String mediaType) : ContentSink;

and the new interface ContentSink:

    /** Getter for the OutputStream */
    + getOutputStream() : OutputStream;
    /** Getter for the Blob */
    + getBlob() : Blob;
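
As a rough illustration of how such a sink could keep memory use low, the 
following Java sketch backs a ContentSink with a temporary file. The Blob and 
ContentSink interfaces shown here are simplified stand-ins, not the actual 
Stanbol types (in particular, the stand-in getStream() declaring IOException 
is an assumption), and TempFileContentSink, ContentSinkSketch and roundTrip 
are hypothetical names used only for this example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified stand-ins for the Stanbol interfaces (assumption: the real
// interfaces may declare more methods and different exceptions).
interface Blob {
    String getMimeType();
    InputStream getStream() throws IOException;
}

interface ContentSink {
    OutputStream getOutputStream();
    Blob getBlob();
}

// Hypothetical sink: written bytes go to a temporary file, not to the heap.
final class TempFileContentSink implements ContentSink {
    private final OutputStream out;
    private final Blob blob;

    TempFileContentSink(final String mediaType) throws IOException {
        final Path file = Files.createTempFile("content-sink", ".bin");
        file.toFile().deleteOnExit();
        out = new BufferedOutputStream(Files.newOutputStream(file));
        blob = new Blob() {
            @Override public String getMimeType() { return mediaType; }
            @Override public InputStream getStream() throws IOException {
                // Each call opens a fresh stream over the file content.
                return Files.newInputStream(file);
            }
        };
    }

    @Override public OutputStream getOutputStream() { return out; }
    @Override public Blob getBlob() { return blob; }
}

final class ContentSinkSketch {
    // Writes text through the sink, closes the stream, then reads the Blob.
    static String roundTrip(String text) {
        try {
            ContentSink sink = new TempFileContentSink("text/plain");
            try (OutputStream os = sink.getOutputStream()) {
                os.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = sink.getBlob().getStream()) {
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    copy.write(buf, 0, n);
                }
                return new String(copy.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ContentSinkSketch.roundTrip("extracted text") returns the same string 
after it has passed through the file-backed sink. The read-back into memory is 
only there to make the sketch checkable; a real consumer would process the 
stream incrementally.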

__Note:__ Users MUST NOT pass the Blob of a ContentSink to any other component 
until all data have been written to the OutputStream, because other components 
might otherwise read partial data when calling Blob#getStream(). This feature 
is intended to reduce the memory footprint, not to support concurrent writing 
and reading of data as pipes do.
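
The risk of partial reads can be made concrete with a small Java sketch. It 
does not use the Stanbol API: a ByteArrayOutputStream stands in for the 
storage behind a Blob, and PartialReadDemo and visibleBytes are hypothetical 
names. It only shows that bytes written through a buffered stream need not be 
visible to a reader until the stream is closed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PartialReadDemo {
    /**
     * Returns a two-element array: the number of bytes visible in the
     * backing store before the stream is closed, and the number after.
     */
    static int[] visibleBytes(String text) {
        // Stand-in for the storage a Blob would read from (assumption).
        ByteArrayOutputStream store = new ByteArrayOutputStream();
        OutputStream out = new BufferedOutputStream(store);
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            int before = store.size(); // small writes are still buffered
            out.close();               // close() flushes the buffer
            int after = store.size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a short input such as "extracted text", the first value is 0 and the 
second is 14: a component reading the store before close() would see no data 
at all.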

__Intended Usage:__

This example shows a typical usage of a ContentSink within the 
computeEnhancements(..) method of an EnhancementEngine:

    ContentItem ci; // the content item to process
    ContentSink plainTextSink =
            contentItemFactory.createContentSink("text/plain");
    Writer writer = new OutputStreamWriter(
            plainTextSink.getOutputStream(), "UTF-8");
    try {
        // pass the writer to the framework that extracts the text
    } finally {
        // closing the writer also flushes any buffered data to the sink
        IOUtils.closeQuietly(writer);
    }
    // all data are written: now add the Blob to the ContentItem
    UriRef textBlobUri; // create a UriRef for the Blob
    ci.addPart(textBlobUri, plainTextSink.getBlob());
    plainTextSink = null;


