Allow streaming of transformed content to Blobs
-----------------------------------------------
Key: STANBOL-579
URL: https://issues.apache.org/jira/browse/STANBOL-579
Project: Stanbol
Issue Type: Sub-task
Components: Enhancer
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
While adapting the TikaEngine and the MetaxaEngine to the new model
ContentItemFactory pattern, I realized that it is important to support
streaming of content to a Blob. Otherwise these kinds of engines would
need to temporarily hold the whole transformed version of the content (e.g. the
extracted text/plain, XHTML, ...) before they could create a new Blob via one of
the ContentItemFactory#createBlob(...) methods.
The following extension to the ContentItemFactory avoids this issue and
allows content to be "streamed" to a Blob:
Added Method to the ContentItemFactory
/** Creates a new ContentSink */
+ createContentSink(String mediaType) : ContentSink;
and the new Interface ContentSink
/** Getter for the OutputStream */
+ getOutputStream() : OutputStream;
/** Getter for the Blob */
+ getBlob() : Blob;
__Note:__ Users MUST NOT pass the Blob of a ContentSink to any other component
until all data have been written to the OutputStream, because other components
might otherwise read partial data when calling Blob#getStream(). This
feature is intended to reduce the memory footprint, not to support
concurrent writing and reading of data as pipes do.
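To illustrate how such a sink can keep the memory footprint low, here is a minimal sketch of one possible implementation that spools the written bytes to a temporary file. This is not the Stanbol implementation: the Blob and ContentSink interfaces below are simplified stand-ins for the ones described in this issue, and FileContentSink and roundTrip are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentSinkSketch {

    /** Simplified stand-in for the Blob interface (assumption, not the real API). */
    public interface Blob {
        String getMimeType();
        InputStream getStream();
    }

    /** The interface proposed in this issue. */
    public interface ContentSink {
        OutputStream getOutputStream();
        Blob getBlob();
    }

    /** Hypothetical sink that writes to a temp file instead of a byte[] in memory. */
    public static final class FileContentSink implements ContentSink {
        private final OutputStream out;
        private final Blob blob;

        public FileContentSink(final String mediaType) throws IOException {
            final Path file = Files.createTempFile("content-sink", ".tmp");
            file.toFile().deleteOnExit();
            this.out = Files.newOutputStream(file);
            this.blob = new Blob() {
                @Override
                public String getMimeType() {
                    return mediaType;
                }

                @Override
                public InputStream getStream() {
                    // Each call opens a fresh stream over whatever has been
                    // written so far, which is why the Blob must not be shared
                    // before the OutputStream is closed.
                    try {
                        return Files.newInputStream(file);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            };
        }

        @Override
        public OutputStream getOutputStream() {
            return out;
        }

        @Override
        public Blob getBlob() {
            return blob;
        }
    }

    /** Writes text to a sink, closes the stream, then reads it back via the Blob. */
    public static String roundTrip(String text) throws IOException {
        ContentSink sink = new FileContentSink("text/plain");
        try (Writer writer = new OutputStreamWriter(
                sink.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        // Only now, after all data are written, is the Blob read.
        try (InputStream in = sink.getBlob().getStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("extracted plain text"));
    }
}
```

Because getStream() simply reads what is currently stored, a component that received the Blob too early would see truncated content, which is the situation the note above warns about.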
__Intended Usage:__
This example shows a typical usage of a ContentSink within the
processEnhancement(..) method of an EnhancementEngine:
ContentItem ci; // the content item to process
ContentSink plainTextSink =
    contentItemFactory.createContentSink("text/plain");
Writer writer = new OutputStreamWriter(
    plainTextSink.getOutputStream(), "UTF-8");
try {
    // pass the writer to the framework that extracts the text
} finally {
    // all data must be written before the Blob is handed on
    IOUtils.closeQuietly(writer);
}
// now add the Blob to the ContentItem
UriRef textBlobUri; // create a UriRef for the Blob
ci.addPart(textBlobUri, plainTextSink.getBlob());
plainTextSink = null;