Add Interfaces for parsing Content
----------------------------------
Key: STANBOL-577
URL: https://issues.apache.org/jira/browse/STANBOL-577
Project: Stanbol
Issue Type: Sub-task
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Currently, different ContentItem implementations define their own constructors
to fit their specific implementation. For example, InMemoryBlob defines
constructors that accept the content as a byte array. This makes perfect sense
for this implementation, because it allows the data to be passed directly when
it is already loaded in memory. WebContentItem, as another example, cannot
support a constructor taking a byte array, because at the time of construction
only the URL of (that is, a reference to) the content is available. For a
file-based ContentItem implementation, a constructor taking a byte array would
also be undesirable, as the whole point of such an implementation is to avoid
loading the whole content into memory.
However, with the introduction of a factory pattern for constructing
ContentItems, the interfaces used to pass content MUST be normalized, because
they are part of the API of the ContentItemFactory interface. To solve this,
the following two interfaces are added to the Stanbol Enhancer API.
First, the __ContentSource__ interface, intended to be used for already
dereferenced content:
/** the content as stream */
+ getStream() : InputStream
/** the content as byte array */
+ getData() : byte[]
/** optionally the media type of the content */
+ getMediaType() : String
/** optionally the file name of the content */
+ getFileName() : String
/** optionally additional headers */
+ getHeaders() : Map<String,List<String>>
With the following default implementations:
* StreamSource: A ContentSource wrapping an InputStream. Multiple calls to
#getStream() are not supported. Calls to #getData() load the content
provided by the stream into memory.
* ByteArraySource: A ContentSource implementation that internally uses a byte
array. To be used when the content passed to the Stanbol Enhancer is already
loaded in memory. Calls to #getData() MUST NOT copy the internal byte array.
* StringSource: A ContentSource implementation that allows a String instance
to be passed directly.
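
A minimal Java sketch of the ContentSource contract and two of the default
implementations listed above. Only the type and method names come from this
issue. The constructor signatures, the IOException declarations, the
null/empty values returned for the optional members, and the
IllegalStateException on a second #getStream() call are assumptions made for
illustration, not the committed Stanbol code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import java.util.Map;

interface ContentSource {
    InputStream getStream() throws IOException;
    byte[] getData() throws IOException;
    String getMediaType();                   // optional: may be null (assumption)
    String getFileName();                    // optional: may be null (assumption)
    Map<String, List<String>> getHeaders();  // optional: may be empty (assumption)
}

/** Content that is already held in memory. */
final class ByteArraySource implements ContentSource {
    private final byte[] data;
    private final String mediaType;

    ByteArraySource(byte[] data, String mediaType) {
        this.data = data;
        this.mediaType = mediaType;
    }

    public InputStream getStream() { return new ByteArrayInputStream(data); }
    public byte[] getData() { return data; } // returns the internal array, no copy
    public String getMediaType() { return mediaType; }
    public String getFileName() { return null; }
    public Map<String, List<String>> getHeaders() { return Collections.emptyMap(); }
}

/** Content that is only available as a one-shot stream. */
final class StreamSource implements ContentSource {
    private final InputStream in;
    private final String mediaType;
    private boolean consumed;

    StreamSource(InputStream in, String mediaType) {
        this.in = in;
        this.mediaType = mediaType;
    }

    public InputStream getStream() {
        // Assumption: a second call fails fast instead of returning a spent stream.
        if (consumed) {
            throw new IllegalStateException("stream was already consumed");
        }
        consumed = true;
        return in;
    }

    public byte[] getData() throws IOException {
        // Reads the whole stream into memory, so this also consumes the stream.
        InputStream stream = getStream();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            stream.close();
        }
    }

    public String getMediaType() { return mediaType; }
    public String getFileName() { return null; }
    public Map<String, List<String>> getHeaders() { return Collections.emptyMap(); }
}
```

A StringSource could be built the same way by encoding the String to bytes
with an explicit charset and delegating to ByteArraySource.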
Note that ContentItem/Blob implementations that
* store the content in memory should prefer calling ContentSource#getData() to
retrieve the content from the ContentSource
* stream the content to a file/database/CMS need to use
ContentSource#getStream() to avoid loading the whole content into memory!
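
The two consumption styles can be sketched as follows. InMemoryBlobSketch and
FileBlobSketch are hypothetical classes invented for this example, and
ContentSource is trimmed to the two accessors that matter here; the full
interface is the one listed earlier in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Trimmed to the two accessors used below (assumption for brevity). */
interface ContentSource {
    InputStream getStream() throws IOException;
    byte[] getData() throws IOException;
}

/** Hypothetical in-memory blob: asks for the byte array directly. */
final class InMemoryBlobSketch {
    private final byte[] data;

    InMemoryBlobSketch(ContentSource source) throws IOException {
        // A ByteArraySource can hand over its array without copying.
        this.data = source.getData();
    }

    byte[] data() { return data; }
}

/** Hypothetical file-backed blob: never holds the whole content in memory. */
final class FileBlobSketch {
    private final Path file;

    FileBlobSketch(ContentSource source) throws IOException {
        this.file = Files.createTempFile("blob", ".bin");
        // Copy the stream straight to disk and close it afterwards.
        try (InputStream in = source.getStream()) {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    Path file() { return file; }
}
```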
Second, the __ContentReference__ interface, intended to be used to create
ContentItems/Blobs for content where only a reference is available:
/** the Reference to the content */
+ getReference() : String
/** dereferences the content */
+ dereference() : ContentSource
With the following default implementation:
* UrlReference: Allows any Java URL to be used to reference content. This
is basically a replacement for the current WebContentItem implementation.
Both interfaces and implementations will be part of the Stanbol Enhancer
Services API module.
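
A rough Java sketch of how ContentReference and UrlReference might fit
together. The accessor is written as getReference() here. The constructor,
the IOException on dereference(), the use of URLConnection#getContentType()
for the media type, and the trimmed two-method ContentSource are assumptions
made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

/** Trimmed to what this sketch needs (assumption for brevity). */
interface ContentSource {
    InputStream getStream() throws IOException;
    String getMediaType();
}

interface ContentReference {
    /** the reference to the content */
    String getReference();
    /** dereferences the content */
    ContentSource dereference() throws IOException;
}

final class UrlReference implements ContentReference {
    private final URL url;

    UrlReference(URL url) {
        if (url == null) {
            throw new IllegalArgumentException("url must not be null");
        }
        this.url = url;
    }

    public String getReference() {
        return url.toString();
    }

    public ContentSource dereference() throws IOException {
        // The content is only fetched here, not at construction time.
        final URLConnection connection = url.openConnection();
        final InputStream in = connection.getInputStream();
        final String mediaType = connection.getContentType();
        return new ContentSource() {
            public InputStream getStream() { return in; }
            public String getMediaType() { return mediaType; }
        };
    }
}
```

Deferring the network or file access to dereference() keeps construction
cheap, which matches the WebContentItem case where only the URL is known up
front.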