[ https://issues.apache.org/jira/browse/STANBOL-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler resolved STANBOL-577.
-----------------------------------------
Resolution: Fixed
Implemented and documented with #1324645.
> Add Interfaces for parsing Content
> ----------------------------------
>
> Key: STANBOL-577
> URL: https://issues.apache.org/jira/browse/STANBOL-577
> Project: Stanbol
> Issue Type: Sub-task
> Components: Enhancer
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> Currently, different types of ContentItem define their own constructors that
> fit their specific implementation. For example, the InMemoryBlob defines
> constructors that allow passing the content as a byte array. This makes
> complete sense for this implementation, because it allows passing the data
> directly if it is already loaded in memory. The WebContentItem, as another
> example, cannot support a constructor taking a byte array, because at
> construction time only the URL of (a reference to) the content is
> available. Likewise, a constructor taking a byte array would be undesirable
> for a file-based ContentItem implementation, as the whole point of such an
> implementation is to avoid loading the whole content into memory.
> However, with the introduction of a factory pattern to construct ContentItems,
> the interfaces used to pass content MUST be normalized, because they are
> part of the API of the ContentItemFactory interface. To solve this, the
> following two interfaces are added to the Stanbol Enhancer API.
> First, the __ContentSource__ interface, intended to be used for already
> dereferenced content:
> /** the content as stream */
> + getStream() : InputStream
> /** the content as byte array */
> + getData() : byte[]
> /** optionally the media type of the content */
> + getMediaType() : String
> /** optionally the file name of the content */
> + getFileName() : String
> /** optionally additional headers */
> + getHeaders() : Map<String,List<String>>
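Expressed in Java, the ContentSource methods listed above might look like the following minimal sketch. It mirrors the listing only and is not the actual Stanbol source: the checked `IOException` on `getData()` is an assumption (the listing names no exceptions), and the `ContentSources.mediaTypeOrDefault` helper is purely hypothetical, showing how a caller might deal with the optional media type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

/** Sketch of the ContentSource API listed above (not the actual Stanbol source). */
interface ContentSource {
    /** the content as stream */
    InputStream getStream();

    /** the content as byte array; the IOException is an assumed checked exception */
    byte[] getData() throws IOException;

    /** optionally the media type of the content, null if unknown */
    String getMediaType();

    /** optionally the file name of the content, null if unknown */
    String getFileName();

    /** optionally additional headers, empty if none */
    Map<String, List<String>> getHeaders();
}

/** Hypothetical helper: the fallback type is an illustration, not part of the proposal. */
final class ContentSources {
    private ContentSources() {}

    static String mediaTypeOrDefault(ContentSource source) {
        String mediaType = source.getMediaType();
        return mediaType != null ? mediaType : "application/octet-stream";
    }
}
```

Because every metadata method is optional, callers should expect `null` (or an empty header map) and pick their own defaults.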
>
> With the following default implementations:
> * StreamSource: A ContentSource wrapping an InputStream. Multiple calls to
> #getStream() will not be supported. Calls to #getData() will load the
> contents provided by the stream into memory.
> * ByteArraySource: A ContentSource implementation that internally uses a byte
> array. To be used in cases where users need to pass content that is already
> loaded in memory to the Stanbol Enhancer. Calls to #getData() MUST NOT copy
> the internal byte array.
> * StringSource: A ContentSource implementation that allows passing a String
> instance directly.
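The three default implementations described above can be sketched in Java as follows, under stated assumptions. The `AbstractSource` base class, the constructor signatures, the `IllegalStateException` on a second `getStream()` call, and the UTF-8 encoding plus `text/plain; charset=UTF-8` media type used by `StringSource` are all choices of this sketch, not confirmed details of the Stanbol implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Minimal ContentSource, restated so this block compiles on its own.
interface ContentSource {
    InputStream getStream();
    byte[] getData() throws IOException;
    String getMediaType();
    String getFileName();
    Map<String, List<String>> getHeaders();
}

// Shared base for the optional metadata (a simplification of this sketch).
abstract class AbstractSource implements ContentSource {
    private final String mediaType;

    AbstractSource(String mediaType) {
        this.mediaType = mediaType;
    }

    @Override public String getMediaType() { return mediaType; }
    @Override public String getFileName() { return null; }
    @Override public Map<String, List<String>> getHeaders() { return Collections.emptyMap(); }
}

/** Wraps an InputStream that can be handed out only once. */
class StreamSource extends AbstractSource {
    private final InputStream in;
    private boolean consumed = false;

    StreamSource(InputStream in, String mediaType) {
        super(mediaType);
        this.in = in;
    }

    @Override
    public InputStream getStream() {
        if (consumed) {
            throw new IllegalStateException("the stream can only be retrieved once");
        }
        consumed = true;
        return in;
    }

    @Override
    public byte[] getData() throws IOException {
        // Consumes the stream and loads its content into memory.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try (InputStream s = getStream()) {
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}

/** Backed by an in-memory byte array. */
class ByteArraySource extends AbstractSource {
    private final byte[] data;

    ByteArraySource(byte[] data, String mediaType) {
        super(mediaType);
        this.data = data;
    }

    @Override
    public InputStream getStream() {
        return new ByteArrayInputStream(data); // may be called any number of times
    }

    @Override
    public byte[] getData() {
        return data; // MUST NOT copy the internal array
    }
}

/** Passes a String directly; UTF-8 and text/plain are assumptions of this sketch. */
class StringSource extends ByteArraySource {
    StringSource(String text) {
        super(text.getBytes(StandardCharsets.UTF_8), "text/plain; charset=UTF-8");
    }
}
```

Deriving `StringSource` from `ByteArraySource` keeps the no-copy guarantee of `getData()` for strings as well, since the encoded bytes are produced once at construction.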
> Note that ContentItem/Blob implementations that
> * store the content in-memory should prefer to call ContentSource#getData()
> to retrieve the content from the ContentSource
> * stream the content to a file/database/CMS need to use
> ContentSource#getStream() to avoid loading the whole content in-memory!
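The two consumption strategies in the note above can be contrasted with a small Java sketch. `InMemoryBlobSketch` and `FileBlobSketch` are hypothetical classes invented for illustration (they are not Stanbol's Blob implementations), and writing to a temporary file stands in for any file, database, or CMS backend.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Map;

// Minimal ContentSource, restated so this block compiles on its own.
interface ContentSource {
    InputStream getStream();
    byte[] getData() throws IOException;
    String getMediaType();
    String getFileName();
    Map<String, List<String>> getHeaders();
}

/** Hypothetical in-memory blob: prefers getData(), so a ByteArraySource is not copied. */
class InMemoryBlobSketch {
    private final byte[] data;

    InMemoryBlobSketch(ContentSource source) throws IOException {
        this.data = source.getData();
    }

    InputStream getStream() {
        return new ByteArrayInputStream(data);
    }

    long size() {
        return data.length;
    }
}

/** Hypothetical file-backed blob: streams via getStream() and never buffers the whole content. */
class FileBlobSketch {
    private final Path file;

    FileBlobSketch(ContentSource source) throws IOException {
        this.file = Files.createTempFile("blob", ".bin");
        file.toFile().deleteOnExit();
        // Copies the stream to disk in chunks instead of calling getData().
        try (InputStream in = source.getStream()) {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    InputStream getStream() throws IOException {
        return Files.newInputStream(file);
    }

    long size() throws IOException {
        return Files.size(file);
    }
}
```

The file-backed variant never calls `getData()`, so a `StreamSource` wrapping a very large upload is copied without ever being held in memory as a whole.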
> Second, the __ContentReference__ interface, intended to be used to create
> ContentItems/Blobs for content where only a reference is available:
> /** the reference to the content */
> + getReference() : String
> /** dereferences the content */
> + dereference() : ContentSource
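In Java, the ContentReference methods listed above could take roughly the shape of the sketch that follows. The checked `IOException` on `dereference()` is an assumption, and `MapReference` is a purely hypothetical implementation that points into an in-memory map. It only illustrates that a reference holds no content until `dereference()` is called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Minimal ContentSource, restated so this block compiles on its own.
interface ContentSource {
    InputStream getStream();
    byte[] getData() throws IOException;
    String getMediaType();
    String getFileName();
    Map<String, List<String>> getHeaders();
}

/** Sketch of the ContentReference API listed above (not the actual Stanbol source). */
interface ContentReference {
    /** the reference to the content */
    String getReference();

    /** dereferences the content; the IOException is an assumed checked exception */
    ContentSource dereference() throws IOException;
}

/** Hypothetical reference into an in-memory store: nothing is resolved before dereference(). */
class MapReference implements ContentReference {
    private final Map<String, byte[]> store;
    private final String key;

    MapReference(Map<String, byte[]> store, String key) {
        this.store = store;
        this.key = key;
    }

    @Override
    public String getReference() {
        return key;
    }

    @Override
    public ContentSource dereference() throws IOException {
        final byte[] data = store.get(key);
        if (data == null) {
            throw new IOException("no content for " + key);
        }
        return new ContentSource() {
            public InputStream getStream() { return new ByteArrayInputStream(data); }
            public byte[] getData() { return data; }
            public String getMediaType() { return null; }
            public String getFileName() { return null; }
            public Map<String, List<String>> getHeaders() { return Collections.emptyMap(); }
        };
    }
}
```

Since `dereference()` returns a ContentSource, a ContentItemFactory that receives a reference can resolve it and then reuse the same code path it uses for already dereferenced content.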
>
> With the following default implementation:
> * UrlReference: Allows any Java URL to be used to reference content. This
> is basically a replacement for the current WebContentItem implementation.
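A UrlReference along the lines described above could be sketched with the standard `java.net.URLConnection` API, under stated assumptions. Deriving the file name from the last URL path segment, passing the connection's header fields through unchanged, and making the returned source single-use (like StreamSource) are choices of this sketch rather than confirmed Stanbol behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

// Minimal ContentSource and ContentReference, restated so this block compiles on its own.
interface ContentSource {
    InputStream getStream();
    byte[] getData() throws IOException;
    String getMediaType();
    String getFileName();
    Map<String, List<String>> getHeaders();
}

interface ContentReference {
    String getReference();
    ContentSource dereference() throws IOException;
}

/** Sketch of a UrlReference: the URL is only opened when dereference() is called. */
class UrlReference implements ContentReference {
    private final URL url;

    UrlReference(URL url) {
        this.url = url;
    }

    @Override
    public String getReference() {
        return url.toString();
    }

    @Override
    public ContentSource dereference() throws IOException {
        final URLConnection con = url.openConnection();
        final InputStream in = con.getInputStream();
        final String mediaType = con.getContentType(); // may be null
        // For HTTP, this map also contains the status line under a null key.
        final Map<String, List<String>> headers = con.getHeaderFields();
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        final String fileName = name.isEmpty() ? null : name;
        // The returned source wraps one open stream, so it is single-use.
        return new ContentSource() {
            public InputStream getStream() { return in; }

            public byte[] getData() throws IOException {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                try (InputStream s = in) {
                    int n;
                    while ((n = s.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }
                return out.toByteArray();
            }

            public String getMediaType() { return mediaType; }
            public String getFileName() { return fileName; }
            public Map<String, List<String>> getHeaders() { return headers; }
        };
    }
}
```

Because opening the connection is deferred to `dereference()`, constructing a UrlReference is cheap, which matches the WebContentItem situation where only the URL is known at construction time.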
> Both interfaces and implementations will be part of the Stanbol Enhancer
> Services API module.