[ 
https://issues.apache.org/jira/browse/STANBOL-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-577.
-----------------------------------------

    Resolution: Fixed

implemented and documented with #1324645
                
> Add Interfaces for parsing Content
> ----------------------------------
>
>                 Key: STANBOL-577
>                 URL: https://issues.apache.org/jira/browse/STANBOL-577
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> Currently, different types of ContentItem define their own constructors that 
> fit their specific implementation. E.g. the InMemoryBlob defines 
> constructors that allow passing the content as a byte array. This makes 
> complete sense for this implementation, because it allows passing the data 
> directly if it is already loaded in memory. The WebContentItem, as another 
> example, cannot support a constructor taking a byte array, because at the 
> time of construction only the URL of (i.e. a reference to) the content is 
> available. Also, for a file-based ContentItem implementation a constructor 
> with a byte array would not be preferable, as the whole point of such an 
> implementation would be to avoid loading the whole content into memory.
> However, with the introduction of a factory pattern to construct 
> ContentItems, the interfaces used to pass content MUST be normalized, 
> because they are part of the API of the ContentItemFactory interface. To 
> solve this, the following two interfaces are added to the Stanbol Enhancer 
> API.
> First, the __ContentSource__ interface, intended to be used for already 
> dereferenced content:
>     /** the content as stream */
>     + getStream() : InputStream
>     /** the content as byte array */
>     + getData() : byte[]
>     /** optionally the media type of the content */
>     + getMediaType() : String
>     /** optionally the file name of the content */
>     + getFileName() : String
>     /** optionally additional headers */
>     + getHeaders() : Map<String,List<String>>
>         
> With the following default implementations:
> * StreamSource: A ContentSource wrapping an InputStream. Multiple calls to 
> #getStream() will not be supported. Calls to #getData() will load the 
> content provided by the stream into memory.
> * ByteArraySource: A ContentSource implementation that internally uses a byte 
> array. To be used in cases where users need to pass content to the Stanbol 
> Enhancer that is already loaded in memory. Calls to #getData() MUST NOT copy 
> the internal byte array. 
> * StringSource: A ContentSource implementation that allows passing a String 
> instance directly.
> Note that ContentItem/Blob implementations that
> * store the content in-memory should prefer to call ContentSource#getData() 
> to retrieve the content from the ContentSource
> * stream the content to a file/database/CMS need to use 
> ContentSource#getStream() to avoid loading the whole content in-memory!
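
The ContentSource contract and two of the default implementations described above could be sketched roughly as follows. This is an illustration only, not the code committed for this issue: the optional getFileName() and getHeaders() getters are omitted, the constructor signatures and the `throws IOException` clauses are assumptions, and rejecting a second getStream() call with an IllegalStateException is just one possible reading of "will not be supported".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Sketch of the interface from the description (optional getters omitted). */
interface ContentSource {
    InputStream getStream() throws IOException;
    byte[] getData() throws IOException;
    String getMediaType(); // optional, may be null
}

/** Wraps an in-memory array; getData() hands out the array without copying. */
final class ByteArraySource implements ContentSource {
    private final byte[] data;
    private final String mediaType;

    ByteArraySource(byte[] data, String mediaType) { // assumed constructor
        this.data = data;
        this.mediaType = mediaType;
    }

    public InputStream getStream() { return new ByteArrayInputStream(data); }
    public byte[] getData() { return data; } // same instance, no copy
    public String getMediaType() { return mediaType; }
}

/** Wraps a one-shot stream; a second getStream() call is rejected. */
final class StreamSource implements ContentSource {
    private final InputStream in;
    private final String mediaType;
    private boolean consumed;

    StreamSource(InputStream in, String mediaType) { // assumed constructor
        this.in = in;
        this.mediaType = mediaType;
    }

    public InputStream getStream() {
        // Assumption: signal the unsupported second call with an exception.
        if (consumed) {
            throw new IllegalStateException("stream already consumed");
        }
        consumed = true;
        return in;
    }

    public byte[] getData() throws IOException {
        // Loads the complete stream into memory, as the description warns.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try (InputStream s = getStream()) {
            for (int n; (n = s.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public String getMediaType() { return mediaType; }
}
```

With these two classes, an in-memory Blob would call getData() on whatever source it receives, while a file-backed Blob would copy from getStream() so that a StreamSource is never buffered as a whole.
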
> Second, the __ContentReference__ interface, intended to be used to create 
> ContentItems/Blobs for content where only a reference is available:
>     /** the reference to the content */
>     + getReference() : String
>     /** dereferences the content */
>     + dereference() : ContentSource
>     
> With the following default implementation:
> * UrlReference: Allows using any Java URL to reference content. This 
> basically is a replacement for the current WebContentItem implementation.
> Both interfaces and implementations will be part of the Stanbol Enhancer 
> Services API module.
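
A URL-backed ContentReference along the lines described above might look like the sketch below. It is a hedged illustration rather than the committed implementation: ContentSource is reduced to the two methods the example needs, the constructor and the `throws IOException` clauses are assumptions, and taking the media type from URLConnection#getContentType() is a choice made for this example only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

/** Reduced version of the ContentSource interface, enough for this sketch. */
interface ContentSource {
    InputStream getStream() throws IOException;
    String getMediaType(); // optional, may be null
}

/** Sketch of the second interface from the description. */
interface ContentReference {
    String getReference();
    ContentSource dereference() throws IOException;
}

/** References content by URL; nothing is fetched until dereference(). */
final class UrlReference implements ContentReference {
    private final URL url;

    UrlReference(URL url) { // assumed constructor
        this.url = url;
    }

    public String getReference() {
        return url.toString();
    }

    public ContentSource dereference() throws IOException {
        // Opening the connection is deferred to this point, so creating
        // a UrlReference never touches the network or the file system.
        final URLConnection con = url.openConnection();
        final InputStream in = con.getInputStream();
        final String type = con.getContentType(); // may be null
        return new ContentSource() {
            public InputStream getStream() { return in; }
            public String getMediaType() { return type; }
        };
    }
}
```

Because the content is only fetched in dereference(), a ContentItemFactory can decide for itself whether to stream the result to disk or buffer it in memory.
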

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
