[ https://issues.apache.org/jira/browse/STANBOL-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Kasper reopened STANBOL-478:
-----------------------------------

      Assignee: Walter Kasper  (was: Rupert Westenthaler)

For external clients that use Metaxa for text extraction, the text will no 
longer be visible or accessible in the metadata graph. There should at least 
be an option to have the text included in the metadata, so that they can 
retrieve it there with a simple SPARQL query.
                
> Change Metaxa Engine to create PlainText version as ContentPart and change 
> other Engines to retrieve PlainText version from ContentPart
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-478
>                 URL: https://issues.apache.org/jira/browse/STANBOL-478
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>            Assignee: Walter Kasper
>
> Instead of adding/reading the "text/plain" version of a ContentItem to/from 
> the metadata of the ContentItem, the new ContentPart API should be used for 
> that.
> This will require the Metaxa Engine to store the literal values of all Triples 
> with the ContentItem.getUri() as subject and
>     
>     http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent
> as property in a Blob and add this as a ContentPart to the ContentItem.
> Other EnhancementEngines then need to search for a Blob with the MimeType 
> "text/plain" instead of retrieving the plain text from the metadata.
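
The flow described above, together with the option requested in the reopening comment, could look roughly like the following sketch. It is not the Stanbol API: `Blob`, `ContentItem`, `add_plain_text_part`, `find_plain_text`, the `include_text_in_metadata` flag and the `#plain-text` part URI suffix are all hypothetical stand-ins written in Python only to illustrate the idea. Only the nie#plainTextContent property URI and the "text/plain" MimeType come from the issue.

```python
from dataclasses import dataclass, field
from typing import Optional

# Property named in the issue description.
NIE_PLAIN_TEXT = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent"


@dataclass
class Blob:
    """Hypothetical stand-in for a content part: a MIME type plus raw bytes."""
    mime_type: str
    data: bytes


@dataclass
class ContentItem:
    """Hypothetical stand-in: metadata as (subject, predicate, literal) tuples."""
    uri: str
    metadata: list = field(default_factory=list)
    parts: dict = field(default_factory=dict)  # part URI -> Blob


def add_plain_text_part(item: ContentItem,
                        include_text_in_metadata: bool = False) -> Optional[Blob]:
    """Collect the plainTextContent literals of the item into a text/plain Blob.

    With include_text_in_metadata=True (an assumed option) the triples are kept
    in the metadata as well, so external clients can still query them.
    """
    def is_text_triple(triple):
        subject, predicate, _ = triple
        return subject == item.uri and predicate == NIE_PLAIN_TEXT

    literals = [t[2] for t in item.metadata if is_text_triple(t)]
    if not literals:
        return None

    blob = Blob("text/plain", "\n".join(literals).encode("utf-8"))
    item.parts[item.uri + "#plain-text"] = blob  # hypothetical part URI

    if not include_text_in_metadata:
        item.metadata = [t for t in item.metadata if not is_text_triple(t)]
    return blob


def find_plain_text(item: ContentItem) -> Optional[str]:
    """What a downstream engine would do: look for a Blob with type text/plain."""
    for blob in item.parts.values():
        if blob.mime_type == "text/plain":
            return blob.data.decode("utf-8")
    return None
```

With the flag left at its default, downstream engines read the text from the content part only; with the flag set, the same text also remains reachable through the metadata graph.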

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
