Jukka -

Do you want to revisit the architecture regarding the information we are
currently keeping in the Content object (and will be moving shortly)?
Specifically, the text, xml, and regexp values?  Wouldn't there be cases
where different parsers would need their own strings to identify a
property such as title?  Should we support overriding the existing keys with
parser implementation-specific keys?  Maybe they would look something like
this:

defaultText="title"
defaultXML=...
defaultRegExp=...
org.xyz.FooParser=...

... so perhaps the parser would look up its own class name, and fall back to
the default if it doesn't find it?
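
To make the lookup idea concrete, here is a rough Java sketch. It assumes the keys live in a java.util.Properties object; the KeyLookup class, the resolve method, the BarParser name, and the "dc:title" value are all made up for illustration and are not part of the current code or the patch.

```java
import java.util.Properties;

public class KeyLookup {

    // Hypothetical helper: return the value configured under the parser's
    // class name if there is one, otherwise the value under the default key.
    static String resolve(Properties config, String parserClassName, String defaultKey) {
        String fallback = config.getProperty(defaultKey);
        return config.getProperty(parserClassName, fallback);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("defaultText", "title");
        // Illustrative override only; the real value is not specified.
        config.setProperty("org.xyz.FooParser", "dc:title");

        // FooParser has its own entry, so the override is used.
        System.out.println(resolve(config, "org.xyz.FooParser", "defaultText"));
        // BarParser (hypothetical) has no entry, so the default is used.
        System.out.println(resolve(config, "org.xyz.BarParser", "defaultText"));
    }
}
```

In a real parser the class name would presumably come from getClass().getName() rather than a string literal.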

- Keith


JIRA [EMAIL PROTECTED] wrote:
> 
> 
>      [
> https://issues.apache.org/jira/browse/TIKA-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
> 
> Jukka Zitting updated TIKA-46:
> ------------------------------
> 
>     Attachment: TIKA-46-part1.patch
> 
> Attached a patch (TIKA-46-part1.patch) for introducing a Metadata object
> to the Parser interface. This is just the first half of the complete
> solution, as we still need to find a way to pass the configuration
> information currently contained in the Content collection.
> 
>> Use Metadata in Parser
>> ----------------------
>>
>>                 Key: TIKA-46
>>                 URL: https://issues.apache.org/jira/browse/TIKA-46
>>             Project: Tika
>>          Issue Type: Improvement
>>            Reporter: Jukka Zitting
>>            Assignee: Jukka Zitting
>>         Attachments: TIKA-46-part1.patch
>>
>>
>> The Parser interface should use the Metadata framework to pass document
>> metadata in and out.
> 

-- 
View this message in context: 
http://www.nabble.com/-jira--Created%3A-%28TIKA-46%29-Use-Metadata-in-Parser-tf4584057.html#a13085997
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
