[ https://issues.apache.org/jira/browse/TIKA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened TIKA-126:
--------------------------------


I'm having second thoughts about this feature. It sounds useful, but few 
parsers can implement a metadata-only mode without parsing the full document 
anyway, so the actual performance benefits are questionable. The downside is 
that it adds extra complexity to the otherwise clear and simple Parser 
interface.

I'm inclined to revert these changes for now, and perhaps revisit the issue 
when we have a more pressing use case for an extra parsing mode like this.
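For comparison, the issue description already notes that a client interested only in metadata can pass a dummy DefaultHandler to the existing three-argument parse(), so reverting does not take that ability away. The sketch below is self-contained and hedged: `Parser` and `Metadata` are simplified, hypothetical stand-ins for Tika's types (TikaException is left out), and both the helper name `extractMetadata` and the toy parser are invented for illustration. The toy parser still reads the whole stream, which is the concern raised above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MetadataOnlyClient {

    // Hypothetical stand-in for Tika's Metadata: a plain name/value map.
    static class Metadata {
        private final Map<String, String> values = new HashMap<>();
        void set(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }
    }

    // Hypothetical stand-in for the unchanged, single-method Parser interface.
    interface Parser {
        void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException;
    }

    // Hypothetical client-side helper: metadata only, text discarded by a no-op handler.
    static Metadata extractMetadata(Parser parser, InputStream stream)
            throws IOException, SAXException {
        Metadata metadata = new Metadata();
        parser.parse(stream, new DefaultHandler(), metadata);
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        // Toy parser (invented): records the byte count as metadata and
        // emits the whole input as text, i.e. it always parses everything.
        Parser toy = (stream, handler, metadata) -> {
            byte[] bytes = stream.readAllBytes();
            metadata.set("length", Integer.toString(bytes.length));
            char[] text = new String(bytes, StandardCharsets.UTF_8).toCharArray();
            handler.startDocument();
            handler.characters(text, 0, text.length);
            handler.endDocument();
        };
        Metadata m = extractMetadata(toy,
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(m.get("length")); // prints 5
    }
}
```

The helper costs the client one line and leaves the Parser interface with a single method; the price is that the parser does its full amount of work regardless.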

> Add Parser.parse(InputStream, Metadata) for metadata extraction
> ---------------------------------------------------------------
>
>                 Key: TIKA-126
>                 URL: https://issues.apache.org/jira/browse/TIKA-126
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.2-incubating
>
>
> In some cases a client is just interested in the parsed metadata and not the 
> extracted text content. It is easy to ignore the text content by just passing 
> a dummy DefaultHandler to the existing parse() method, but many parsers could 
> avoid a lot of work if they knew in advance that the text content is not 
> needed.
> Thus I want to add a parse(InputStream, Metadata) signature to the Parser 
> interface. I'll also add an AbstractParser base class with a trivial 
> implementation of that method:
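>     public abstract class AbstractParser implements Parser {
>         // Default: run the full parse and discard the text via a no-op SAX handler
>         public void parse(InputStream stream, Metadata metadata)
>                 throws IOException, SAXException, TikaException {
>             parse(stream, new DefaultHandler(), metadata);
>         }
>     }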
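A self-contained sketch of the proposed design, under stated assumptions: `Parser` and `Metadata` are simplified, hypothetical stand-ins for Tika's interfaces (TikaException is omitted to avoid the dependency), and `HeaderFirstParser` with its "key: value header, blank line, body" format is invented purely for illustration. It shows the two cases from the description: the AbstractParser default still performs a full parse and throws the text away, while a parser whose metadata comes first can override the metadata-only method and stop early.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MetadataOnlySketch {

    // Hypothetical stand-in for Tika's Metadata: a plain name/value map.
    static class Metadata {
        private final Map<String, String> values = new HashMap<>();
        void set(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }
    }

    // Hypothetical, simplified Parser interface including the proposed overload.
    interface Parser {
        void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException;

        void parse(InputStream stream, Metadata metadata)
                throws IOException, SAXException;
    }

    // Base class as proposed: metadata-only parsing falls back to a full parse.
    static abstract class AbstractParser implements Parser {
        public void parse(InputStream stream, Metadata metadata)
                throws IOException, SAXException {
            // DefaultHandler ignores every SAX event, so the text is discarded.
            parse(stream, new DefaultHandler(), metadata);
        }
    }

    // Invented format: "key: value" header lines, a blank line, then body text.
    static class HeaderFirstParser extends AbstractParser {
        int bodyLinesRead = 0; // counts body work so the saving is visible

        @Override
        public void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException {
            BufferedReader reader = open(stream);
            readHeader(reader, metadata);
            handler.startDocument();
            String line;
            while ((line = reader.readLine()) != null) {
                bodyLinesRead++;
                char[] chars = (line + "\n").toCharArray();
                handler.characters(chars, 0, chars.length);
            }
            handler.endDocument();
        }

        @Override
        public void parse(InputStream stream, Metadata metadata) throws IOException {
            // Replaces the default full parse: read the header, skip the body.
            readHeader(open(stream), metadata);
        }

        private static BufferedReader open(InputStream stream) {
            return new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        }

        private static void readHeader(BufferedReader reader, Metadata metadata)
                throws IOException {
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    metadata.set(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "title: Example\nauthor: Jane\n\nline one\nline two\n"
                .getBytes(StandardCharsets.UTF_8);
        HeaderFirstParser parser = new HeaderFirstParser();

        Metadata metadata = new Metadata();
        parser.parse(new ByteArrayInputStream(doc), metadata);
        System.out.println(metadata.get("title") + ", body lines read: " + parser.bodyLinesRead);

        parser.parse(new ByteArrayInputStream(doc), new DefaultHandler(), new Metadata());
        System.out.println("after full parse: " + parser.bodyLinesRead);
    }
}
```

Running main prints `Example, body lines read: 0` and then `after full parse: 2`. The early exit only works because this invented format puts its metadata before the body; for formats where that is not the case, the override gains nothing over the default.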

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.