Hi,
On 10/6/07, Chris Mattmann <[EMAIL PROTECTED]> wrote:
> I'll put my name out there as someone available to be the release master
> when the time comes. I've done it on Nutch before and wouldn't mind doing it
> for Tika. Just let me know if you guys agree.
+1!
> > First, I'd like to replace the current Iterable<Content> construct
> > with a Metadata object that allows metadata to be passed in and out of
> > the parser. Also, this Metadata object should be decoupled from parser
> > configuration.
>
> I completely agree. I'd like to help with this issue as the Metadata
> framework is very near and dear to my heart. What does the interface that
> you are proposing for it look like again? Something like:
>
>     String parse(InputStream stream, Metadata metadata)
>         throws IOException, TikaException;
Exactly.
> > Second, instead of returning the text content of a document as a
> > String, I'd like the parsers to generate SAX events with the text
> > content passed as characters() events.
>
> Then, the next evolutionary step would be:
>
>     SAXEvent parse(InputStream stream, Metadata metadata)
>         throws IOException, TikaException;
I'd rather go with:
    void parse(InputStream stream, ContentHandler handler, Metadata metadata)
        throws IOException, SAXException, TikaException;
I.e. the parser invokes a series of callback methods on the given
handler instance. This way the parse result never needs to be
contained in a single object.
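To make the callback idea concrete, here is a rough sketch of how a parser and a client could fit together. Only the parse() signature comes from the proposal above; the Metadata and TikaException classes, the PlainTextParser, the TextCollector handler, and the "Content-Type" key are placeholders invented for illustration, not actual Tika code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Placeholder exception (assumption): stands in for the real TikaException.
class TikaException extends Exception {
    TikaException(String message) { super(message); }
}

// Placeholder metadata container (assumption): a plain name/value map
// that the caller can fill in before parsing and read back afterwards.
class Metadata {
    private final Map<String, String> values = new HashMap<>();
    void set(String name, String value) { values.put(name, value); }
    String get(String name) { return values.get(name); }
}

// The proposed parser interface.
interface Parser {
    void parse(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException, TikaException;
}

// Toy parser (assumption): treats the input as UTF-8 text and passes it
// to the handler chunk by chunk, never holding the whole document.
class PlainTextParser implements Parser {
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException, TikaException {
        metadata.set("Content-Type", "text/plain");
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        handler.startDocument();
        int n;
        while ((n = reader.read(buffer)) != -1) {
            handler.characters(buffer, 0, n); // text content as characters() events
        }
        handler.endDocument();
    }
}

// A client that does want the full text can collect it in its own handler.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
    String getText() { return text.toString(); }
}

class ParserSketchDemo {
    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        TextCollector collector = new TextCollector();
        InputStream in = new ByteArrayInputStream(
                "Hello, Tika!".getBytes(StandardCharsets.UTF_8));
        new PlainTextParser().parse(in, collector, metadata);
        System.out.println(metadata.get("Content-Type"));
        System.out.println(collector.getText());
    }
}
```

A client that only needs a String can still get one by plugging in something like the TextCollector above, while an indexer or serializer could consume the events as they arrive.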
BR,
Jukka Zitting