Hi,

As you've seen, I've been refactoring the Parser classes quite heavily for the past few weeks, and now with TIKA-43 I'm reaching a milestone that already resembles the proposed interface design.

Once TIKA-43 is committed (I'm giving it a day or two for reviews and comments), there are still two Parser-related changes that I'd like to make before I think we're ready for the first 0.1 release.

First, I'd like to replace the current Iterable<Content> construct with a Metadata object that allows metadata to be passed in and out of the parser. This Metadata object should also be decoupled from parser configuration.

Second, instead of returning the text content of a document as a String, I'd like the parsers to generate SAX events, with the text content passed as characters() events.

Unless anyone objects (feel free to do so if you have better design ideas!), I'll follow up with new patches for these two issues in the next week or two.

Once these changes are done, I think we're good to go for the first Tika release. Such timing would also be perfect for the upcoming ApacheCon US conference. :-)

BR,

Jukka Zitting
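
PS. To make the two changes a bit more concrete, here is a rough sketch of the kind of interface they could lead to. Nothing here is a final design: the SketchParser interface, the parse() signature, the simple map-backed Metadata class, the TextCollector handler, and the "resourceName" and "Content-Type" keys are all just assumptions for illustration. Only ContentHandler and DefaultHandler are real, standard SAX classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParserSketch {

    /** Hypothetical metadata holder: plain name/value pairs, no parser configuration. */
    static class Metadata {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) {
            values.put(name, value);
        }

        String get(String name) {
            return values.get(name);
        }
    }

    /** Hypothetical parser interface: metadata goes in and out, text goes out as SAX events. */
    interface SketchParser {
        void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException;
    }

    /** Toy parser for plain text; assumes the input is UTF-8 encoded. */
    static class PlainTextParser implements SketchParser {
        @Override
        public void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException {
            char[] text = new String(stream.readAllBytes(), StandardCharsets.UTF_8).toCharArray();
            metadata.set("Content-Type", "text/plain"); // metadata passed out of the parser
            handler.startDocument();
            handler.characters(text, 0, text.length);   // text content as a characters() event
            handler.endDocument();
        }
    }

    /** Client-side handler that collects characters() events back into a String. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length);
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /** Runs the toy parser over the given text and returns whatever the handler collected. */
    static String extractText(String input, Metadata metadata) throws IOException, SAXException {
        TextCollector collector = new TextCollector();
        InputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        new PlainTextParser().parse(stream, collector, metadata);
        return collector.toString();
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set("resourceName", "hello.txt"); // metadata passed in to the parser
        System.out.println(extractText("Hello, Tika!", metadata));
        System.out.println(metadata.get("Content-Type"));
    }
}
```

The point of the sketch is that a client who still wants a plain String can get one with a trivial handler like TextCollector, while other clients could plug in any other ContentHandler without the parser having to change.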
