Hi Jukka,

I totally agree with the parser roadmap. Thanks for the good work. I also agree with replacing the Content class with a Metadata class. However, the Metadata class should not be limited to a single metadata standard such as Dublin Core. I think it should be extensible or generic so that it can support multiple metadata standards.
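A rough sketch of one way such an extensible class could look, in Java. Every name here (Property, DublinCore, SimpleMetadata) is hypothetical and chosen only for illustration; none of it is the existing Tika API. The assumption is that each standard contributes its own namespaced property constants, while the container itself knows nothing about any particular standard.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: a property name qualified by the standard it belongs to.
final class Property {
    final String namespace; // e.g. "dc" for Dublin Core
    final String name;      // e.g. "title"

    Property(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Key used inside the container, e.g. "dc:title".
    String key() {
        return namespace + ":" + name;
    }
}

// Hypothetical: one class of constants per metadata standard.
final class DublinCore {
    static final Property TITLE = new Property("dc", "title");
    static final Property CREATOR = new Property("dc", "creator");

    private DublinCore() {}
}

// Hypothetical: a standard-agnostic, multi-valued metadata container.
final class SimpleMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value; a property may carry several values.
    void add(Property property, String value) {
        values.computeIfAbsent(property.key(), k -> new ArrayList<>()).add(value);
    }

    // Returns the first value of the property, or null if it is absent.
    String get(Property property) {
        List<String> list = values.get(property.key());
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns all values of the property (empty list if absent).
    List<String> getValues(Property property) {
        List<String> list = values.get(property.key());
        return list == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(list);
    }
}
```

With this shape, supporting another standard would only mean adding another constants class (or creating a Property with a different namespace on the fly); the container and the parser signature would stay unchanged.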
Regards.

On 10/5/07, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>
> Hi Jukka,
>
> > Once TIKA-43 is committed (I'm giving it a day or two for reviews and
> > comments) there are still two Parser related changes that I'd like to
> > do before I think we're ready to do the first 0.1 release.
>
> +1, agreed. At present, we've worked through 30 JIRA issues so far (great
> job guys!), and I think that the library is reaching stability and is
> primed for an official release.
>
> I'll put my name out there as someone available to be the release master
> when the time comes. I've done it on Nutch before and wouldn't mind doing
> it for Tika. Just let me know if you guys agree.
>
> > First, I'd like to replace the current Iterable<Content> construct
> > with a Metadata object that allows metadata to be passed in and out of
> > the parser. Also, this Metadata object should be decoupled from parser
> > configuration.
>
> I completely agree. I'd like to help with this issue as the Metadata
> framework is very near and dear to my heart. What's the interface that you
> are proposing for it look like again? Something like:
>
> String parse(InputStream stream, Metadata metadata)
>     throws IOException, TikaException;
>
> > Second, instead of returning the text content of a document as a
> > String, I'd like the parsers to generate SAX events with the text
> > content passed as characters() events.
>
> Then, the next evolutionary step would be:
>
> SAXEvent parse(InputStream stream, Metadata metadata)
>     throws IOException, TikaException;
>
> ?
>
> > Unless anyone objects (feel free to do so if you have better design
> > ideas!), I'll follow up with new patches for these two issues in the
> > next week or two. Once these changes are done, I think we're good to
> > go for the first Tika release. Such a timing would also be perfect for
> > the upcoming ApacheCon US conference. :-)
>
> Totally agree! Great job so far: I am really starting to like this new
> Parsing interface...
>
> Cheers,
> Chris
>
> > BR,
> >
> > Jukka Zitting
>
> ______________________________________________
> Chris Mattmann, Ph.D.
> [EMAIL PROTECTED]
> Cognizant Development Engineer
> Early Detection Research Network Project
>
> _________________________________________________
> Jet Propulsion Laboratory Pasadena, CA
> Office: 171-266B Mailstop: 171-246
> _______________________________________________________
>
> Disclaimer: The opinions presented within are my own and do not reflect
> those of either NASA, JPL, or the California Institute of Technology.
>

--
---------------------------------------------------------
Rida Benjelloun
Doculibre inc.
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Cel: 418-262-3222
Tel: 418-353-3390
Site Web : http://www.doculibre.com
---------------------------------------------------------
