In my opinion, if somebody wants such a specialized parser with his own optimizations, he could simply write his own parser using NekoHTML and plug it into Tika.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: Jukka Zitting [mailto:[email protected]]
> Sent: Tuesday, December 16, 2008 12:07 AM
> To: [email protected]
> Subject: Re: Extending existing Parsers - No easy to do right now, could we make it easier?
>
> Hi,
>
> On Tue, Dec 9, 2008 at 1:04 PM, Stephane Bastian
> <[email protected]> wrote:
> > In any case, as you pointed out Tika might not be the best place to do this.
> > However going back to my initial short term issue, which is extending the
> > Html Parser, I would definitely take the solution you proposed earlier if
> > it's still on the table ;)
>
> I thought about this a bit more (see TIKA-182), and I must say that
> I'd rather not apply the patch to Tika. Doing so would create an extra
> binding between client code and the underlying parser library, and
> would make it difficult for us to later replace the parser if we
> wanted to.
>
> BR,
>
> Jukka Zitting
