Hi,

On Tue, Jan 27, 2009 at 12:39 AM, Jonathan Koren <[email protected]> wrote:
> On Jan 26, 2009, at 2:15 PM, Jukka Zitting wrote:
>> Cool! However, see http://markmail.org/message/rgesbchrufeauxnw for a
>> discussion on how complex a parser implementation within Tika can
>> become until it would be better to look for (or create) an external
>> parser library for that format.
>
> I particularly liked the part where the example given as a good enough
> parser was the very parser I singled out. :)
>
> So the takeaway is "Don't be PDFBox," and "Don't be afraid to add yet
> another dependency, if reimplementing is easy?"

Yeah. If you can do something reasonable with at most a few hundred
lines of code, then it's OK to have it in Tika. But as soon as you go
beyond that, the effort is better spent contributing to an external
parser library and using the result in Tika.

BR,

Jukka Zitting
