Re: Update on Integration with Tika

2009-11-17 Thread Andrzej Bialecki
Julien Nioche wrote: [...] Sure, but it also means that we would hopefully get a working version of the Tika parser quicker by not making it depend on the extension point refactoring. It could also be argued that the Tika parsing and the refactoring of the extension points are two separate iss

Re: Update on Integration with Tika

2009-11-17 Thread Julien Nioche
>> I haven't looked yet at the way extension points work, so I don't really
>> have an idea on how difficult this would be. Some of Tika's classes (mostly
>> MimeType) are used explicitly in several places of the core, would we need
>> to hide them behind non Tika objects in order not to have dir

Re: Update on Integration with Tika

2009-11-17 Thread Andrzej Bialecki
Julien Nioche wrote: Well ... let's consider this: in the past we used to put things under /lib/ when they were being used by more than a few plugins. Then we started using library-only plugins (e.g. lib-xml, lib-nekohtml, etc). There is a mechanism that allows us to export a

Re: Update on Integration with Tika

2009-11-17 Thread Julien Nioche
> Well ... let's consider this: in the past we used to put things under /lib/
> when they were being used by more than a few plugins. Then we started using
> library-only plugins (e.g. lib-xml, lib-nekohtml, etc). There is a mechanism
> that allows us to export any classes from a plugin so that

Re: Update on Integration with Tika

2009-11-17 Thread Andrzej Bialecki
Julien Nioche wrote: Hi guys, This is confusing. Could you please explain why various Tika parts need to be put in different places?
NUTCH_HOME/lib : tika-core.jar
NUTCH_HOME/tika-plugin/lib : tika-parsers.jar
Since the core uses Tika only for its MIME type functionality, we only need to

Re: Update on Integration with Tika

2009-11-17 Thread Jukka Zitting
Hi,
On Tue, Nov 17, 2009 at 10:24 AM, Julien Nioche wrote:
> First let me explain the classloader issue. The main class in the Tika
> plugin instantiates a TikaConfig object (using Tika's XML
> configuration file), which tries to load the parser classes for each
> mime-type Tika knows about. Reme
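For readers unfamiliar with the kind of classloader issue described in the quoted text, here is a minimal sketch of the general Java behaviour that presumably lies behind it: a class loader can resolve classes known to itself and its parents, but a parent loader cannot see classes that only a child loader knows about. This is plain JDK code, not Nutch or Tika code. The class name `LoaderVisibility` and the helper `canLoad` are hypothetical, and the application and parent loaders of the JVM merely stand in for a plugin loader (holding tika-parsers) and the core loader (holding tika-core).

```java
public class LoaderVisibility {

    /** Returns true if the given loader (or one of its parents) can resolve className. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: only check visibility, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a class that only the "plugin" (child) loader knows about.
        String name = LoaderVisibility.class.getName();

        // Stand-in for the plugin loader: the loader that defined this class.
        ClassLoader child = LoaderVisibility.class.getClassLoader();

        // Stand-in for the core loader: the parent of the application loader,
        // which does not include application classes.
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();

        System.out.println("child loader sees class:  " + canLoad(name, child));
        System.out.println("parent loader sees class: " + canLoad(name, parent));
        System.out.println("parent loader sees String: " + canLoad("java.lang.String", parent));
    }
}
```

In this analogy, code loaded by the parent that looks up classes by name through its own loader will not find classes that exist only in the child, which is one plausible reading of why a configuration object created from the core side could fail to locate parser classes packaged with the plugin.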

Re: Update on Integration with Tika

2009-11-17 Thread Julien Nioche
Hi guys,
>> This is confusing. Could you please explain why various Tika parts
>> need to be put in different places?
NUTCH_HOME/lib : tika-core.jar
NUTCH_HOME/tika-plugin/lib : tika-parsers.jar
Since the core uses Tika only for its MIME type functionality, we only need to put tika-core at th

Re: Update on Integration with Tika

2009-11-16 Thread Ken Krugler
On Nov 16, 2009, at 12:00pm, Andrzej Bialecki wrote: Julien Nioche wrote: Hi, I came across the classloader issue that you mentioned but got everything to work OK by duplicating the class TikaConfiguration into the package used by my plugin. The lib tika-core goes into the main /lib dir

Re: Update on Integration with Tika

2009-11-16 Thread Andrzej Bialecki
Julien Nioche wrote: Hi, I came across the classloader issue that you mentioned but got everything to work OK by duplicating the class TikaConfiguration into the package used by my plugin. The lib tika-core goes into the main /lib dir of nutch while tika-parsers jar goes into the lib dir of t

Update on Integration with Tika

2009-11-16 Thread Julien Nioche
Hi, I came across the classloader issue that you mentioned but got everything to work OK by duplicating the class TikaConfiguration into the package used by my plugin. The lib tika-core goes into the main /lib dir of nutch while tika-parsers jar goes into the lib dir of the plugin. I now have a fi