When you run Nutch over Hadoop, i.e. in deploy mode, you use the job file
(apache-nutch-1.X.job). This is nothing but a big fat zip file containing
(you can unzip it and verify this yourself; a small sketch to do that
follows):
(a) all the compiled Nutch classes,
(b) the config files, and
(c) the dependent jars
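
If you want to check the contents quickly, here is a minimal sketch (the
class name ListJobContents is just a placeholder; point the path at whatever
.job file your build produced):

    import java.util.Collections;
    import java.util.zip.ZipFile;

    // Lists every entry in the Nutch job file so you can confirm it bundles
    // the compiled classes, the config files and the dependent jars.
    public class ListJobContents {
        public static void main(String[] args) throws Exception {
            try (ZipFile job = new ZipFile("apache-nutch-1.X.job")) {
                Collections.list(job.entries())
                           .forEach(e -> System.out.println(e.getName()));
            }
        }
    }

You should see the compiled classes, the plugins directory, the *.xml config
files and the dependent jars in the listing.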

When Hadoop launches map-reduce tasks for Nutch:
1. The Nutch job file is copied over to the node where your task (say, a map
task) executes.
2. It is unpacked there.
3. Nutch picks up nutch-default.xml and nutch-site.xml from it and loads the
configuration.
4. By default, plugin.folders is set to "plugins", which is a relative path,
so Nutch searches the classpath for a directory named "plugins".
5. That "plugins" directory sits under the "classes" directory of the
extracted job file, which is on the classpath. The required plugin classes
are loaded from there and everything runs fine (see the sketch below).
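
To make steps 4 and 5 concrete, here is a rough sketch of that resolution.
It is only an illustration of the idea, not the actual Nutch plugin-loading
code, and the class name ResolvePluginFolders is a placeholder:

    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;

    // Reads plugin.folders from the Nutch config resources and, because the
    // value is a relative path, resolves it against the task's classpath
    // (in deploy mode that classpath includes the "classes" directory of the
    // unpacked job file).
    public class ResolvePluginFolders {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.addResource("nutch-default.xml");
            conf.addResource("nutch-site.xml");

            String folders = conf.get("plugin.folders", "plugins");
            URL resolved = Thread.currentThread()
                                 .getContextClassLoader()
                                 .getResource(folders);

            System.out.println("plugin.folders = " + folders);
            System.out.println("resolved to    = " + resolved);
        }
    }

If plugin.folders is left at the relative default, getResource() finds the
plugins directory inside the extracted job file on whichever node the task
runs on, so nothing machine-specific has to go into the property.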

In short: Leave it as it is. It should work over Hadoop by default.

Thanks,
Tejas

On Mon, Dec 9, 2013 at 4:54 PM, S.L <simpleliving...@gmail.com> wrote:

> What should be the plugins property be set to when running Nutch as a
> Hadoop job ?
>
> I just created a deploy mode jar running the ant script , I see that the
> value of the plugins property is being copied and used from the
> confiuration into the hadoop job. While it seems to be getting the plugins
> directory  because Hadoop is being run on the same machine , I am sure it
> will fail when moved to a different machine.
>
> How should I set the plugins property so that it is relative to the hadoop
> job?
>
> Thanks
>
