When you run Nutch over Hadoop, i.e. in deploy mode, you use the job file (apache-nutch-1.X.job). This is nothing but a big fat zip file (you can unzip it and verify this yourself) containing: (a) all the compiled Nutch classes, (b) the config files, and (c) the dependent jars.
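If you want to poke at the contents without unpacking by hand, a throwaway listing like the one below does it. This is just a sketch, not anything shipped with Nutch, and the job file path is an assumption from a local build; point it at wherever ant put yours.

import java.util.zip.ZipFile;

public class ListJobFile {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; adjust to the job file your ant build produced.
        try (ZipFile job = new ZipFile("build/apache-nutch-1.7.job")) {
            // Prints every entry: you should see the compiled classes,
            // the config files (nutch-default.xml, nutch-site.xml),
            // the dependent jars and the plugins directory.
            job.stream().forEach(entry -> System.out.println(entry.getName()));
        }
    }
}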
When Hadoop launches map-reduce jobs for Nutch:

1. The nutch job file is copied over to the node where your task (say, a map task) is executed.
2. It is unpacked there.
3. Nutch finds nutch-default.xml and nutch-site.xml on the classpath and loads the configs.
4. By default, plugin.folders is set to "plugins", which is a relative path, so Nutch searches the classpath for a directory named "plugins" and loads the plugin classes from there.
5. That "plugins" directory sits under a directory named "classes" inside the extracted job file, which is on the classpath, so the required plugin classes are found and everything runs fine.

In short: leave it as it is. It should work over Hadoop by default. (A small sketch after the quoted mail below shows how to double-check the effective value.)

Thanks,
Tejas

On Mon, Dec 9, 2013 at 4:54 PM, S.L <simpleliving...@gmail.com> wrote:

> What should the plugins property be set to when running Nutch as a
> Hadoop job?
>
> I just created a deploy mode jar by running the ant script, and I see that
> the value of the plugins property is being copied from the configuration
> and used in the Hadoop job. It seems to be finding the plugins directory
> because Hadoop is being run on the same machine, but I am sure it will
> fail when moved to a different machine.
>
> How should I set the plugins property so that it is relative to the Hadoop
> job?
>
> Thanks
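P.S. If you want to see the effective value for yourself, here is a tiny sketch (a throwaway check, not part of Nutch; it assumes a Nutch 1.x classpath with the conf files on it). With the stock config it should print the relative path "plugins" from nutch-default.xml, unless you overrode it in nutch-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.util.NutchConfiguration;

public class CheckPluginFolders {
    public static void main(String[] args) {
        // NutchConfiguration.create() loads nutch-default.xml and
        // nutch-site.xml from the classpath, the same way the jobs do.
        Configuration conf = NutchConfiguration.create();
        // With the default config this prints "plugins", a relative path
        // that is resolved against the classpath at runtime.
        System.out.println(conf.get("plugin.folders"));
    }
}

Because the value is relative and resolved against the classpath, the same setting works both locally and inside the unpacked job file on the task nodes, which is why it does not need to change for deploy mode.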