On Sun, Dec 22, 2013 at 4:39 AM, Amit Sela <am...@infolinks.com> wrote:

> Hi all,
>
> I'm trying to use the Nutch ParseUtil to parse Nutch Content with
> parse-tika and parse-html


By Nutch content, do you mean a Nutch segment? If so, please try the
'bin/nutch parse' command instead.

> but I keep getting:
>
> RuntimeException: x point org.apache.nutch.parse.Parser not found
>

This looks like a problem with loading the plugins.

>
> I'm running this in an MR job outside of the Nutch crawl jobs, and when I
> run it in the IDE I have to add the build/ directory to the project
> classpath to solve it.
>

The bin/nutch script builds the appropriate classpath before invoking the
class. You can capture the CLASSPATH value the script constructs and
reproduce it in your IDE. Glad that you found a workaround.
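As far as I know, adding build/ helps because Nutch 1.x resolves a relative `plugin.folders` value (by default `plugins`) as a resource on the classpath, so some classpath entry must contain a `plugins/` directory. The sketch below approximates that lookup so you can check, with the same classpath your IDE or MR task uses, whether the plugin folder is visible. It is a standalone approximation under that assumption, not Nutch's own code; `PluginFolderCheck` and `resolvePluginFolder` are hypothetical names.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

public class PluginFolderCheck {

    // Approximates (assumption) how Nutch 1.x locates a plugin folder:
    // absolute paths are used as-is, relative names are looked up as a
    // classpath resource. Returns null if no usable directory is found.
    static File resolvePluginFolder(String name, ClassLoader loader) {
        File dir = new File(name);
        if (!dir.isAbsolute()) {
            URL url = loader.getResource(name);
            // Only plain directories on a file-based classpath entry count here.
            if (url == null || !"file".equals(url.getProtocol())) {
                return null;
            }
            try {
                dir = new File(url.toURI());
            } catch (URISyntaxException e) {
                return null;
            }
        }
        return dir.isDirectory() ? dir : null;
    }

    public static void main(String[] args) {
        // "plugins" is assumed to be the default plugin.folders value.
        String name = args.length > 0 ? args[0] : "plugins";
        File dir = resolvePluginFolder(name,
                Thread.currentThread().getContextClassLoader());
        if (dir == null) {
            System.out.println("Plugin folder '" + name
                    + "' not visible; an 'x point ... not found' error is likely.");
        } else {
            System.out.println("Plugin folder resolved to " + dir.getAbsolutePath());
        }
    }
}
```

If the folder is not visible, either put the directory that contains plugins/ on the classpath (as you did with build/) or set plugin.folders to an absolute path.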

>
> I hoped distributing the apache-nutch-1.7.jar (the version I use) to the
> data nodes' classpath directories would help. I even added
> parse-plugins.xml, but it won't do...

I hope you were running from "runtime/deploy" for distributed mode. There is
no need to distribute the jar yourself; Hadoop does that for you. Even the
configs are inside the "runtime/deploy/apache-nutch-1.XX.job" file.

> Anyone managed that?
>
> Thanks,
>
> Amit.
>
