Re: Nutch - Hadoop Help

Lewis John Mcgibbney Mon, 03 Feb 2014 11:59:33 -0800

Hi Manikandan,

On Mon, Feb 3, 2014 at 3:45 PM, <[email protected]> wrote:


> And then, I'm running this:
> $HADOOP_HOME/bin/hadoop jar /usr/local/nutch/nutch.job
> org.apache.nutch.crawl.Crawler dmoz -dir /user/hduser/crawl -depth 3 -topN
> 5000
>

You're using the Crawler class. This is not advised, and the class is now
deprecated. There is no point in downloading the crawl script if you are
going to use the Crawler class. I would suggest you use the crawl script
instead.


>
> org.apache.gora.memory.store.MemStore as the Gora storage class.
>

Please don't use MemStore. Its implementation in Gora 0.3 is not thread safe
and is only meant for trivial tests. Please see the 2.x tutorial on the
Nutch wiki for details of how to configure the supported Gora persistent
data stores.
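
As a rough sketch only: assuming you pick HBase as the backend (that choice
is my assumption for illustration; the other supported stores are configured
the same way with their own class names), the storage class is set in
conf/nutch-site.xml along these lines:

```xml
<!-- conf/nutch-site.xml: a sketch, assuming HBase is the chosen Gora backend -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <!-- replaces org.apache.gora.memory.store.MemStore -->
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Gora storage class (HBase assumed here for illustration).</description>
  </property>
</configuration>
```

The tutorial also covers the matching dependency and gora.properties
settings for whichever store you choose, so treat the above as a starting
point rather than a complete setup.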


Once you've used the crawl script, and configured your Nutch deployment job
file, please get back to us with your results.
Remember that you will always need to regenerate your Nutch job file if you
make configuration changes to your Nutch deployment.
hth
Thanks
