Thank you Lewis.  We are running on a cluster of 30 bare metal systems
using Cloudera's CDH 5.7.0.  The links that you sent are all for the
crawl script, but the commands that I'm running use the nutch script,
such as:
./runtime/deploy/bin/nutch generate -topN 500000 -batchId Testcrawl1
./runtime/deploy/bin/nutch fetch Testcrawl1
./runtime/deploy/bin/nutch parse Testcrawl1
./runtime/deploy/bin/nutch updatedb Testcrawl1
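For comparison, the crawl script appears to pass the reducer count to each step as a Hadoop -D option (its $commonOptions, built from the node count). Below is a minimal sketch of doing the same with the individual commands above. It assumes the nutch launcher forwards Hadoop generic -D options to the job, as the crawl script relies on. NUM_TASKS, COMMON_OPTS, run_step and DRY_RUN are hypothetical names, and "nodes * 2" is only a placeholder sizing rule. Whether the fetch job actually honors mapred.reduce.tasks is exactly what is in question here, so check the usage printed by `nutch fetch` for a task-count option before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: pass an explicit reducer count to every step, mirroring how
# the 2.x crawl script builds $commonOptions. Assumes the nutch launcher
# forwards Hadoop generic -D options (placed before the job's own args).
set -euo pipefail

NUTCH=./runtime/deploy/bin/nutch
BATCH_ID=Testcrawl1
NUM_NODES=30                      # cluster size from this thread
NUM_TASKS=$((NUM_NODES * 2))      # hypothetical sizing rule; tune for your cluster
# mapred.reduce.tasks is the old (Hadoop 1) property name used by the crawl
# script; on Hadoop 2 the equivalent is mapreduce.job.reduces.
COMMON_OPTS=(-D "mapred.reduce.tasks=$NUM_TASKS")

# run_step always prints the command; it only executes it when DRY_RUN=0.
run_step() {
  echo "$NUTCH $*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$NUTCH" "$@"
  fi
}

run_step generate "${COMMON_OPTS[@]}" -topN 500000 -batchId "$BATCH_ID"
run_step fetch    "${COMMON_OPTS[@]}" "$BATCH_ID"
run_step parse    "${COMMON_OPTS[@]}" "$BATCH_ID"
run_step updatedb "${COMMON_OPTS[@]}" "$BATCH_ID"
```

By default it only prints the four commands; run it with DRY_RUN=0 on a gateway node to submit them.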

Thanks for any other ideas!

-Joe

On Fri, May 20, 2016 at 5:07 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Joe,
> Please see the following line
> https://github.com/apache/nutch/blob/2.x/src/bin/crawl#L65
> and also
> https://github.com/apache/nutch/blob/2.x/src/bin/crawl#L91
> then
> https://github.com/apache/nutch/blob/2.x/src/bin/crawl#L158
> The number of reduce tasks is directly aligned with nodes/2.
> What kind of setup are you using? Apologies for the late reply,
> this digest got lost.
> Lewis
>
> On Wed, May 18, 2016 at 11:46 AM, <user-digest-h...@nutch.apache.org>
> wrote:
>
> >
> >
> > From: Joseph Obernberger <joseph.obernber...@gmail.com>
> > To: user@nutch.apache.org, tien.nguyenm...@gmail.com
> > Cc:
> > Date: Mon, 16 May 2016 14:53:17 -0400
> > Subject: Re: Nutch 2.3.1 - Fetch Phase - Only 2 Reducers
> > Thank you Nguyen.  I tried modifying the YARN settings, to no avail.
> > Other jobs, like the generate phase, create 128 reducers with no problem.
> > It's only the fetch phase that always makes 2 reducers.  Running multiple
> > jobs does work, and does increase throughput, but I'm still curious as to
> > why I'm getting only 2 reducers in the fetch phase.
> > Thanks!
> >
> > -Joe
> >
> >
>
