Hi,

On Tue, Jul 2, 2013 at 3:53 PM, h b <hb6...@gmail.com> wrote:

> So, I tried this with the generate.max.count property set to 5000, rebuilt
> (ant; ant jar; ant job) and reran fetch.
> It still appears the same: the first 79 reducers zip through and the last one
> is crawling, literally...
>

Sorry, I should have been more explicit. This property does not directly
affect fetching; it is used when GENERATING fetch lists. It needs to be
present and picked up at the generate phase... before fetching is executed,
so the fetch list would need to be regenerated for it to take effect.
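Roughly the order I mean (just a sketch; the crawldb/segments paths, the
-topN value and the segment name below are placeholders for whatever you
actually use):

  # set generate.max.count in conf/nutch-site.xml, then rebuild the job jar
  ant; ant jar; ant job
  # regenerate the fetch list(s) so the property is actually applied
  bin/nutch generate crawl/crawldb crawl/segments -topN 50000
  # fetch the newly generated segment (timestamp name is illustrative)
  bin/nutch fetch crawl/segments/20130702154500
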
Besides this, is there any progress being made at all on the last reduce?
If you look at the CPU (and heap) on the box it is running on, it is common
to see high levels for both while it is working. Maybe the output writer is
just taking a good while to write the data down to HDFS... assuming you are
using 1.x.
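A quick way to check on Hadoop 1.x (the job id below is hypothetical,
substitute your own):

  # find the job id, then look at map/reduce completion and counters
  hadoop job -list
  hadoop job -status job_201307021530_0007
  # on the tasktracker node running the slow reduce: is the task JVM busy?
  top
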


>
> As for the logs, I mentioned in one of my earlier threads that when I run
> from the deploy directory, I am not getting any logs generated.
> I looked for the logs directory under local as well as under deploy, and
> just to make sure, also on the grid. I do not see the logs directory. So I
> created it manually under deploy before starting fetch, and still there is
> nothing in this directory.
>
>
OK, so when you run Nutch as a deployed job, your logs end up within
$HADOOP_LOG_DIR... you can also check them via the JobTracker WebApp, e.g.
you will be able to see the reduce tasks for the fetch job and varying
snippets (or all) of the log there.
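For example (hostnames and paths here are just placeholders for your
cluster):

  # per-task logs live under the Hadoop log dir on each tasktracker node
  ls $HADOOP_LOG_DIR/userlogs/
  # or drill into the fetch job's reduce tasks from the JobTracker web UI
  # (default port 50030 on Hadoop 1.x)
  # http://<jobtracker-host>:50030/jobtracker.jsp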
