Hi
Nutch is not crawling all outlinks, even with the following property:
<property>
  <name>db.max.outlinks.per.page</name>
  <value>-1</value>
  <description>The maximum number of outlinks that we'll process for a page.
  If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
  will be
DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the
property of Persistent Systems Ltd. It is intended only for the use of the
individual or entity to which it is addressed. If you are not the intended
recipient, you are not authorized to read,
Hi,
I have a 10-node Nutch cluster.
I have the following report. The cluster has very low (slow) performance. (I am not
using indexing; I am using Nutch only as a web crawler.)
The report shows the following:
Even though I have a 10-node cluster, at any given time it shows only 3 running tasks.
Is this expected behavior, or do I have to
Hi,
I am using Nutch on a 10-node cluster.
I want to configure nutch-site.xml.
What is the difference between mapred.map.tasks and
mapred.tasktracker.map.tasks.maximum, or between
mapred.reduce.tasks and mapred.tasktracker.reduce.tasks.maximum?
Thanks
-Pravin
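A minimal hadoop-site.xml sketch, assuming the Hadoop 0.19-era property names used with Nutch 1.0, may help separate the two kinds of setting: mapred.map.tasks and mapred.reduce.tasks are per-job defaults (how many tasks a job is split into), while the mapred.tasktracker.*.tasks.maximum properties are per-node limits (how many tasks one TaskTracker runs at the same time). The values below are hypothetical placeholders, not recommendations.

```xml
<configuration>
  <!-- Per-job default number of map tasks. This is only a hint: the actual
       count also depends on how the job's InputFormat splits its input. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>20</value>
  </property>
  <!-- Per-job default number of reduce tasks. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>
  <!-- Per-node limit: maximum map tasks a single TaskTracker runs concurrently. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Per-node limit: maximum reduce tasks a single TaskTracker runs concurrently. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

With these illustrative values, a 10-node cluster offers at most 10 × 2 = 20 concurrent map slots, yet a job that consists of only 3 map tasks will still show at most 3 running at once. The per-TaskTracker maximums are read when each TaskTracker starts, so they generally need to be set on every node and the daemons restarted.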
From: Pravin Karne
Sent: Thursday, July 02, 2009
Hi,
I am using Nutch 1.0 for web crawling (not using Nutch's indexing).
I have a 10-node cluster.
I want to optimize its fetch performance. Which properties do I have to
change in the configuration files (nutch-site.xml, hadoop-site.xml, etc.)?
Thanks in advance
-Pravin
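For the fetch-performance question, the nutch-site.xml fragment below sketches the Nutch 1.0 properties that most directly govern fetch throughput. The property names and the defaults noted in the comments reflect nutch-default.xml for 1.0 as we understand it; the chosen values are hypothetical examples to be tuned for a particular cluster and crawl.

```xml
<configuration>
  <!-- Number of fetcher threads per fetch task (default: 10). -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>
  <!-- Maximum threads allowed to access one host at a time (default: 1). -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>1</value>
  </property>
  <!-- Seconds to wait between successive requests to the same server (default: 5.0). -->
  <property>
    <name>fetcher.server.delay</name>
    <value>5.0</value>
  </property>
  <!-- Maximum URLs per host in a single fetch list; -1 means no limit (default). -->
  <property>
    <name>generate.max.per.host</name>
    <value>100</value>
  </property>
</configuration>
```

Raising fetcher.threads.per.host or lowering fetcher.server.delay increases load on remote servers, so these are usually kept at polite values; extra throughput more typically comes from more fetcher threads and from fetch lists that spread URLs across many hosts.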