Nutch is not crawling all outlinks

2009-09-22 Thread Pravin Karne
Hi, Nutch is not crawling all outlinks, even with the following property: <property><name>db.max.outlinks.per.page</name><value>-1</value><description>The maximum number of outlinks that we'll process for a page. If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks will be
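For reference, a complete nutch-site.xml override of this property might look like the sketch below. This is a hedged illustration, not a confirmed fix for the problem in this thread. The second property, http.content.limit, is a standard Nutch setting, but including it here is an assumption: its default of 65536 bytes truncates larger pages, and outlinks that appear after the truncation point are never parsed. That can also make a crawl look as if it is skipping outlinks.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Process every outlink found on a page (a negative value means no limit). -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>
  <!-- Assumption: also lift the HTTP download limit so long pages are not
       truncated before their outlinks are parsed (-1 means no truncation). -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```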

test mail

2009-07-02 Thread Pravin Karne
DISCLAIMER: This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read,

Nutch is very slow... what does the following graph show?

2009-07-02 Thread Pravin Karne
Hi, I have a 10-node Nutch cluster and the following report. The cluster's performance is very low (slow). (I am not using indexing; I am using Nutch only as a web crawler.) What do the following reports show? Even though I have a 10-node cluster, at any time it shows only 3 as the number of running tasks. Is this expected behavior or have to

What is the difference between mapred.map.tasks and mapred.tasktracker.map.tasks.maximum?

2009-07-02 Thread Pravin Karne
Hi, I am using Nutch with a 10-node cluster and want to configure nutch-site.xml. What is the difference between mapred.map.tasks and mapred.tasktracker.map.tasks.maximum, or between mapred.reduce.tasks and mapred.tasktracker.reduce.tasks.maximum? Thanks -Pravin From: Pravin Karne Sent: Thursday, July 02, 2009
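A hedged summary of the distinction, based on the standard Hadoop property descriptions. mapred.map.tasks and mapred.reduce.tasks are per-job defaults for how many map and reduce tasks a job is split into; for maps this is only a hint, because the real count follows the input splits. mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum cap how many tasks a single TaskTracker node runs at the same time. They are read by each TaskTracker when it starts, so they belong in that node's hadoop-site.xml rather than nutch-site.xml. Cluster-wide concurrency is then roughly nodes × per-tracker maximum. A job split into only 3 tasks can never have more than 3 running at once, however many nodes are idle. The numbers below are hypothetical values for a 10-node cluster, not tuned recommendations.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Per-node slots (hypothetical): 10 nodes x 4 = 40 concurrent map tasks. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Per-node slots (hypothetical): 10 nodes x 2 = 20 concurrent reduce tasks. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Per-job default task counts (hypothetical), sized to fill those slots. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>40</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>20</value>
  </property>
</configuration>
```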

How to optimize Nutch's fetch performance

2009-06-30 Thread Pravin Karne
Hi, I am using Nutch 1.0 for web crawling (not using Nutch's indexing). I have a 10-node cluster and want to optimize its fetch performance. Which properties do I have to change in the configuration files (nutch-site.xml, hadoop-site.xml, etc.)? Thanks in advance -Pravin
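As a hedged starting point, these are some of the fetcher settings in Nutch's nutch-default.xml that are usually examined first, overridden in nutch-site.xml. The values shown are hypothetical, not measured recommendations. Lowering fetcher.server.delay makes the crawler less polite to each host. Because requests to a single host are throttled, a fetch list dominated by a few hosts may stay slow whatever these values are.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Fetcher threads per fetch task (default 10); hypothetical higher value. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>
  <!-- Seconds to wait between requests to the same host (default 5.0);
       hypothetical lower value, at the cost of politeness. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>2.0</value>
  </property>
</configuration>
```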