RE: mapred.map.tasks

2006-04-20 Thread anton
Thanks. We changed this parameter in hadoop-default.xml. -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Thursday, April 20, 2006 11:53 PM To: nutch-dev@lucene.apache.org Subject: Re: mapred.map.tasks One more thing. This parameter should be set in mapred-default.xml

Re: nutch user meeting in San Francisco: May 18th

2006-04-20 Thread Doug Cutting
Folks can say whether they'll attend at: http://www.evite.com/app/publicUrl/[EMAIL PROTECTED]/nutch-1 Doug

nutch user meeting in San Francisco: May 18th

2006-04-20 Thread Stefan Groschupf
(with apologies for multiple postings) Dear Nutch users, Dear Nutch developers, Dear Hadoop developers, we would love to invite you to the Nutch user meeting in San Francisco. Date: Thursday, May 18th, 2006 Time: 7 PM. Location: Cafe Du Soleil, 200 Fillmore St, San Francisco, CA 94117. (Th

Re: mapred.map.tasks

2006-04-20 Thread Doug Cutting
One more thing. This parameter should be set in mapred-default.xml, not hadoop-site.xml or nutch-site.xml. Parameters in those latter files cannot be overridden by application settings, and mapred.map.tasks is sometimes overridden. Doug
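The precedence described in this message can be pictured with a small toy model. This is only a sketch of the rule as stated (site files win over application settings, which win over defaults), not Hadoop's actual configuration code; the `resolve` function, the dictionaries, and the values 20 and 50 are hypothetical and chosen for illustration.

```python
# Toy model of the override rule described in the message above.
# Not Hadoop code: resolve() and the dictionaries are hypothetical.

def resolve(name, defaults, app_settings, site):
    """Return the effective value of a property.

    Values from a site file win over application settings,
    which in turn win over values from a defaults file.
    """
    for layer in (site, app_settings, defaults):
        if name in layer:
            return layer[name]
    return None


defaults = {"mapred.map.tasks": "20"}      # stands in for mapred-default.xml
app_settings = {"mapred.map.tasks": "50"}  # set by an application for one job

# Set only in the defaults file: the application's value takes effect.
print(resolve("mapred.map.tasks", defaults, app_settings, site={}))

# Also set in a site file: the application's value is ignored.
print(resolve("mapred.map.tasks", defaults, app_settings,
              site={"mapred.map.tasks": "20"}))
```

Under this model, putting mapred.map.tasks in a site file would silently defeat any application that tries to pick its own value, which is the situation the message warns against.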

Re: mapred.map.tasks

2006-04-20 Thread Doug Cutting
Anton Potehin wrote: We have a question on this property. Is it really preferred to set this parameter several times greater than number of available hosts? We do not understand why it should be so? It should be at least numHosts*mapred.tasktracker.tasks.maximum, so that all of the task slots
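The lower bound quoted in this reply, numHosts*mapred.tasktracker.tasks.maximum, is simple arithmetic. A minimal sketch follows; the function name and the example cluster size (10 hosts, 2 tasks per tasktracker) are assumptions made for illustration and do not come from the thread.

```python
# Lower bound on mapred.map.tasks as stated in the reply above.
# min_map_tasks() is a hypothetical helper; the example numbers are assumed.

def min_map_tasks(num_hosts, tasks_maximum):
    """Return num_hosts * tasks_maximum, the minimum suggested value."""
    return num_hosts * tasks_maximum


# Example: 10 hosts, each with mapred.tasktracker.tasks.maximum = 2.
print(min_map_tasks(10, 2))
```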

[jira] Commented: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links )

2006-04-20 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-173?page=comments#action_12375421 ] Doug Cutting commented on NUTCH-173: +1, with a few modifications. Can you please re-generate this against the current sources? This patch does not apply for me. Also, t

[jira] Resolved: (NUTCH-250) Generate to log truncation caused by generate.max.per.host

2006-04-20 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-250?page=all ] Doug Cutting resolved NUTCH-250: Fix Version: 0.8-dev Resolution: Fixed Assign To: Doug Cutting I just committed this. Thanks, Rod. > Generate to log truncation caused by gener

[jira] Commented: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links )

2006-04-20 Thread Christophe Noel (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-173?page=comments#action_12375300 ] Christophe Noel commented on NUTCH-173: --- We are TENS of nutch users using this precious patch. Most of nutch users are not making whole-web search engine (too much hardwa

dfs filesystem

2006-04-20 Thread Anton Potehin
Which of Linux file systems is most preferred for DFS name-node and data-node?