I assume you've already read http://wiki.apache.org/nutch/OptimizingCrawls and tried different values for:

<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
  <description>The number of seconds the fetcher will delay between
  successive requests to the same server.</description>
</property>

<property>
  <name>fetcher.server.min.delay</name>
  <value>0.0</value>
  <description>The minimum number of seconds the fetcher will delay between
  successive requests to the same server. This value is applicable ONLY
  if fetcher.threads.per.host is greater than 1 (i.e. the host blocking
  is turned off).</description>
</property>

On 25 May 2011 12:52, webdev1977 <[email protected]> wrote:

> Any ideas on how (even if it requires code changes) to speed up the
> mapreduce portion for a vertical crawl with a very (three right now)
> small number of sites?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Going-Beyond-the-Prototype-tp2923289p2984011.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
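
P.S. For illustration only, here is a rough sketch of what overriding these properties in conf/nutch-site.xml could look like for a crawl limited to a handful of hosts. The values are assumptions picked as an example, not recommendations. Only lower the delays or raise the per-host thread count on servers you control or have permission to crawl more aggressively.

```xml
<?xml version="1.0"?>
<!-- Hypothetical conf/nutch-site.xml overrides; every value below is an illustrative assumption. -->
<configuration>
  <property>
    <name>fetcher.server.delay</name>
    <!-- Shorter than the 5.0 second default quoted above. -->
    <value>1.0</value>
  </property>
  <property>
    <name>fetcher.threads.per.host</name>
    <!-- A value greater than 1 turns off host blocking, per the description above. -->
    <value>2</value>
  </property>
  <property>
    <name>fetcher.server.min.delay</name>
    <!-- Only takes effect because fetcher.threads.per.host is greater than 1. -->
    <value>0.5</value>
  </property>
</configuration>
```

With only three sites, the per-host delay and per-host thread count are likely to limit fetch throughput more than the total number of fetcher threads, so these are the settings I would experiment with first.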

