Hello All!  

When using Nutch 1.3 in fully distributed mode, where does the fetching
occur? Does each node get a list of URLs to fetch? What property in
Hadoop/MapReduce, etc. decides how many URLs a node gets to fetch? I am
worried about memory on my nodes. Some of the files in our enterprise are
very large, such as 800 MB PDF files.

I am able to run inject on my cluster, but then the generate step fails and
I always lose one node from the cluster.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-1-3-Fetching-where-does-this-happen-tp3396326p3396326.html
Sent from the Nutch - User mailing list archive at Nabble.com.
