On Monday 15 August 2011 14:59:20 webdev1977 wrote:
> I have been looking at the pros and cons of running nutch locally in
> pseudo-distributed mode. I have a very large machine with lots of
> processors and memory (16 GB). I am not able to get more machines to set up
> a proper hadoop cluster.
>
> Is it worth the overhead to set up hadoop in pseudo-distributed mode? Will I
> see any gains in fetching large amounts of content from only three domains?
You have many cores that you aren't using right now, and pseudo-distributed
mode would let you use them. Fetching probably won't go faster, since it is
not the real bottleneck in many cases. The slow jobs are parsing, updating
the crawldb (if it is large), and merging the linkdb (terrible performance).

> If it is worth it, can anyone point me to a good tutorial/post for setting
> it up?

Have you tried searching Google for "hadoop nutch tutorial"?

> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-running-nutch-in-psuedo-distributed-mode-really-worth-it-tp3255677p3255677.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

