Something seems to be missing here. It's clear that 1.x has more features and is a lot more stable than 2.x. Nutch 2.x can theoretically perform a lot better if you are going to crawl on a very large scale, but I still haven't seen any numbers to support this assumption. Nutch 1.x can easily deal with many millions of records, and with billions if you throw some hardware at it.
Most users are not going to crawl millions of records. In that case I personally choose 1.x: I prefer stability and predictability over performance you are not likely to need anyway. Besides our large 1.x research cluster, we still use 1.x in production for all our customers, running locally on a 2-core, 512 MB RAM VPS with a crawldb of over 5 million records. It runs fine, is fast, and keeps up with newly discovered URLs. The only significant improvements were a better scoring filter and integrating indexing in the fetcher.

-----Original message-----
> From:Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> Sent: Mon 25-Feb-2013 23:37
> To: user@nutch.apache.org
> Subject: Re: Differences between 2.1 and 1.6
>
> Hi Danilo,
>
> You can check out the architecture changes here
> http://wiki.apache.org/nutch/#Nutch_2.x
>
> Nutch trunk (1.7-SNAPSHOT) is here
> http://svn.apache.org/repos/asf/nutch/trunk/
>
> 2.x is here
> http://svn.apache.org/repos/asf/nutch/branches/2.x/
>
> On Mon, Feb 25, 2013 at 1:56 PM, Danilo Fernandes <
> dan...@kelsorfernandes.com.br> wrote:
>
> > Hi everyone,
> >
> > Somebody can tell me about differences between 2.1 and 1.6?
> >
> > The SVN trunk is 1.* or 2.*?
> >
> > Thanks,
> > Danilo Fernandes
> >
>
> --
> *Lewis*