RE: Differences between 2.1 and 1.6

Markus Jelsma Mon, 25 Feb 2013 15:09:01 -0800

Something seems to be missing here. It's clear that 1.x has more features and 
is a lot more stable than 2.x. Nutch 2.x can theoretically perform a lot better 
if you are going to crawl on a very large scale but I still haven't seen any 
numbers to support this assumption. Nutch 1.x can easily deal with many 
millions of records and deal with billions if you throw some hardware at it.

Most users are not going to crawl millions of records. In that case I 
personally choose 1.x. I prefer stability and predictability over some 
performance you are not likely to need anyway.

Besides our large 1.x research cluster we still use 1.x in production for all 
our customers, running locally on a 2 core 512MB RAM VPS with a crawldb of over 
5 million records and it runs fine, fast and keeps up with newly discovered 
URLs. The only significant improvements were a better scoring filter and 
integrating indexing in the fetcher.

-----Original message-----
> From:Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> Sent: Mon 25-Feb-2013 23:37
> To: user@nutch.apache.org
> Subject: Re: Differences between 2.1 and 1.6
> 
> Hi Danilo,
> 
> You can check out the architecture changes here
> http://wiki.apache.org/nutch/#Nutch_2.x
> 
> Nutch trunk (1.7-SNAPSHOT) is here
> http://svn.apache.org/repos/asf/nutch/trunk/
> 
> 2.x is here
> http://svn.apache.org/repos/asf/nutch/branches/2.x/
> 
> On Mon, Feb 25, 2013 at 1:56 PM, Danilo Fernandes <
> dan...@kelsorfernandes.com.br> wrote:
> 
> > Hi everyone,
> >
> > Somebody can tell me about differences between 2.1 and 1.6?
> >
> > The SVN trunk is 1.* or 2.*?
> >
> > Thanks,
> > Danilo Fernandes
> >
> >
> 
> 
> -- 
> *Lewis*
>
