Hi Julien,

My hat is off to you and the rest of the Nutch developer team for improving
Nutch over the past several years, to the level where anybody with heavy
duty crawling needs can just use it off the shelf.
I agree with you that Lucene vs Nutch is not as clear an analogy for the
library vs framework debate.

In my usage, I have tended to use Nutch more as a library (in one
application only the crawling part, and in another just Fetcher2 [a great
component], hacked up a bit to remove its dependency on the rest of Nutch).
The point I was trying to make, not very clearly, was that Nutch aggregates
other components (Hadoop for distributed processing, Lucene/Solr for
indexing and search, Tika for parsing, etc.) along with its own custom
crawler component code and custom data flow design, into a platform for
end-to-end crawling, indexing and search, as opposed to, for example, being
a pure-play crawler library on top of Hadoop.

I look forward to following how this debate evolves regarding
OpenNLP and UIMA.

Cheers,
Jeyendran


-----Original Message-----
From: Julien Nioche [mailto:[email protected]] 
Sent: Monday, July 09, 2012 2:01 PM
To: [email protected]; [email protected]
Subject: Re: Apache "Text Analysis" top-level project?

Jeyendran,

> One example I would suggest (at least according to my view) is the difference
> between Lucene and Nutch. Being a library, Lucene has pretty much 
> taken over search engine software development. Nutch, on the other 
> hand, tries to be a full-fledged platform for crawling, indexing and 
> search, and has not gathered anywhere near the same usage levels.
>

That Nutch does not have the same audience as Lucene is completely
understandable given that they are quite different in scope and nature. Not
everybody needs to crawl on a large scale, but when they do they often use
Nutch. And by the way, Nutch does not do indexing and search; it delegates
this to other tools like Solr, so it is mostly a crawler.

The comparison between UIMA and OpenNLP is a better illustration of the
difference between a framework and a library, IMHO.

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
