Hi Julien,

My hat's off to you and the rest of the Nutch developer team for improving Nutch over the past several years, to the level where anybody with heavy-duty crawling needs can just use it off the shelf.

I agree with you that Lucene vs Nutch is not as clear an analogy for the library vs framework debate.
In my usage, I have tended to use Nutch more as a library (in one application only the crawling part, and in another just Fetcher2 [a great component], hacked up a bit to remove its dependency on the rest of Nutch).

The point I was trying to make, not very clearly, was that Nutch aggregates other components (Hadoop for distributed processing, Lucene/Solr for indexing and search, Tika for parsing, etc.) along with its own custom crawler component code and custom data flow design, into a platform for end-to-end crawling, indexing and search, as opposed to, for example, being a pure-play crawler library on top of Hadoop.

I look forward to following how this debate evolves regarding OpenNLP and UIMA.

Cheers,
Jeyendran

-----Original Message-----
From: Julien Nioche [mailto:[email protected]]
Sent: Monday, July 09, 2012 2:01 PM
To: [email protected]; [email protected]
Subject: Re: Apache "Text Analysis" top-level project?

Jeyendran,

> One example I would suggest (at least according to my view), is the difference
> between Lucene and Nutch. Being a library, Lucene has pretty much
> taken over search engine software development. Nutch, on the other
> hand, tries to be a full-fledged platform for crawling, indexing and
> search, and has not gathered anywhere near the same usage levels.
>

That Nutch does not have the same audience as Lucene is completely understandable given that they are quite different in scope and nature. Not everybody needs to crawl on a large scale, but when they do they often use Nutch. And by the way, Nutch does not do indexing and search: it delegates this to other tools like SOLR, so it is mostly a crawler.

The comparison between UIMA and OpenNLP is a better illustration of the difference between a framework and a library, IMHO.

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
