The board is not enthusiastic about 'umbrella' projects. Cooperate, co-market -- great. Merge, not so good.
On Tuesday, July 10, 2012, Jeyendran Balakrishnan wrote:

> Hi Julien,
>
> My hats off to you and the rest of the Nutch developer team for improving
> Nutch over the past several years, to the level where anybody with heavy
> duty crawling needs can just use it off the shelf.
> I agree with you that Lucene vs Nutch is not as clear an analogy for the
> library vs framework debate.
>
> In my usage, I have tended to use Nutch more as a library (in one
> application only the crawling part, and in another just Fetcher2 [a great
> component], hacked up a bit to remove its dependency on the rest of Nutch).
> The point I was trying to make, not very clearly, was that Nutch aggregates
> other components (Hadoop for distributed processing, Lucene/Solr for
> indexing and search, Tika for parsing, etc) along with its own custom
> crawler component code and custom data flow design, into a platform for
> end-to-end crawling, indexing and search, as opposed to, for example, being
> a pure-play crawler library on top of Hadoop.
>
> I look forward with interest to follow how this debate evolves regarding
> OpenNLP and UIMA.
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Julien Nioche [mailto:[email protected]]
> Sent: Monday, July 09, 2012 2:01 PM
> To: [email protected]; [email protected]
> Subject: Re: Apache "Text Analysis" top-level project?
>
> Jeyendran,
>
> > One example I would suggest (at least according my view), is the
> > difference between Lucene and Nutch. Being a library, Lucene has pretty
> > much taken over search engine software development. Nutch, on the other
> > hand, tries to be a full-fledged platform for crawling, indexing and
> > search, and has not gathered anywhere near the same usage levels.
>
> That Nutch does not have the same audience as Lucene is completely
> understandable given that they are quite different in scope and nature. Not
> everybody needs to crawl on a large scale, but when they do they often use
> Nutch. And by the way Nutch does not do indexing and search - it delegates
> this to other tools like SOLR so it is mostly a crawler.
>
> The comparison between UIMA and OpenNLP is a better illustration of the
> difference between a framework and a library IMHO
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
