Re: Apache "Text Analysis" top-level project?

Benson Margulies Tue, 10 Jul 2012 00:58:31 -0700

The board is not enthusiastic about 'umbrella' projects. Cooperate,
co-market -- great. Merge, not so good.


On Tuesday, July 10, 2012, Jeyendran Balakrishnan wrote:

> Hi Julien,
>
> My hat's off to you and the rest of the Nutch developer team for improving
> Nutch over the past several years, to the level where anybody with
> heavy-duty crawling needs can just use it off the shelf.
> I agree with you that Lucene vs Nutch is not as clear an analogy for the
> library vs framework debate.
>
> In my usage, I have tended to use Nutch more as a library (in one
> application only the crawling part, and in another just Fetcher2 [a great
> component], hacked up a bit to remove its dependency on the rest of Nutch).
> The point I was trying to make, not very clearly, was that Nutch aggregates
> other components (Hadoop for distributed processing, Lucene/Solr for
> indexing and search, Tika for parsing, etc) along with its own custom
> crawler component code and custom data flow design, into a platform for
> end-to-end crawling, indexing and search, as opposed to, for example, being
> a pure-play crawler library on top of Hadoop.
>
> I look forward to following how this debate evolves regarding
> OpenNLP and UIMA.
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Julien Nioche [mailto:[email protected]]
> Sent: Monday, July 09, 2012 2:01 PM
> To: [email protected]; [email protected]
> Subject: Re: Apache "Text Analysis" top-level project?
>
> Jeyendran,
>
> > One example I would suggest (at least according to my view) is the
> > difference between Lucene and Nutch. Being a library, Lucene has pretty
> > much taken over search engine software development. Nutch, on the other
> > hand, tries to be a full-fledged platform for crawling, indexing and
> > search, and has not gathered anywhere near the same usage levels.
> >
>
> That Nutch does not have the same audience as Lucene is completely
> understandable given that they are quite different in scope and nature. Not
> everybody needs to crawl on a large scale, but when they do they often use
> Nutch. And by the way Nutch does not do indexing and search - it delegates
> this to other tools like Solr, so it is mostly a crawler.
>
> The comparison between UIMA and OpenNLP is a better illustration of the
> difference between a framework and a library, IMHO.
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>
