Re: Apache "Text Analysis" top-level project?

Benson Margulies Wed, 11 Jul 2012 00:32:59 -0700

On Wednesday, July 11, 2012, Lance Norskog wrote:

> I did not articulate it well. I meant an "umbrella" project, not
> merging the code bases. Yes, merging the code bases would be a
> disaster. The "flagship" projects draw attention to the less-active
> projects in the umbrella.



It is specifically umbrellas which have been found wanting from a
governance standpoint. Code base is not an issue. It is a two-layer PMC
that has been deemed undesirable. If several PMCs wanted to merge
governance and be a single PMC with detailed oversight over all the code, I
suppose that would be fine. I don't see Lucene/Solr doing that with other
projects; they are already large and complex. I could see OpenNLP and
Mahout joining forces, but I would not presume to predict the opinions of
all the players.

Note that Lucene was once an umbrella and was restructured as a single
project, as was WS.


>
> What prompted this is that I'm used to the activity levels of Solr and
> Mahout. These are very "alive" projects which attract new algorithms
> as well as improvements to the existing code. These projects have
> momentum: above it, the project attracts new people, and below it the
> project languishes.  OpenNLP seems to be a little below it. This
> suggestion is one way to push it up.
>
> As an example, I'm playing with using LSA to summarize documents and
> create tag clouds. I planned to contribute this to Lucene/Solr, and it
> did not occur to me that it might also work in OpenNLP.
>
> I'm only coming to this project now. What is the most recent new
> algorithm suite added? When? Are the contributors committers?
>
> On Tue, Jul 10, 2012 at 12:58 AM, Benson Margulies
> <[email protected]> wrote:
> > The board is not enthusiastic about 'umbrella' projects. Cooperate,
> > co-market -- great. Merge, not so good.
> >
> > On Tuesday, July 10, 2012, Jeyendran Balakrishnan wrote:
> >
> >> Hi Julien,
> >>
> >> My hat's off to you and the rest of the Nutch developer team for improving
> >> Nutch over the past several years, to the level where anybody with
> >> heavy-duty crawling needs can just use it off the shelf.
> >> I agree with you that Lucene vs Nutch is not as clear an analogy for the
> >> library vs framework debate.
> >>
> >> In my usage, I have tended to use Nutch more as a library (in one
> >> application only the crawling part, and in another just Fetcher2 [a great
> >> component], hacked up a bit to remove its dependency on the rest of Nutch).
> >> The point I was trying to make, not very clearly, was that Nutch aggregates
> >> other components (Hadoop for distributed processing, Lucene/Solr for
> >> indexing and search, Tika for parsing, etc.) along with its own custom
> >> crawler component code and custom data flow design, into a platform for
> >> end-to-end crawling, indexing and search, as opposed to, for example, being
> >> a pure-play crawler library on top of Hadoop.
> >>
> >> I look forward with interest to following how this debate evolves regarding
> >> OpenNLP and UIMA.
> >>
> >> Cheers,
> >> Jeyendran
> >>
> >>
> >> -----Original Message-----
> >> From: Julien Nioche [mailto:[email protected]]
> >> Sent: Monday, July 09, 2012 2:01 PM
> >> To: [email protected]; [email protected]
> >> Subject: Re: Apache "Text Analysis" top-level project?
> >>
> >> Jeyendran,
> >>
> >> > One example I would suggest (at least according to my view), is the difference
> >> > between Lucene and Nutch. Being a library, Lucene has pretty much
> >> > taken over search engine software development. Nutch, on the other
> >> > hand, tries to be a full-fledged platform for crawling, indexing and
> >> > search, and has not gathered anywhere near the same usage levels.
> >> >
> >>
> >> That Nutch does not have the same audience as Lucene is completely
> >> understandable given that they are quite different in scope and nature. Not
> >> everybody needs to crawl on a large scale, but when they do they often use
> >> Nutch. And by the way Nutch does not do indexing and search - it delegates
> >> this to other tools like SOLR so it is mostly a crawler.
> >>
> >> The comparison between UIMA and OpenNLP is a better illustration of the
> >> difference between a framework and a library IMHO
> >>
> >> Julien
> >>
> >> --
> >> Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >> http://twitter.com/digitalpebble
> >>
> >>
>
>
>
> --
> Lance Norskog
> [email protected]
>
