On Wednesday, July 11, 2012, Lance Norskog wrote:
> I did not articulate it well. I meant an "umbrella" project, not
> merging the code bases. Yes, merging the code bases would be a
> disaster. The "flagship" projects draw attention to the less-active
> projects in the umbrella.

It is specifically umbrellas which have been found wanting from a
governance standpoint. Code base is not an issue. It is a two-layer PMC
that has been deemed undesirable. If several PMCs wanted to merge
governance and be a single PMC with detailed oversight over all the
code, I suppose that would be fine. I don't see Lucene/Solr doing that
with other projects; they are already large and complex. I could see
OpenNLP and Mahout joining forces, but I would not presume to predict
the opinions of all the players. Note that Lucene was once an umbrella
and restructured as a single project, as did WS.

>
> What prompted this is that I'm used to the activity levels of Solr and
> Mahout. These are very "alive" projects which attract new algorithms
> as well as improvements to the existing code. These projects have
> momentum: above it, the project attracts new people, and below it the
> project languishes. OpenNLP seems to be a little below it. This
> suggestion is one way to push it up.
>
> As an example, I'm playing with using LSA to summarize documents and
> create tag clouds. I planned to contribute this to Lucene/Solr, and it
> did not occur to me that it might also work in OpenNLP.
>
> I'm only coming to this project now. What is the most recent new
> algorithm suite added? When? Are the contributors committers?
>
> On Tue, Jul 10, 2012 at 12:58 AM, Benson Margulies
> <[email protected]> wrote:
> > The board is not enthusiastic about 'umbrella' projects. Cooperate,
> > co-market -- great. Merge, not so good.
> >
> > On Tuesday, July 10, 2012, Jeyendran Balakrishnan wrote:
> >
> >> Hi Julien,
> >>
> >> My hats off to you and the rest of the Nutch developer team for
> >> improving Nutch over the past several years, to the level where
> >> anybody with heavy duty crawling needs can just use it off the shelf.
> >> I agree with you that Lucene vs Nutch is not as clear an analogy for
> >> the library vs framework debate.
> >>
> >> In my usage, I have tended to use Nutch more as a library (in one
> >> application only the crawling part, and in another just Fetcher2
> >> [a great component], hacked up a bit to remove its dependency on
> >> the rest of Nutch).
> >> The point I was trying to make, not very clearly, was that Nutch
> >> aggregates other components (Hadoop for distributed processing,
> >> Lucene/Solr for indexing and search, Tika for parsing, etc) along
> >> with its own custom crawler component code and custom data flow
> >> design, into a platform for end-to-end crawling, indexing and
> >> search, as opposed to, for example, being a pure-play crawler
> >> library on top of Hadoop.
> >>
> >> I look forward with interest to follow how this debate evolves
> >> regarding OpenNLP and UIMA.
> >>
> >> Cheers,
> >> Jeyendran
> >>
> >>
> >> -----Original Message-----
> >> From: Julien Nioche
> >> [mailto:[email protected]]
> >> Sent: Monday, July 09, 2012 2:01 PM
> >> To: [email protected]; [email protected]
> >> Subject: Re: Apache "Text Analysis" top-level project?
> >>
> >> Jeyendran,
> >>
> >> > One example I would suggest (at least according my view), is the
> >> > difference between Lucene and Nutch. Being a library, Lucene has
> >> > pretty much taken over search engine software development. Nutch,
> >> > on the other hand, tries to be a full-fledged platform for
> >> > crawling, indexing and search, and has not gathered anywhere near
> >> > the same usage levels.
> >> >
> >>
> >> That Nutch does not have the same audience as Lucene is completely
> >> understandable given that they are quite different in scope and
> >> nature. Not everybody needs to crawl on a large scale, but when
> >> they do they often use Nutch. And by the way Nutch does not do
> >> indexing and search - it delegates this to other tools like SOLR
> >> so it is mostly a crawler.
> >>
> >> The comparison between UIMA and OpenNLP is a better illustration
> >> of the difference between a framework and a library IMHO
> >>
> >> Julien
> >>
> >> --
> >> *
> >> *Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >> http://twitter.com/digitalpebble
> >>
> >>
> >
>
> --
> Lance Norskog
> [email protected]
>
