Jeffrey,

Are you looking to cluster a whole corpus of documents or just the search
results?  If it's the latter, use Carrot2.  If it's the former, look at Mahout.
Clustering the top 1M matching documents doesn't really make sense.  Usually
the top 100-200 hits are sufficient.
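
For the search-results case, here is a rough sketch of a request that asks
Solr to cluster only the top hits.  It just builds the URL and does not send
anything.  The handler path (/solr/clustering), host, and query string are
assumptions for illustration; the path depends on how the clustering
component is wired up in your solrconfig.xml.

```python
from urllib.parse import urlencode


def build_clustering_url(base_url, query, rows=100):
    """Build a Solr query URL that requests clustering of the top `rows` hits.

    base_url is a hypothetical handler with the clustering component attached.
    """
    if rows < 1:
        raise ValueError("rows must be a positive integer")
    params = {
        "q": query,
        "rows": rows,                  # only the top hits are clustered
        "clustering": "true",          # switch the clustering component on
        "clustering.results": "true",  # cluster the search results
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance; adjust host, port, and handler as needed.
    print(build_clustering_url("http://localhost:8983/solr/clustering",
                               "faceting text fields"))
```

Keeping rows in the 100-200 range keeps the clustering fast enough to run at
query time.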

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jeffrey Tiong <jeffrey.ti...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, June 12, 2009 12:44:55 AM
> Subject: Re: Faceting on text fields
> 
> Hi all,
> 
> We are thinking of using Carrot2 clustering too. But we saw that Carrot2
> may only be able to cluster up to 1000 search snippets. Does anyone know how
> we can cluster many more snippets than that (maybe in the million
> range)?
> 
> And what is the difference between Mahout and Carrot2?
> 
> Thanks!
> 
> Jeffrey
> 
> On Thu, Jun 11, 2009 at 9:47 PM, Michael Ludwig wrote:
> 
> > Yao Ge wrote:
> >
> >> BTW, Carrot2 has a very impressive Clustering Workbench (based on
> >> Eclipse) that has built-in integration with Solr. If you have a Solr
> >> service running, it is just a matter of pointing the workbench at it.
> >> The clustering results and visualization are amazing.
> >> (http://project.carrot2.org/download.html).
> >>
> >
> > A new world opens up for me ...
> >
> > Thanks for pointing out how cool this is!
> >
> > Hint for other newcomers: Open the View Menu to configure the details of
> > how you perform your search, e.g. your Solr URL in case it differs from
> > the default, or your "summary field", which is what gets used to analyze
> > the data in order to determine clusters, if I understand correctly.
> >
> > Michael Ludwig
> >
