Re: clustering after search

Ted Dunning Tue, 02 Nov 2010 23:53:56 -0700

You definitely can do this.

Assuming that you are using Lucene to do the search, you should be able to
adapt the Lucene vector exporter to export a specified list of documents as
vectors and then run the normal clustering operations.  The clustering should
be reasonably fast, but the export of a few hundred documents from Lucene
will probably be pretty slow for a large index.
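
To make the flow concrete, here is a rough sketch in plain Python of "restrict to the hit list, vectorize, then cluster". It is not the Mahout or Lucene API: the `index` dictionary, the `hits` list, and the helper names `tfidf_vectors` and `kmeans` are all made up for illustration, and the TF-IDF weights are computed over the subset only, which is an assumption rather than what the exporter necessarily does.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Map {doc_id: text} to {doc_id: {term: weight}} (L2-normalized TF-IDF)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    n = len(docs)
    vectors = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        # Smoothed IDF, computed over the subset only (an assumption).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[d] = {t: w / norm for t, w in vec.items()}
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=10, seed=0):
    """Plain k-means; returns {doc_id: cluster_number}."""
    ids = sorted(vectors)
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(ids, k)]
    assignment = {}
    for _ in range(iterations):
        assignment = {
            i: min(range(k), key=lambda c: sq_dist(vectors[i], centroids[c]))
            for i in ids
        }
        for c in range(k):
            members = [vectors[i] for i in ids if assignment[i] == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = mean(members)
    return assignment

# Hypothetical stand-in for the full index.
index = {
    "d1": "apache lucene search index",
    "d2": "lucene index search engine",
    "d3": "kmeans clustering centroid",
    "d4": "clustering kmeans vectors",
    "d5": "cooking pasta recipe",
}
hits = ["d1", "d2", "d3", "d4"]  # IDs returned by the keyword search (made up)
subset = {d: index[d] for d in hits}
clusters = kmeans(tfidf_vectors(subset), k=2)
```

The only step specific to your question is building `subset` from the hit list; everything after that is ordinary clustering on a smaller input.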

If you just mean sorting a set of documents into pre-existing clusters, you
can do that as well.  You would start with the same exporter, but would need
to glue more pieces together to build the classifier part.
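
One simple way to picture the classifier part is nearest-centroid assignment, sketched below in plain Python. This is my choice of technique for illustration, not something Mahout prescribes, and the centroid values, document vectors, and the `assign` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign(doc_vectors, centroids):
    """Label each document with the name of its most similar centroid."""
    return {
        d: max(centroids, key=lambda name: cosine(v, centroids[name]))
        for d, v in doc_vectors.items()
    }

# Hypothetical centroids from an earlier clustering run.
centroids = {
    "search": {"lucene": 1.0, "index": 0.8},
    "clustering": {"kmeans": 1.0, "centroid": 0.6},
}
# Hypothetical vectors exported for the documents to be sorted.
new_docs = {
    "d7": {"lucene": 2.0, "query": 1.0},
    "d8": {"kmeans": 1.0, "vectors": 1.0},
}
labels = assign(new_docs, centroids)
```

Here `d7` lands in "search" and `d8` in "clustering", because each shares terms with only one of the two centroids.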

My guess is that I am still not quite understanding what you want.  Did my
suggestions come at all close?

On Tue, Nov 2, 2010 at 11:48 PM, Borbála Siklósi <[email protected]> wrote:

> Yes, I know Carrot, but using it is not an option for me. Isn't there
> any way to tell Mahout which subset of documents to cluster?
>
> 2010/11/2 Ted Dunning <[email protected]>
>
> > Have you looked at Carrot?  It works very well.
> >
> > http://search.carrot2.org/stable/search
> >
> > On Tue, Nov 2, 2010 at 11:54 AM, Borbála Siklósi <[email protected]>
> > wrote:
> >
> > > Maybe I have quite a simple question, but I haven't been able to find
> > > the solution. I have a Solr index of documents and I run k-means
> > > clustering on them. It all works fine. How can I run a keyword search
> > > on the Solr index and then cluster only the result set? Can I somehow
> > > determine which documents the algorithm should cluster?
> > >
> >
>
