You definitely can do this. Assuming that you are using Lucene to do the search, you should be able to adapt the Lucene vector exporter to export a specified list of documents as vectors and then run the normal clustering operations on them. The clustering itself should be reasonably fast, but exporting even a few hundred documents from Lucene will probably be fairly slow on a large index.
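As a rough illustration of that data flow (run the keyword query first, then vectorize and cluster only the hits), here is a minimal sketch in plain Python. It does not use Lucene, Solr, or Mahout. The `DOCS` dict, `keyword_search`, `tf_vector`, and the small k-means are all hypothetical stand-ins, assumed only to show the shape of the pipeline. In a real setup the hit list would come from the Solr query and the vectors from the adapted exporter.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus standing in for the Solr/Lucene index.
DOCS = {
    "d1": "solr search engine index",
    "d2": "solr index search query",
    "d3": "solr kmeans clustering vectors",
    "d4": "solr clustering kmeans centroids",
    "d5": "cooking recipes pasta",
}


def keyword_search(docs, keyword):
    """Return ids of documents containing `keyword` (stand-in for a Solr query)."""
    kw = keyword.lower()
    return [doc_id for doc_id, text in docs.items() if kw in text.lower().split()]


def tf_vector(text, vocab):
    """Length-normalised term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(v, centroids):
    """Index of the centroid closest to v (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignment = [nearest(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, new_assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
    return assignment


def cluster_result_set(docs, keyword, k):
    """Cluster only the documents returned by the keyword search."""
    hits = keyword_search(docs, keyword)
    if len(hits) < k:
        raise ValueError("need at least k matching documents")
    # The vocabulary is built from the result set only, not the whole index.
    vocab = sorted({t for doc_id in hits for t in docs[doc_id].lower().split()})
    vectors = [tf_vector(docs[doc_id], vocab) for doc_id in hits]
    return dict(zip(hits, kmeans(vectors, k)))


if __name__ == "__main__":
    print(cluster_result_set(DOCS, "solr", k=2))
```

Here "d5" never reaches the clustering step because it does not match the query. Building the vocabulary from the hits alone is a design choice; you could instead keep the global index vocabulary so that vectors stay comparable across different queries.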
If you mean just sorting a set of documents into pre-existing clusters, you can do that as well. You would start with the same exporter, but you would need to glue more pieces together to build the classifier part.

My guess is that I am still not quite understanding what you want. Did my suggestions come at all close?

On Tue, Nov 2, 2010 at 11:48 PM, Borbála Siklósi <[email protected]> wrote:

> Yes I know carrot, but that is not a possibility for me to use that. There
> isn't any way to tell mahout which subset of documents to cluster?
>
> 2010/11/2 Ted Dunning <[email protected]>
>
> > Have you looked at Carrot? It works very well
> >
> > http://search.carrot2.org/stable/search
> >
> > On Tue, Nov 2, 2010 at 11:54 AM, Borbála Siklósi <[email protected]>
> > wrote:
> >
> > > Maybe I have quite a simple question, but I haven't been able to find out
> > > the solution. I have a solr index of documents and I run kmeans clustering
> > > on them. It all works fine. How can I do that I make a keyword search on the
> > > solr index and run the clustering only on the result set? Can I someway
> > > determine what documents the algorithm should cluster?
