> Both of the clustering algorithms that ship with Solr (Lingo and STC) are
> designed to allow one document to appear in more than one cluster, which
> actually does make sense in many scenarios. There's no easy way to force
> them to produce hard clusterings because this would require a complete
> change in the way the algorithms work. If you need each document to belong
> to exactly one cluster, you'd have to post-process the clusters to remove
> the redundant document assignments.
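For reference, the post-processing described in the quoted text could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the input is a list of dicts with "labels" and "docs" keys (loosely modelled on what the clustering response looks like once parsed from JSON), the function name to_hard_clustering is made up, and "keep each document in the first cluster that lists it" is just one possible policy, not something Lingo or STC prescribe.

```python
def to_hard_clustering(clusters):
    """Return a copy of `clusters` in which every document id appears once.

    Assumption: each cluster is a dict with a "docs" list of document ids.
    Policy (arbitrary): a document stays in the first cluster that lists it.
    """
    seen = set()
    result = []
    for cluster in clusters:
        kept = []
        for doc_id in cluster["docs"]:
            if doc_id not in seen:
                seen.add(doc_id)
                kept.append(doc_id)
        if kept:
            # Copy the cluster so the caller's data is not modified;
            # clusters that end up empty are dropped.
            result.append({**cluster, "docs": kept})
    return result


# Made-up sample data, not real Solr output.
clusters = [
    {"labels": ["Data Mining"], "docs": ["d1", "d2", "d3"]},
    {"labels": ["Search Engines"], "docs": ["d2", "d4"]},
    {"labels": ["Text Mining"], "docs": ["d1", "d3"]},
]
hard = to_hard_clustering(clusters)
for cluster in hard:
    print(cluster["labels"][0], cluster["docs"])
```

Because the result depends on cluster order, you may want to sort the clusters first (for example by size or score) so that documents end up in the cluster you consider most important.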
On second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet, but it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue to get updates:

http://issues.carrot2.org/browse/CARROT-791

Cheers,

S.