Using the Reuters-21578 data set and the cluster_reuters.sh script

VS

first ingesting the data into Solr and then invoking Mahout on the Solr
index (clustering on the contents of the field "text")

defined as


and executing a "similar" set of commands


I get vastly differing results:

The Lucene/k-means approach yields 20 clusters, whereas the Solr approach
yields just one cluster.

I'm obviously doing something wrong. Any pointers?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4036013.html
Sent from the Mahout User List mailing list archive at Nabble.com.
