Reposting - I wasn't subscribed to the group earlier
I am comparing clustering directly on a Lucene index (the lucene / kmeans approach) vs. first ingesting the data into Solr and then invoking Mahout on the Solr index (clustering on the contents of the field "text"), which is defined as:

Field: text
Field-Type: org.apache.solr.schema.TextField
Properties: Indexed, Tokenized, Multivalued, TermVector Stored
Schema: Indexed, Tokenized, Multivalued, TermVector Stored
Index: (unstored field)
PI Gap: 100
Docs: 21578
Index Analyzer: org.apache.solr.analysis.TokenizerChain
Query Analyzer: org.apache.solr.analysis.TokenizerChain

Executing a "similar" command set on each, I get vastly differing results: the lucene / kmeans approach yields 20 clusters, whereas the Solr approach yields just one cluster.

I'm obviously doing something wrong. Any pointers?

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html
Sent from the Mahout User List mailing list archive at Nabble.com.