Reposting - I wasn't subscribed to the group earlier

VS 

first ingesting the data into Solr and then invoking Mahout on the Solr
index (clustering on the contents of the field "text")

defined as 

Field: text
Field-Type: org.apache.solr.schema.TextField
Properties: Indexed, Tokenized, Multivalued, TermVector Stored
Schema: Indexed, Tokenized, Multivalued, TermVector Stored
Index: (unstored field)
PI Gap: 100
Docs: 21578

Index Analyzer: org.apache.solr.analysis.TokenizerChain
Query Analyzer: org.apache.solr.analysis.TokenizerChain

and executing a "similar" command set

I get vastly differing results: 

The lucene / kmeans approach yields 20 clusters, whereas the Solr approach
yields just one cluster.

I'm obviously doing something wrong. Any pointers? 

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-tp4037198.html
Sent from the Mahout User List mailing list archive at Nabble.com.
