Chris,
 
I assume you ran the kmeans algorithm?
 
I believe the clusteredPoints file should prefix each document vector with the 
text name of the processed document (assuming seq2sparse was run with the named 
vector (-nv) option), as shown in step 3 of "Cluster documents using kmeans" here:
https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html
 
But for the cluster id part (the Key), I believe one does have to map that 
numeric key to the corresponding ids in the main cluster results (i.e., in the 
"clusters-<n>-final" results).

As I recall, the corresponding keys in the "final" folder will be CL-<id> or 
VL-<id>, where the prefix indicates the state of the final cluster (converged or not):
http://lucene.472066.n3.nabble.com/retrieve-k-means-result-td1386091.html

I believe you just need to parse the ids from the clusteredPoints output (the 
Key) and map them to the number following "CL-" or "VL-" in the "final" output 
to identify the corresponding clusters.
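
To make the mapping concrete, here is a rough Python sketch. It assumes you have already dumped both outputs to plain text yourself and pulled out (a) the "CL-<id>" / "VL-<id>" labels from the "final" folder and (b) pairs of (cluster key, vector name) from clusteredPoints. The function names and input shapes are made up for illustration and are not part of Mahout.

```python
import re

# Matches labels of the form "CL-<id>" or "VL-<id>" (hypothetical input:
# labels you have already extracted from the "clusters-<n>-final" dump).
LABEL_RE = re.compile(r"^(CL|VL)-(\d+)$")


def index_final_clusters(labels):
    """Build a dict from numeric cluster id to its full CL-/VL- label."""
    index = {}
    for raw in labels:
        label = raw.strip()
        match = LABEL_RE.match(label)
        if match is None:
            raise ValueError("unexpected cluster label: %r" % label)
        index[int(match.group(2))] = label
    return index


def resolve_points(point_rows, cluster_index):
    """Map each vector name to the label of the cluster it was assigned to.

    point_rows is an iterable of (cluster_key, vector_name) pairs taken
    from clusteredPoints; keys with no matching final cluster map to None.
    """
    resolved = {}
    for cluster_key, name in point_rows:
        resolved[name] = cluster_index.get(int(cluster_key))
    return resolved


if __name__ == "__main__":
    clusters = index_final_clusters(["VL-12", "CL-7"])
    points = [("12", "doc-a"), (7, "doc-b")]
    print(resolve_points(points, clusters))
```

If the vectors are not named (no -nv), the vector name will not be available and you would have to keep track of the original keys yourself.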
 
Dan  

________________________________
 From: Christopher Laux <ctl...@gmail.com>
To: user@mahout.apache.org 
Sent: Sunday, November 18, 2012 11:37 AM
Subject: Conversion of point numbers to key strings
  
Hi all,

I can read mahout's output in "clusteredPoints" but that only provides
point numbers. When I input the data to a sequence file I used strings as
keys. Is there any way of recovering the key strings from the point
numbers? Or do I have to keep track of that myself?

Thanks,
Chris
