Chris,

I assume you ran the kmeans algorithm? I believe the clusteredPoints file should prefix the document vectors with the text version of the processed documents (assuming seq2sparse was run with the named vector (-nv) option), as shown in "Cluster documents using kmeans", step 3, here:

https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html

But for the cluster id part (the Key), I believe one does have to map that numeric key to the corresponding ids from the main cluster results (i.e., in the "clusters-<n>-final" results).

As I recall, the corresponding keys in the "final" folder will be CL-<id> or VL-<id>, specifying the state of the final cluster (converged or not):

http://lucene.472066.n3.nabble.com/retrieve-k-means-result-td1386091.html

I believe you just need to parse the ids from the clusteredPoints output (the Key) and map them to the number following "CL-" or "VL-" in the "final" output to identify the corresponding clusters.

Dan

________________________________
From: Christopher Laux <ctl...@gmail.com>
To: user@mahout.apache.org
Sent: Sunday, November 18, 2012 11:37 AM
Subject: Conversion of point numbers to key strings

Hi all,

I can read Mahout's output in "clusteredPoints", but that only provides point numbers. When I input the data to a sequence file, I used strings as keys. Is there any way of recovering the key strings from the point numbers? Or do I have to keep track of that myself?

Thanks,
Chris
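
A rough sketch of the id mapping suggested in the reply, in plain Python. It assumes the clusteredPoints entries have already been dumped to (numeric cluster id, document name) pairs and that the keys from the "final" folder are available as strings such as "CL-<id>" or "VL-<id>". The function names and the sample data are hypothetical, and no Mahout or Hadoop API is used here; reading the sequence files themselves is left out.

```python
import re
from collections import defaultdict

# Keys in the "final" output are assumed to look like "CL-<id>" or "VL-<id>".
_CLUSTER_KEY = re.compile(r"^(CL|VL)-(\d+)$")


def parse_final_key(key):
    """Split a key such as 'VL-12' into its prefix and numeric id."""
    match = _CLUSTER_KEY.match(key.strip())
    if match is None:
        raise ValueError(f"unexpected cluster key: {key!r}")
    return match.group(1), int(match.group(2))


def group_points_by_cluster(clustered_points, final_keys):
    """Group document names under the matching key from the "final" output.

    clustered_points: iterable of (numeric cluster id, document name) pairs,
        assumed to have been extracted from clusteredPoints beforehand.
    final_keys: iterable of "CL-<id>" / "VL-<id>" strings.
    """
    # Index the final keys by the number that follows the prefix.
    key_by_id = {}
    for key in final_keys:
        _prefix, cluster_id = parse_final_key(key)
        key_by_id[cluster_id] = key

    groups = defaultdict(list)
    for cluster_id, name in clustered_points:
        # A KeyError here means a point refers to a cluster id that is
        # missing from the final keys supplied.
        groups[key_by_id[int(cluster_id)]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    final_keys = ["VL-3", "CL-7"]
    points = [(3, "doc-a"), (7, "doc-b"), (3, "doc-c")]
    for key, names in group_points_by_cluster(points, final_keys).items():
        print(key, names)
```

Whether the document names are actually present in clusteredPoints depends on how the vectors were created (for example, the -nv option mentioned above); if they are not, the names would have to be tracked separately.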