Thank you Suneel! I will give this a try and let you know how it goes!

Ronald
________________________________________
From: Suneel Marthi [suneel_mar...@yahoo.com]
Sent: Tuesday, February 11, 2014 5:44 PM
To: user@mahout.apache.org
Subject: Re: seqdumper output?

You should run clusterdump on /home/r9r/seqTest/seqKmeans/clusters-1-final/part-xxxxx to see the points that are in the cluster. But you need a dictionary for that, which wouldn't be available if the vectors were generated from CSV. One way to generate a dictionary for a CSV file and verify the clustering output is the following process:

1. Convert the CSV file to a Lucene index (see http://glaforge.appspot.com/article/lucene-s-fun for sample code).
2. Run the Lucene index from (1) through Mahout's lucene2seq utility. This converts the Lucene index into sequence files.
3. Run the output of (2) through seq2sparse. This should generate tf-idf vectors, a dictionary, tf-vectors, and wordcounts.
4. Run the output of (3) through the KMeans Driver.

Please give this a try.

On Tuesday, February 11, 2014 3:33 PM, "Allen, Ronald L." <allen...@ornl.gov> wrote:

Hello,

I have done something wrong with clustering a CSV file and can't quite figure it out. I am using Mahout 0.9 on a local machine only. Below is the output from seqdumper, and I am not sure how to interpret it. Can anyone help?

Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/_policy
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.clustering.iterator.ClusteringPolicyWritable
Key: : Value: org.apache.mahout.clustering.iterator.ClusteringPolicyWritable@78be9eb3
Count: 1

Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/part-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.iterator.ClusterWritable
Key: 0: Value: org.apache.mahout.clustering.iterator.ClusterWritable@592ea0f8
Count: 1

Input Path: file:/home/r9r/seqTest/seqKmeans/clusters-1-final/part-00001
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.iterator.ClusterWritable
Key: 1: Value: org.apache.mahout.clustering.iterator.ClusterWritable@44a2786
Count: 1

There's probably a good chance I am still not getting my CSV data into something usable. I can get it into a sequence file, but this is the output.

Thanks,
Ronald
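
A note on reading the dump: values such as ClusterWritable@592ea0f8 have the shape Java produces by default for an object whose class does not override toString(), namely the class name, an "@", and the hash code in hex. That suggests the dump is showing that a cluster object exists under each key rather than showing its contents, which is presumably why clusterdump is the suggested tool. The sketch below only illustrates that Java behavior. The class names OpaqueValue and ReadableValue and the dumpLine helper are made up for illustration and are not part of Mahout or Hadoop.

```java
public class DefaultToStringDemo {

    // Hypothetical value class that does NOT override toString().
    // Printing it falls back to Object.toString(): ClassName@hexHashCode.
    public static class OpaqueValue {
        final int id;

        public OpaqueValue(int id) {
            this.id = id;
        }
    }

    // Hypothetical value class that DOES override toString().
    public static class ReadableValue {
        final int id;

        public ReadableValue(int id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "ReadableValue{id=" + id + "}";
        }
    }

    // Assumption: formats a record the way the dump above looks,
    // relying on each object's toString() via string concatenation.
    public static String dumpLine(Object key, Object value) {
        return "Key: " + key + ": Value: " + value;
    }

    public static void main(String[] args) {
        // Prints the class name followed by @ and a hex hash code.
        System.out.println(dumpLine(0, new OpaqueValue(0)));
        // Prints the fields, because toString() is overridden.
        System.out.println(dumpLine(0, new ReadableValue(0)));
    }
}
```

The hex suffix on the first line varies between runs, so it carries no information about the cluster itself.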