Hello, Jeff: I reran the commands with the parameters you suggested. However, when I ran the following clusterdump command, I still got the same output:
mahout clusterdump -s /user/hadoop/articles-kmeans/clusters-1 -d /user/hadoop/articles-seqdir-sparse-kmeans/dictionary.file-0 -dt sequencefile -b 100 -n 20

Am I missing some arguments? Thanks again for your help, Jeff.

On Thu, Aug 11, 2011 at 6:49 PM, Yosep Kim <[email protected]> wrote:

> What a fast response!!! Thanks for the quick answer. I will let you know
> how it goes! Thanks!
>
>
> On Thu, Aug 11, 2011 at 6:47 PM, Jeff Eastman <[email protected]> wrote:
>
>> You'll want to add the -nv option to seq2sparse to get NamedVectors out
>> and add the -cl argument to k-means to get the clustered documents. Then the
>> clusterdump should give you what you are seeking.
>>
>> -----Original Message-----
>> From: Yosep Kim [mailto:[email protected]]
>> Sent: Thursday, August 11, 2011 3:43 PM
>> To: [email protected]
>> Subject: How to convert
>>
>> Hello, Everyone!
>>
>> This is Yosep Kim, and I just started playing with Mahout.
>> I successfully installed it on my box and got a example data clustered
>> using a K-Means clustering algorithm. My input data was all text
>> documents
>> (i.e. new articles). I ran a clusterdump command, I get some cool
>> information. However, I was not able to find a way to translate this back
>> to the original document. It looks like the algorithm created clusters
>> based on all the words inside of documents. Did I understand this
>> correctly? How can I create clusters based on documents so I can see that
>> "document1.txt and document2.txt are in Cluster 1"? I'd appreciate your
>> help!! Thanks.
>>
>>
>> :CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008,
>> 001:0.015, 00139:0.014, 001
>>     Top Terms:
>>         c            => 2.458502088406289
>>         software     => 2.375095306671867
>>         java         => 2.2093305677868598
>>         project      => 1.989917316871096
>>         application  => 1.957329582567363
>>         using        => 1.916300386652466
>>         web          => 1.9046723985856817
>>         development  => 1.8707247066867443
>>
>> By the way, Mahout is way cool, and I can't wait to be part of this
>> "movement".
>>
>> Yosep
>>
>
>
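
For reference, the sequence Jeff describes (seq2sparse with -nv, k-means with -cl, then clusterdump) might look roughly like the sketch below. Several things here are assumptions, not details from the thread: the input and seed directories, the -k and -x values, the tfidf-vectors subdirectory used as k-means input, and the idea of pointing clusterdump at the clusteredPoints directory with -p (--pointsDir), which Mahout releases of this period appear to accept. The script only prints the commands unless DRY_RUN=0 is set, so it can be read and adjusted before anything is run.

```shell
#!/bin/sh
# Sketch only. Paths marked "hypothetical" do not appear in the thread.
SEQ_DIR=/user/hadoop/articles-seqdir                   # hypothetical: SequenceFiles of the articles
SPARSE_DIR=/user/hadoop/articles-seqdir-sparse-kmeans  # taken from the clusterdump command above
KMEANS_DIR=/user/hadoop/articles-kmeans                # taken from the clusterdump command above
SEED_DIR=/user/hadoop/articles-kmeans-seeds            # hypothetical: initial cluster seeds

# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# 1. Vectorize with NamedVectors (-nv) so document names survive clustering.
run mahout seq2sparse -i "$SEQ_DIR" -o "$SPARSE_DIR" -nv

# 2. Cluster and also assign documents to clusters (-cl).
#    -k 20 and -x 10 are placeholder values, not from the thread.
run mahout kmeans -i "$SPARSE_DIR/tfidf-vectors" -c "$SEED_DIR" -o "$KMEANS_DIR" -k 20 -x 10 -cl

# 3. Dump clusters; -p (assumed here) names the directory of clustered points.
run mahout clusterdump -s "$KMEANS_DIR/clusters-1" -p "$KMEANS_DIR/clusteredPoints" \
  -d "$SPARSE_DIR/dictionary.file-0" -dt sequencefile -b 100 -n 20
```

If the -p assumption holds, the dump should list the named documents under each cluster alongside the top terms, which is the "document1.txt and document2.txt are in Cluster 1" view asked about in the original message.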
