What a fast response!!! Thanks for the quick answer. I will let you know how it goes! Thanks!
On Thu, Aug 11, 2011 at 6:47 PM, Jeff Eastman <[email protected]> wrote:

> You'll want to add the -nv option to seq2sparse to get NamedVectors out and
> add the -cl argument to k-means to get the clustered documents. Then the
> clusterdump should give you what you are seeking.
>
> -----Original Message-----
> From: Yosep Kim [mailto:[email protected]]
> Sent: Thursday, August 11, 2011 3:43 PM
> To: [email protected]
> Subject: How to convert
>
> Hello, Everyone!
>
> This is Yosep Kim, and I just started playing with Mahout.
> I successfully installed it on my box and got an example data set clustered
> using the K-Means clustering algorithm. My input data was all text documents
> (i.e. new articles). When I ran the clusterdump command, I got some cool
> information. However, I was not able to find a way to translate this back
> to the original documents. It looks like the algorithm created clusters
> based on all the words inside the documents. Did I understand this
> correctly? How can I create clusters based on documents so I can see that
> "document1.txt and document2.txt are in Cluster 1"? I'd appreciate your
> help!! Thanks.
>
> :CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008,
> 001:0.015, 00139:0.014, 001
> Top Terms:
>   c           => 2.458502088406289
>   software    => 2.375095306671867
>   java        => 2.2093305677868598
>   project     => 1.989917316871096
>   application => 1.957329582567363
>   using       => 1.916300386652466
>   web         => 1.9046723985856817
>   development => 1.8707247066867443
>
> By the way, Mahout is way cool, and I can't wait to be part of this
> "movement".
>
> Yosep
