Cool, it worked. Now I need to do my part to extract what files belong to what clusters by combing through the output file.
Thanks!

1.0: /filename1.txt = [110:5.496, 19:6.563, 196:5...

On Thu, Aug 11, 2011 at 7:14 PM, Jeff Eastman <[email protected]> wrote:

> You need to also add the -p argument to clusterdump, specifying your
> clusteredPoints directory.
>
> -----Original Message-----
> From: Yosep Kim [mailto:[email protected]]
> Sent: Thursday, August 11, 2011 4:11 PM
> To: [email protected]
> Subject: Re: How to convert
>
> Hello, Jeff:
>
> I did run the commands again with parameters you wanted me to add. However,
> when I ran the following clusterdump command, I still had the same output:
>
> mahout clusterdump -s /user/hadoop/articles-kmeans/clusters-1 -d
> /user/hadoop/articles-seqdir-sparse-kmeans/dictionary.file-0 -dt
> sequencefile -b 100 -n 20
>
> Am I missing some arguments?
>
> Thanks again for your help, Jeff.
>
> On Thu, Aug 11, 2011 at 6:49 PM, Yosep Kim <[email protected]> wrote:
>
> > What a fast response!!! Thanks for the quick answer. I will let you know
> > how it goes! Thanks!
> >
> > On Thu, Aug 11, 2011 at 6:47 PM, Jeff Eastman <[email protected]> wrote:
> >
> >> You'll want to add the -nv option to seq2sparse to get NamedVectors out
> >> and add the -cl argument to k-means to get the clustered documents. Then
> >> the clusterdump should give you what you are seeking.
> >>
> >> -----Original Message-----
> >> From: Yosep Kim [mailto:[email protected]]
> >> Sent: Thursday, August 11, 2011 3:43 PM
> >> To: [email protected]
> >> Subject: How to convert
> >>
> >> Hello, Everyone!
> >>
> >> This is Yosep Kim, and I just started playing with Mahout.
> >> I successfully installed it on my box and got a example data clustered
> >> using a K-Means clustering algorithm. My input data was all text documents
> >> (i.e. new articles). I ran a clusterdump command, I get some cool
> >> information. However, I was not able to find a way to translate this back
> >> to the original document. It looks like the algorithm created clusters
> >> based on all the words inside of documents. Did I understand this
> >> correctly? How can I create clusters based on documents so I can see that
> >> "document1.txt and document2.txt are in Cluster 1"? I'd appreciate your
> >> help!! Thanks.
> >>
> >> :CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008,
> >> 001:0.015, 00139:0.014, 001
> >> Top Terms:
> >> c => 2.458502088406289
> >> software => 2.375095306671867
> >> java => 2.2093305677868598
> >> project => 1.989917316871096
> >> application => 1.957329582567363
> >> using => 1.916300386652466
> >> web => 1.9046723985856817
> >> development => 1.8707247066867443
> >>
> >> By the way, Mahout is way cool, and I can't wait to be part of this
> >> "movement".
> >>
> >> Yosep
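
For the "combing through the output file" step mentioned at the top of this message, here is a minimal Python sketch. It assumes the clusterdump text has only the two line shapes quoted in this thread: a cluster header such as ":CL-16397{n=1032 c=[..." and a point line such as "1.0: /filename1.txt = [110:5.496, ...". The function name files_by_cluster, the regular expressions, and the acceptance of a "VL-" prefix alongside "CL-" are my own assumptions, not anything Mahout provides, so adjust them if your dump looks different.

```python
import re

# Assumed line shapes, taken from the samples quoted in this thread:
#   cluster header:  :CL-16397{n=1032 c=[0:0.125, ...
#   clustered point: 1.0: /filename1.txt = [110:5.496, ...
# Accepting "VL-" as well as "CL-" is an assumption about other cluster ids.
CLUSTER_RE = re.compile(r"^\s*:?((?:CL|VL)-\d+)\{")
POINT_RE = re.compile(r"^\s*\d+(?:\.\d+)?\s*:\s*(.+?)\s*=\s*\[")


def files_by_cluster(lines):
    """Map each cluster id to the document names listed under its header."""
    clusters = {}
    current = None
    for line in lines:
        header = CLUSTER_RE.match(line)
        if header:
            # A new cluster starts; later point lines belong to it.
            current = header.group(1)
            clusters.setdefault(current, [])
            continue
        point = POINT_RE.match(line)
        if point and current is not None:
            clusters[current].append(point.group(1))
    return clusters
```

Since a file object iterates line by line, a saved dump (say, a hypothetical dump.txt) can be passed in directly: files_by_cluster(open("dump.txt")). Top-term lines and anything else that matches neither pattern are skipped, and a cluster with no point lines maps to an empty list.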
