Cool, it worked.  Now I need to do my part and extract which files belong to
which clusters by combing through the output file.

Thanks!

1.0: /filename1.txt = [110:5.496, 19:6.563, 196:5...
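
For the combing-through step, a minimal Python sketch could look like the one below. It is not part of Mahout. It assumes the dump has been saved as plain text, that each cluster starts with a header line such as ":CL-16397{..." or ":VL-123{...", and that the point lines under it follow the "weight: name = [vector" shape shown above. The function name files_by_cluster is made up for illustration, and the patterns may need adjusting if your output is laid out differently.

```python
import re
from collections import defaultdict

# Assumed cluster header shape, e.g. ":CL-16397{n=1032 c=[..." (leading colon optional).
CLUSTER_RE = re.compile(r"^\s*:?((?:CL|VL)-\d+)\{")
# Assumed point line shape, e.g. "1.0: /filename1.txt = [110:5.496, ..."
POINT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?):\s+(\S.*?)\s+=\s+\[")


def files_by_cluster(lines):
    """Map each cluster id to the document names listed under it."""
    clusters = defaultdict(list)
    current = None  # id of the cluster whose block is being read
    for line in lines:
        header = CLUSTER_RE.match(line)
        if header:
            current = header.group(1)
            clusters[current]  # register the cluster even if it has no points
            continue
        point = POINT_RE.match(line)
        if point and current is not None:
            clusters[current].append(point.group(2))
    return dict(clusters)
```

To use it, open the saved dump (for example with open("clusterdump.txt"), a hypothetical file name), pass the file object to files_by_cluster, and print each cluster id with its list of names. Top-term lines and the "Top Terms:" heading are skipped because they match neither pattern.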


On Thu, Aug 11, 2011 at 7:14 PM, Jeff Eastman <[email protected]> wrote:

> You need to also add the -p argument to clusterdump, specifying your
> clusteredPoints directory.
>
> -----Original Message-----
> From: Yosep Kim [mailto:[email protected]]
> Sent: Thursday, August 11, 2011 4:11 PM
> To: [email protected]
> Subject: Re: How to convert
>
> Hello, Jeff:
>
> I ran the commands again with the parameters you asked me to add. However,
> when I ran the following clusterdump command, I still got the same output:
>
>   mahout clusterdump -s /user/hadoop/articles-kmeans/clusters-1 \
>     -d /user/hadoop/articles-seqdir-sparse-kmeans/dictionary.file-0 \
>     -dt sequencefile -b 100 -n 20
>
> Am I missing some arguments?
>
> Thanks again for your help, Jeff.
>
> On Thu, Aug 11, 2011 at 6:49 PM, Yosep Kim <[email protected]> wrote:
>
> > What a fast response!!!  Thanks for the quick answer. I will let you know
> > how it goes!  Thanks!
> >
> >
> > On Thu, Aug 11, 2011 at 6:47 PM, Jeff Eastman <[email protected]> wrote:
> >
> >> You'll want to add the -nv option to seq2sparse to get NamedVectors out
> >> and add the -cl argument to k-means to get the clustered documents. Then
> >> the clusterdump should give you what you are seeking.
> >>
> >> -----Original Message-----
> >> From: Yosep Kim [mailto:[email protected]]
> >> Sent: Thursday, August 11, 2011 3:43 PM
> >> To: [email protected]
> >> Subject: How to convert
> >>
> >> Hello, Everyone!
> >>
> >> This is Yosep Kim, and I just started playing with Mahout. I successfully
> >> installed it on my box and got some example data clustered using the
> >> K-Means clustering algorithm. My input data was all text documents (i.e.
> >> new articles). When I ran the clusterdump command, I got some cool
> >> information. However, I was not able to find a way to translate this back
> >> to the original documents. It looks like the algorithm created clusters
> >> based on all the words inside the documents. Did I understand this
> >> correctly? How can I create clusters based on documents so I can see that
> >> "document1.txt and document2.txt are in Cluster 1"? I'd appreciate your
> >> help! Thanks.
> >>
> >>
> >> :CL-16397{n=1032 c=[0:0.125, 0.5:0.019, 0.8m:0.014, 00:0.096, 0000:0.008, 001:0.015, 00139:0.014, 001
> >>        Top Terms:
> >>                c                                       => 2.458502088406289
> >>                software                                => 2.375095306671867
> >>                java                                    => 2.2093305677868598
> >>                project                                 => 1.989917316871096
> >>                application                             => 1.957329582567363
> >>                using                                   => 1.916300386652466
> >>                web                                     => 1.9046723985856817
> >>                development                             => 1.8707247066867443
> >>
> >> By the way, Mahout is way cool, and I can't wait to be part of this
> >> "movement".
> >>
> >> Yosep
> >>
> >
> >
>
