Re: Check the input files present in cluster

Madhusudan Joshi Wed, 06 Apr 2011 22:02:26 -0700

Thank you. Adding the --namedVector parameter made clusterdump return the
members of each cluster.
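
For anyone who wants the membership as a list rather than reading the dump by
eye, here is a minimal Python sketch that groups point names by cluster. It
assumes the dump layout shown in the quoted message below, and it further
assumes that with named vectors each point line takes the form
"weight: name = [terms]". That form, the function name members_by_cluster, and
the sample document names are illustrative and not taken from this thread.

```python
import re
from collections import defaultdict

# Cluster header lines start with an id such as "CL-0{" or "VL-1{".
CLUSTER_RE = re.compile(r"^\s*((?:CL|VL)-\d+)\{")

# Point lines start with a numeric weight and a colon. ASSUMPTION: when the
# vectors are named, the name sits between the colon and "= [".
POINT_RE = re.compile(r"^\s*\d+(?:\.\d+)?:\s*(?:(?P<name>.+?)\s*=\s*)?\[")


def members_by_cluster(dump_text):
    """Map each cluster id to the names of the points listed under it."""
    members = defaultdict(list)
    current = None
    for line in dump_text.splitlines():
        header = CLUSTER_RE.match(line)
        if header:
            current = header.group(1)
            members[current]  # register the cluster even if it has no points
            continue
        point = POINT_RE.match(line)
        if point and current is not None:
            # Unnamed vectors carry no name, so use a placeholder instead.
            members[current].append(point.group("name") or "<unnamed>")
    return dict(members)


# Hypothetical dump text; the document names are made up for illustration.
SAMPLE = """\
CL-0{n=2 c=[article:3.009, first:3.279, third:3.279] r=[first:3.279,
third:3.279]}
    Top Terms:
        third => 3.2787654399871826
    Weight:  Point:
    1.0: doc1.txt = [article:3.009, first:6.558]
    1.0: doc3.txt = [article:3.009, third:6.558]
VL-1{n=1 c=[article:3.009, second:6.558] r=[]}
    Weight:  Point:
    1.0: doc2.txt = [article:3.009, second:6.558]
"""

if __name__ == "__main__":
    for cluster_id, names in members_by_cluster(SAMPLE).items():
        print(cluster_id, names)
```

Run it on the text file written by the -o option of clusterdump. If the point
lines in your Mahout version look different, adjust POINT_RE accordingly.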


On Wed, Apr 6, 2011 at 12:27 PM, Geek Gamer <geek4...@gmail.com> wrote:

> How are you preparing the vectors? You will get the cluster members if
> these are named vectors. You can prepare named vectors from a sequence
> file using
> $MAHOUT_HOME/bin/mahout seq2sparse
>
> Add the parameter --namedVector to that command to create named vectors; the
> same clusterdump command will then yield the members of the clusters.
> Hope this helped.
>
>
> On Wed, Apr 6, 2011 at 9:23 AM, Madhusudan Joshi <madhusudanrjo...@gmail.com> wrote:
>
> > The command I used for the cluster dump is
> >
> > mahout clusterdump -s mytest/kmeans/clusters-1 \
> >   -p mytest/kmeans/clusteredPoints \
> >   -d mytest/seqdir-sparse/dictionary.file-0 -dt sequencefile -n 20 \
> >   -o Desktop/ClusterDump/Kmeans/cl1.txt
> >
> > I tried the Reuters example and then clustered my own sample files. The
> > output for my sample files is
> >
> > CL-0{n=2 c=[article:3.009, first:3.279, third:3.279] r=[first:3.279,
> > third:3.279]}
> >    Top Terms:
> >        third                                   =>  3.2787654399871826
> >        first                                   =>  3.2787654399871826
> >        article                                 =>  3.0087521076202393
> >    Weight:  Point:
> >    1.0: [article:3.009, first:6.558]
> >    1.0: [article:3.009, third:6.558]
> > VL-1{n=1 c=[article:3.009, second:6.558] r=[article:0.000, first:0.000,
> > fourth:0.000, second:0.000, third:0.000]}
> >    Top Terms:
> >        second                                  =>   6.557530879974365
> >        article                                 =>  3.0087521076202393
> >    Weight:  Point:
> >    1.0: [article:3.009, second:6.558]
> > VL-3{n=1 c=[article:3.009, fourth:6.558] r=[article:0.000, first:0.000,
> > fourth:0.000, second:0.000, third:0.000]}
> >    Top Terms:
> >        fourth                                  =>   6.557530879974365
> >        article                                 =>  3.0087521076202393
> >    Weight:  Point:
> >    1.0: [article:3.009, fourth:6.558]
> >
> > The output showed the number of documents present in each cluster but did
> > not mention which documents. I need to be able to check which documents
> > are present in any given cluster.
> >
> > On Tue, Apr 5, 2011 at 11:34 PM, Jeff Eastman <jeast...@narus.com> wrote:
> >
> > > You are going to have to be much more explicit about which command-line
> > > invocations you ran and what results you got in order for anybody to be
> > > able to help you much here. Have you tried the clustering examples in
> > > the wiki?
> > >
> > > -----Original Message-----
> > > From: Madhusudan Joshi [mailto:madhusudanrjo...@gmail.com]
> > > Sent: Monday, April 04, 2011 10:23 PM
> > > To: user@mahout.apache.org
> > > Subject: Check the input files present in cluster
> > >
> > > Hi,
> > >
> > > I am new to Mahout and trying out clustering. I created clusters using
> > > k-means from bash. I want to know which files are present in a given
> > > cluster. I tried looking for this in the cluster dumper but didn't find
> > > the required solution. Can anyone help me with this?
> > >
> > > Thanks.
> > >
> > > --
> > > Everything we hear is an opinion, not a fact.
> > > Everything we see is perspective, not the truth.
> > >
>



-- 
Everything we hear is an opinion, not a fact.
Everything we see is perspective, not the truth.
