Ah.

I get it.  Ish.

I think, but am not entirely sure, that there are two possible outputs that
you might be seeing.

One is the cluster centroids themselves.  Centroids tend to densify, but
I am not sure whether they are actually stored as dense vectors (I would
tend to think so).  That might be what you are seeing.

The second is the assignment of your original vectors to the nearest
cluster.  Here, the vector is just your original vector.  This output could
be in the form of a cluster id followed by the ids of all the vectors in
that cluster.  That doesn't look like what you are seeing.

Can you post the actual commands you are running?  Without them, it is
a bit hard to say what you are seeing.
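
If what you ultimately need is the union of non-zero features across the
dumped vectors, here is a minimal Python sketch that works on the printed
clusterdump text rather than on Mahout's own vector classes.  It assumes each
vector has been joined onto a single line of the form shown in your sample;
parse_dumped_vector, feature_union, and the two sample lines are made up for
illustration and are not part of Mahout.

```python
def parse_dumped_vector(line):
    """Parse the '[term:weight, ...]' part of one dumped line into a dict.

    Assumes the whole vector sits on one line (join wrapped lines first).
    """
    start = line.index("[")
    end = line.rindex("]")
    vector = {}
    for entry in line[start + 1:end].split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ':' so a term containing ':' would still parse.
        term, weight = entry.rsplit(":", 1)
        vector[term.strip()] = float(weight)
    return vector


def feature_union(vectors):
    """Return the set of terms that are non-zero in at least one vector."""
    features = set()
    for vector in vectors:
        features.update(term for term, weight in vector.items() if weight != 0.0)
    return features


# Illustrative input only: shortened, hypothetical lines in the dumped format.
sample = [
    "1.0: /doc-a = [amex:0.161, ase:0.161, fox:0.136]",
    "1.0: /doc-b = [ase:0.161, nyse:0.161]",
]

vectors = [parse_dumped_vector(line) for line in sample]
all_features = feature_union(vectors)

print(sorted(all_features))   # the feature list S
print(len(all_features))      # the total number of distinct features
```

Each dict here already behaves like a sparse vector, since only the non-zero
entries are stored, so S is simply the union of the keys.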

On Sun, Aug 11, 2013 at 10:57 PM, Ashwini P <ashwini.a...@gmail.com> wrote:

> Hi Ted,
>
> My apologies for not framing the question on clusterdumper properly. I am
> getting the output from clusterdumper in the expected format. A sample
> vector from the clusterdumper output is shown below:
>
>     1.0: /all-exchanges-strings.lc.txt = [amex:0.161, ase:0.161, asx:0.161,
> biffex:0.161, bse:0.161, cboe:0.161, cbt:0.161, cme:0.161, comex:0.161,
> cse:0.161, fox:0.136, fse:0.161, hkse:0.161, ipe:0.161, jse:0.161,
> klce:0.161, klse:0.161, liffe:0.161, lme:0.161, lse:0.161, mase:0.161,
> mise:0.161, mnse:0.161, mose:0.161, nasdaq:0.161, nyce:0.161, nycsce:0.161,
> nymex:0.161, nyse:0.161, ose:0.161, pse:0.161, set:0.136, simex:0.161,
> sse:0.161, stse:0.161, tose:0.161, tse:0.161, wce:0.161, zse:0.161]
>
> What I originally wanted to know is whether these vectors are just the way
> clusterdumper prints them (i.e. they are dense vectors), or whether they
> are sparse vectors and clusterdumper iterates over the non-zero values and
> prints only those. If they are sparse vectors, can you kindly tell me in
> which directory the algorithm writes the vectors so that I can read them?
>
> If the vectors are in dense format then I need to convert them to sparse
> vectors. As can be seen from the clusterdump output sample above, only the
> features which have non-zero values for each vector are being printed. The
> set of features which have non-zero values will differ from vector to
> vector. Suppose we have 3 vectors f1, f2, f3 with sets of non-zero
> features s1, s2 and s3 respectively. What I want is the set
>              S = {s1 U s2 U s3}
> i.e. S is the union of the sets of non-zero features over all vectors, so
> that I can convert the dense vectors to sparse vectors.
>
> Your thoughts on this are welcome.
>
> Thanks,
> Ashvini
>
>
>
> On Mon, Aug 12, 2013 at 10:55 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > Aside from your issues with clusterdumper, the values you want can be had
> > from a sparse vector using v.iterateNonZero() and v.norm(0).
> >
> > The issue with clusterdumper is odd.
> >
> > Are you saying that the display shows all the components of the vector?
> > Or that there is an in-memory representation that has been densified?
> >
> >
> >
> > On Sun, Aug 11, 2013 at 9:24 PM, Ashwini P <ashwini.a...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > I am new to Mahout. I want to know how I can get the list of features
> > > that were extracted from the corpus by seq2sparse and the count of the
> > > total number of features.
> > >
> > > My problem is that when I view the clustering output using
> > > clusterdumper, I get only dense vectors for each point that belongs to
> > > the cluster, but I want the sparse vector for each point. What I want
> > > to know is whether the vectors output by the clustering algorithm are
> > > stored as dense vectors or whether clusterdumper is converting them to
> > > dense vectors. If the clustering algorithm generates sparse vectors I
> > > can use them directly; otherwise I will have to convert the vectors
> > > from dense to sparse, for which I need the information mentioned in
> > > the above paragraph.
> > >
> > > Your suggestions on this are welcome.
> > >
> > > Thanks,
> > > Ashvini
> > >
> >
>
