Yes. How do I run canopy/kmeans on the USigma output? What is the connecting
step? Please advise.
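
For reference, this is roughly what I expect the connecting step to look like. It is only a sketch under several assumptions. It assumes ssvd with -us true writes U*Sigma to a USigma directory under --output; I have not confirmed that directory name. It swaps in EuclideanDistanceMeasure because the reduced vectors are dense real-valued coordinates; that is my own choice, not something suggested on the list. The T1/T2 values are placeholders that would need tuning, and the output paths containing "-us-" are made up for illustration.

```shell
# Assumed location of the U*Sigma rows written by "ssvd ... -us true".
USIGMA=/user/hadoop/t/input-set-svd/USigma

# Placeholder canopy thresholds (T1 > T2); tune them to the scale of the data.
T1=3.0
T2=1.5

# Distance measure swapped in for this sketch (assumption, see above).
DM=org.apache.mahout.common.distance.EuclideanDistanceMeasure

# Canopy on the reduced vectors to get initial centroids.
mahout canopy -i "$USIGMA" \
  -o /user/hadoop/t/input-set-us-canopy-centroids \
  -dm "$DM" -t1 "$T1" -t2 "$T2"

# k-means on the same reduced vectors, seeded with the canopy centroids.
mahout kmeans -i "$USIGMA" \
  -c /user/hadoop/t/input-set-us-canopy-centroids/clusters-0-final \
  -o /user/hadoop/t/input-set-us-kmeans-clusters \
  -dm "$DM" -x 10 -cl -ow
```

If that is right, the only change from my earlier canopy/kmeans run is the input path (USigma instead of U) plus the distance measure and thresholds.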

Thanks,
Rajesh
On May 30, 2013 10:09 PM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:

> I.e. i guess you want to run kmeans directly on usigma output.
> On May 30, 2013 9:37 AM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:
>
> > I believe this flow describes how to use lanczos svd in mahout to arrive
> > at the same reduction as ssvd already provides with pca and USigma options
> > in one step. This flow is irrelevant when working with ssvd, it already
> > does it all internally for you.
> > On May 30, 2013 5:45 AM, "Rajesh Nikam" <rajeshni...@gmail.com> wrote:
> >
> >> Hi Suneel/Dmitriy,
> >>
> >> I compiled mahout-examples-0.8-SNAPSHOT-job.jar from trunk.
> >> With it, the -us param you mentioned now works for the input set.
> >>
> >> Steps followed are:
> >>
> >> mahout arff.vector --input /mnt/cluster/t/PE_EXE/input-set.arff --output
> >> /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
> >>
> >> hadoop jar mahout-examples-0.8-SNAPSHOT-job.jar
> >> org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli --input
> >> /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/
> >> -k 50 --reduceTasks 2 -U true -V false -us true -ow
> >>
> >> I am not able to understand what needs to be provided as input to
> >> cleansvd/transpose/matrixmult as mentioned on the following page, and
> >> which of U/V/USigma needs to be used and how.
> >>
> >> Also, how do I find out which features ended up in the reduced matrix?
> >>
> >> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
> >>
> >> At a high level, the steps we're going to perform are:
> >>
> >> bin/mahout svd (original -> svdOut)
> >> bin/mahout cleansvd ...
> >> bin/mahout transpose svdOut -> svdT
> >> bin/mahout transpose original -> originalT
> >> bin/mahout matrixmult originalT svdT -> newMatrix
> >> bin/mahout kmeans newMatrix
> >>
> >> Thanks,
> >>
> >> Rajesh
> >>
> >>
> >>
> >> On Mon, May 27, 2013 at 11:31 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> >>
> >> > Ahha, I see your problem now.
> >> >
> >> > The additional line in trunk was added as part of MAHOUT-1097 (long
> >> > after the Mahout 0.7 release), so you won't see the change in the
> >> > mahout-examples-0.7-job.jar that you are working from. The fix is
> >> > presently available in trunk (and will be part of Mahout 0.8).
> >> >
> >> > I would recommend working off trunk for now, and you should be good.
> >> >
> >> >
> >> >
> >> >
> >> > ________________________________
> >> >  From: Rajesh Nikam <rajeshni...@gmail.com>
> >> > To: user@mahout.apache.org
> >> > Sent: Monday, May 27, 2013 1:52 AM
> >> > Subject: Re: Fwd: Re: convert input for SVD
> >> >
> >> >
> >> > Hi Dmitriy / Suneel,
> >> >
> >> > You are pointing me to the correct solution. However, I see different
> >> > options in the source code downloaded from trunk (mahout-trunk.zip) and
> >> > in mahout-examples-0.7-job.jar.
> >> >
> >> > Could you please verify the same at your end.
> >> >
> >> > ==>> from mahout-trunk.zip <<==
> >> >
> >> >     addOption("uHalfSigma",
> >> >               "uhs",
> >> >               "Compute U * Sigma^0.5",
> >> >               String.valueOf(false));
> >> > *    addOption("uSigma", "us", "Compute U * Sigma",
> >> > String.valueOf(false));*
> >> >     addOption("computeV", "V", "compute V (true/false)",
> >> > String.valueOf(true));
> >> >
> >> >
> >> > ==>> mahout-examples-0.7-job.jar <<==
> >> >
> >> >     addOption("uHalfSigma", "uhs", "Compute U as UHat=U x pow(Sigma,0.5)",
> >> >               String.valueOf(false));
> >> >
> >> >     addOption("computeV", "V", "compute V (true/false)",
> >> >               String.valueOf(true));
> >> >     addOption("vHalfSigma", "vhs", "compute V as VHat= V x pow(Sigma,0.5)",
> >> >               String.valueOf(false));
> >> >
> >> >
> >> > Thanks,
> >> > Rajesh
> >> >
> >> >
> >> > On Fri, May 24, 2013 at 10:48 PM, Dmitriy Lyubimov <dlie...@gmail.com
> >> > >wrote:
> >> >
> >> > > "ssvd -us true...." should do this . Suneel says it still works on
> >> trunk.
> >> > >
> >> > >
> >> > > On Fri, May 24, 2013 at 9:38 AM, Rajesh Nikam <
> rajeshni...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks Dmitriy & Suneel for the comments. As you suggested, I need to
> >> > > > use U * Sigma.
> >> > > >
> >> > > > That means I need to get the product of these matrices.
> >> > > >
> >> > > > Which Mahout props should I use for this?
> >> > > >
> >> > > > My other question was: how do I get the features that are selected in U?
> >> > > > On May 24, 2013 8:45 PM, "Suneel Marthi" <suneel_mar...@yahoo.com
> >
> >> > > wrote:
> >> > > >
> >> > > > > Rajesh,
> >> > > > >
> >> > > > > I am working off of trunk and this works fine.
> >> > > > >
> >> > > > > As Dmitriy says u do need USigma.
> >> > > > >
> >> > > > > It would help to paste the entire stacktrace you are seeing with
> >> > > > > MatrixColumnMeansJob.
> >> > > > >
> >> > > > > If you are still seeing an issue, I would suggest that you work
> >> off
> >> > of
> >> > > > > trunk.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > ________________________________
> >> > > > >  From: Dmitriy Lyubimov <dlie...@gmail.com>
> >> > > > > To: user@mahout.apache.org
> >> > > > > Sent: Friday, May 24, 2013 9:52 AM
> >> > > > > Subject: Re: Fwd: Re: convert input for SVD
> >> > > > >
> >> > > > >
> >> > > > > I think last time i verified this flow was as of
> >> > > > > https://issues.apache.org/jira/browse/MAHOUT-1097. It was
> >> > > > > working then. Did not look at it since.
> >> > > > > On May 24, 2013 6:42 AM, "Dmitriy Lyubimov" <dlie...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Rajesh, you will get more help if you stay on the list.
> >> > > > > >
> >> > > > > > you do need u *sigma output. there is no substitute.
> >> > > > > >
> >> > > > > > If this option is indeed no longer there, i have no knowledge
> of
> >> > it.
> >> > > > > Maybe
> >> > > > > > there was some work committed that screwed that  but at the
> >> moment
> >> > i
> >> > > > have
> >> > > > > > no time to look at it. Obviously it was there at the time
> >> > > documentation
> >> > > > > was
> >> > > > > > written. I guess you may obtain an earlier snapshot as interim
> >> > > solution
> >> > > > > if
> >> > > > > > it is indeed the case.
> >> > > > > >
> >> > > > > > ---------- Forwarded message ----------
> >> > > > > > From: "Rajesh Nikam" <rajeshni...@gmail.com>
> >> > > > > > Date: May 24, 2013 3:20 AM
> >> > > > > > Subject: Re: convert input for SVD
> >> > > > > > To: <user@mahout.apache.org>
> >> > > > > > Cc:
> >> > > > > >
> >> > > > > > > Hello Dmitriy,
> >> > > > > > >
> >> > > > > > > Thanks for reply.
> >> > > > > > >
> >> > > > > > > I see similar discussion on following link where I see your
> >> > reply.
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> http://www.searchworkings.org/forum/-/message_boards/view_message/517870#_19_message_519704
> >> > > > > > >
> >> > > > > > > I do also have same problem, need to apply dimensionality
> >> > reduction
> >> > > > and
> >> > > > > > use
> >> > > > > > > clustering algo on reduced features.
> >> > > > > > >
> >> > > > > > > Seems parameters for ssvd are changed from mentioned in
> >> > > SSVD-CLI.pdf.
> >> > > > > It
> >> > > > > > no
> >> > > > > > > longer shows *-us *as parameter
> >> > > > > > >
> >> > > > > > > I am using mahout-examples-0.7-job.jar
> >> > > > > > >
> >> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/
> --output
> >> > > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -pca
> >> true -U
> >> > > > true
> >> > > > > -V
> >> > > > > > > false *-us true* -ow -q 1
> >> > > > > > >
> >> > > > > > > giving option as "*-pca true*" gives error as
> >> > > > > > >
> >> > > > > > > at
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> >> > > > > > >         at
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> >> > > > > > >
> >> > > > > > > So I removed it.
> >> > > > > > >
> >> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/
> --output
> >> > > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -U true
> >> -V
> >> > > false
> >> > > > > > *-us
> >> > > > > > > true* -ow -q 1
> >> > > > > > >
> >> > > > > > > *>> *with above command *>> Unexpected -us *while processing
> >> > > > > Job-Specific
> >> > > > > > > Options.
> >> > > > > > >
> >> > > > > > > I tried with "-U false -V false -uhs true" it just generated
> >> > sigma
> >> > > > file
> >> > > > > > as
> >> > > > > > > expected however no "Usigma"
> >> > > > > > >
> >> > > > > > > hadoop fs -lsr /user/hadoop/t/PE_EXE/input-set-svd/
> >> > > > > > >
> >> > > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:34
> >> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/sigma
> >> > > > > > >
> >> > > > > > > Then with *"-U true -V false -uhs true" *output dir U is
> >> created.
> >> > > > > > > *
> >> > > > > > > *drwxr-xr-x   - hadoop supergroup          0 2013-05-24
> 15:39
> >> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/U
> >> > > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:39
> >> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/sigma*
> >> > > > > > > *
> >> > > > > > >
> >> > > > > > > My problem is how to use these U/V/sigma file as input to
> >> > > > > canopy/kmeans ?
> >> > > > > > >
> >> > > > > > > How to identify which important features from U/Sigma that
> are
> >> > > > retained
> >> > > > > > in
> >> > > > > > > dimensionality reduction ?
> >> > > > > > >
> >> > > > > > > Thanks in Advance !
> >> > > > > > > Rajesh
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Fri, May 24, 2013 at 7:01 AM, Dmitriy Lyubimov <
> >> > > dlie...@gmail.com
> >> > > > >
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17&modificationDate=1349999085000
> >> > > > > > > > :
> >> > > > > > > >
> >> > > > > > > > "In most cases where you might be looking to reduce
> >> > > > > > > > dimensionality while retaining variance, you probably need
> >> > > > > combination
> >> > > > > > of
> >> > > > > > > > options -pca true -U false -V
> >> > > > > > > > false -us true.
> >> > > > > > > >
> >> > > > > > > > See §3 for details"
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Thu, May 23, 2013 at 6:24 PM, Dmitriy Lyubimov <
> >> > > > dlie...@gmail.com
> >> > > > > >
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Also, for the dimensionality reduction it is important
> >> among
> >> > > > other
> >> > > > > > things
> >> > > > > > > > > to re-center your input first, which is why you also
> want
> >> > "-pca
> >> > > > > > true".
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Thu, May 23, 2013 at 6:23 PM, Dmitriy Lyubimov <
> >> > > > > dlie...@gmail.com
> >> > > > > > > > >wrote:
> >> > > > > > > > >
> >> > > > > > > > >> did you specify -us option? SSVD by default produces
> only
> >> > U, V
> >> > > > and
> >> > > > > > > > Sigma.
> >> > > > > > > > >> but it can produce more, e.g. U*Sigma, U*sqrt(Sigma)
> >> etc. if
> >> > > you
> >> > > > > > ask for
> >> > > > > > > > >> it. And, alternatively, you can suppress any of U, V
> (you
> >> > > can't
> >> > > > > > suppress
> >> > > > > > > > >> sigma but that doesn't cost anything in space anyway).
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Thu, May 23, 2013 at 6:20 PM, Rajesh Nikam <
> >> > > > > > rajeshni...@gmail.com
> >> > > > > > > > >wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> I got all three U, V & sigma from ssvd, however which
> to
> >> > use
> >> > > as
> >> > > > > > input
> >> > > > > > > > to
> >> > > > > > > > >>> canopy?
> >> > > > > > > > >>> On May 24, 2013 6:47 AM, "Dmitriy Lyubimov" <
> >> > > dlie...@gmail.com
> >> > > > >
> >> > > > > > wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>> > I think you want U*Sigma
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > What you want is ssvd ... -pca true ... -us true ...
> >> see
> >> > > the
> >> > > > > > manual
> >> > > > > > > > >>> >
> >> > > > > > > > >>> >
> >> > > > > > > > >>> >
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > On Thu, May 23, 2013 at 6:07 PM, Rajesh Nikam <
> >> > > > > > rajeshni...@gmail.com
> >> > > > > > > > >
> >> > > > > > > > >>> > wrote:
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > > Sorry for confusion. Here number of clusters are
> >> > decided
> >> > > by
> >> > > > > > canopy.
> >> > > > > > > > >>> With
> >> > > > > > > > >>> > > data as it has 60 to 70 clusters.
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > My question is which part from ssvd output U, V,
> >> Sigma
> >> > > > should
> >> > > > > > be
> >> > > > > > > > >>> used as
> >> > > > > > > > >>> > > input to canopy?
> >> > > > > > > > >>> > >  On May 24, 2013 3:56 AM, "Ted Dunning" <
> >> > > > > ted.dunn...@gmail.com
> >> > > > > > >
> >> > > > > > > > >>> wrote:
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > > Rajesh,
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > This is very confusing.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > You have 1500 things that you are clustering
> into
> >> > more
> >> > > > than
> >> > > > > > 1400
> >> > > > > > > > >>> > > clusters.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > There is no way for most of these clusters to
> >> have >1
> >> > > > > member
> >> > > > > > just
> >> > > > > > > > >>> > because
> >> > > > > > > > >>> > > > there aren't enough clusters compared to the
> >> items.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > Is there a typo here?
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > On Thu, May 23, 2013 at 5:34 AM, Rajesh Nikam <
> >> > > > > > > > >>> rajeshni...@gmail.com>
> >> > > > > > > > >>> > > > wrote:
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > > Hi,
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > I have input test set of 1500 instances with
> >> 1000+
> >> > > > > > features. I
> >> > > > > > > > >>> want
> >> > > > > > > > >>> > to
> >> > > > > > > > >>> > > to
> >> > > > > > > > >>> > > > > SVD to reduce features. I have followed
> >> following
> >> > > steps
> >> > > > > > with
> >> > > > > > > > >>> generate
> >> > > > > > > > >>> > > > 1400+
> >> > > > > > > > >>> > > > > clusters 99% of clusters contain 1 instance :(
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Please let me know what is wrong in below
> steps
> >> -
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout arff.vector --input
> >> > > > /mnt/cluster/t/input-set.arff
> >> > > > > > > > --output
> >> > > > > > > > >>> > > > > /user/hadoop/t/input-set-vector/ --dictOut
> >> > > > > > > > >>> > > /mnt/cluster/t/input-set-dict
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout ssvd --input
> >> > /user/hadoop/t/input-set-vector/
> >> > > > > > --output
> >> > > > > > > > >>> > > > > /user/hadoop/t/input-set-svd/ -k 200
> >> --reduceTasks
> >> > 2
> >> > > > -ow
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout canopy -i
> >> */user/hadoop/t/input-set-svd/U*
> >> > -o
> >> > > > > > > > >>> > > > > /user/hadoop/t/input-set-canopy-centroids -dm
> >> > > > > > > > >>> > > > >
> >> > > > org.apache.mahout.common.distance.TanimotoDistanceMeasure
> >> > > > > > *-t1
> >> > > > > > > > >>> 0.001
> >> > > > > > > > >>> > > -t2
> >> > > > > > > > >>> > > > > 0.002*
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout kmeans -i
> >> */user/hadoop/t/input-set-svd/U*
> >> > -c
> >> > > > > > > > >>> > > > >
> >> > > > > /user/hadoop/t/input-set-canopy-centroids/clusters-0-final
> >> > > > > > -cl
> >> > > > > > > > -o
> >> > > > > > > > >>> > > > > /user/hadoop/t/input-set-kmeans-clusters -ow
> -x
> >> 10
> >> > > -dm
> >> > > > > > > > >>> > > > >
> >> > > > org.apache.mahout.common.distance.TanimotoDistanceMeasure
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout clusterdump -dt sequencefile -i
> >> > > > > > > > >>> > > > >
> >> > > > > /user/hadoop/t/input-set-kmeans-clusters/clusters-1-final/
> >> > > > > > -n
> >> > > > > > > > 20
> >> > > > > > > > >>> -b
> >> > > > > > > > >>> > 100
> >> > > > > > > > >>> > > > -o
> >> > > > > > > > >>> > > > > /mnt/cluster/t/cdump-input-set.txt -p
> >> > > > > > > > >>> > > > >
> >> > > > /user/hadoop/t/input-set-kmeans-clusters/clusteredPoints/
> >> > > > > > > > >>> --evaluate
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Thanks in advance !
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Rajesh
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > On Wed, May 22, 2013 at 2:18 AM, Dmitriy
> >> Lyubimov <
> >> > > > > > > > >>> dlie...@gmail.com
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > > > wrote:
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > > PPS As far as the tool for arff, i am
> frankly
> >> not
> >> > > > sure.
> >> > > > > > but
> >> > > > > > > > it
> >> > > > > > > > >>> > sounds
> >> > > > > > > > >>> > > > > like
> >> > > > > > > > >>> > > > > > you've already solved this.
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > > On Tue, May 21, 2013 at 1:41 PM, Dmitriy
> >> > Lyubimov <
> >> > > > > > > > >>> > dlie...@gmail.com
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > > > wrote:
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > > > ps: as far as the U, V data being "close to zero", yes, that's what
> >> > > > > > > > >>> > > > > > > you'd expect.
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > Here, "close to zero" still means much bigger than a rounding error,
> >> > > > > > > > >>> > > > > > > of course. E.g. 1E-12 is indeed a small number, and 1E-16 to 1E-18
> >> > > > > > > > >>> > > > > > > would indeed be "close to zero" for the purposes of singularity.
> >> > > > > > > > >>> > > > > > > 1E-2..1E-5 are actually quite "sizeable" numbers by the scale of
> >> > > > > > > > >>> > > > > > > IEEE 754 arithmetic.
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > U and V are orthonormal (which means their column vectors are
> >> > > > > > > > >>> > > > > > > mutually orthogonal and have a Euclidean norm of 1). Note that for
> >> > > > > > > > >>> > > > > > > large m and n (large inputs) they are also extremely skinny. The
> >> > > > > > > > >>> > > > > > > larger the input is, the smaller the elements of U and/or V are
> >> > > > > > > > >>> > > > > > > going to be.
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > On Tue, May 21, 2013 at 8:48 AM, Dmitriy
> >> > > Lyubimov <
> >> > > > > > > > >>> > > dlie...@gmail.com
> >> > > > > > > > >>> > > > > > >wrote:
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > >> Sounds like dimensionality reduction to
> me.
> >> > You
> >> > > > may
> >> > > > > > want
> >> > > > > > > > to
> >> > > > > > > > >>> use
> >> > > > > > > > >>> > > ssvd
> >> > > > > > > > >>> > > > > > -pca
> >> > > > > > > > >>> > > > > > >>
> >> > > > > > > > >>> > > > > > >> Apologies for brevity. Sent from my
> Android
> >> > > phone.
> >> > > > > > > > >>> > > > > > >> -Dmitriy
> >> > > > > > > > >>> > > > > > >> On May 21, 2013 6:27 AM, "Rajesh Nikam" <
> >> > > > > > > > >>> rajeshni...@gmail.com>
> >> > > > > > > > >>> > > > > wrote:
> >> > > > > > > > >>> > > > > > >>
> >> > > > > > > > >>> > > > > > >>> Hello Ted,
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Thanks for reply.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I have started exploring SVD based on
> its
> >> > > mention
> >> > > > > of
> >> > > > > > > > could
> >> > > > > > > > >>> help
> >> > > > > > > > >>> > > to
> >> > > > > > > > >>> > > > > drop
> >> > > > > > > > >>> > > > > > >>> features which are not relevant for
> >> > clustering.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> My objective is reduce number of
> features
> >> > > before
> >> > > > > > passing
> >> > > > > > > > >>> them
> >> > > > > > > > >>> > to
> >> > > > > > > > >>> > > > > > >>> clustering
> >> > > > > > > > >>> > > > > > >>> and just keep important features.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> arff/csv==> ssvd (for dimensionality
> >> > reduction)
> >> > > > ==>
> >> > > > > > > > >>> clustering
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Could you please illustrate mahout props
> >> to
> >> > > join
> >> > > > > > above
> >> > > > > > > > >>> > pipeline.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I think, Lanczos SVD needs to be used
> for
> >> mxm
> >> > > > > matrix.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I have tried check ssvd, I have used
> >> > > arff.vector
> >> > > > to
> >> > > > > > > > covert
> >> > > > > > > > >>> > > arff/csv
> >> > > > > > > > >>> > > > > to
> >> > > > > > > > >>> > > > > > >>> vector file which is then give as input
> to
> >> > ssvd
> >> > > > and
> >> > > > > > them
> >> > > > > > > > >>> dumped
> >> > > > > > > > >>> > > U,
> >> > > > > > > > >>> > > > V
> >> > > > > > > > >>> > > > > > and
> >> > > > > > > > >>> > > > > > >>> sigma using vectordump.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I see most of the values dumped are near
> >> to
> >> > 0.
> >> > > I
> >> > > > > dont
> >> > > > > > > > >>> > understand
> >> > > > > > > > >>> > > is
> >> > > > > > > > >>> > > > > > this
> >> > > > > > > > >>> > > > > > >>> correct or not.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> >
> >> > > > > > > > >>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> {0:0.01066724825049657,1:0.016715498597386844,2:2.0187750952311708E-4,3:3.401020567221039E-4,4:-1.2388403347280688E-4,5:6.41502463540719E-5,6:-1.359187582538833E-4,7:6.329813140445419E-5,8:1.670015585746444E-4,9:3.5415113034592744E-4,10:7.108868213280763E-4,11:0.020553517552052456,12:-0.015118680942548916,13:0.007981746711271956,14:-0.003251236468768259,15:0.0038075014396303053,16:-0.0010925318534013683,17:-0.0026943024876179833,18:-0.001744794617721648,19:-0.0024528466548735714}
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> >
> >> > > > > > > > >>>
> >> > > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> {0:0.029978614322360833,1:-0.01431521245087889,2:1.3318592088199427E-4,3:1.495356283071516E-4,4:8.762709213918985E-5,5:1.2765191352425177E-
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Thanks,
> >> > > > > > > > >>> > > > > > >>> Rajesh
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> On Tue, May 21, 2013 at 11:35 AM, Ted
> >> > Dunning <
> >> > > > > > > > >>> > > > ted.dunn...@gmail.com
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > > >>> wrote:
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> > Are you using Lanczos instead of SSVD
> >> for a
> >> > > > > reason?
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> > On Mon, May 20, 2013 at 4:13 AM,
> Rajesh
> >> > > Nikam <
> >> > > > > > > > >>> > > > > rajeshni...@gmail.com
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > >>> > wrote:
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> > > Hello,
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > I have arff / csv file containing
> >> input
> >> > > data
> >> > > > > > that I
> >> > > > > > > > >>> want to
> >> > > > > > > > >>> > > > pass
> >> > > > > > > > >>> > > > > to
> >> > > > > > > > >>> > > > > > >>> svd :
> >> > > > > > > > >>> > > > > > >>> > > Lanczos Singular Value
> Decomposition.
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Which tool to use to convert it to
> >> > required
> >> > > > > > format ?
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Thanks in Advance !
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Thanks,
> >> > > > > > > > >>> > > > > > >>> > > Rajesh
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> >
> >> > > > > > > > >>>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
