I.e., I guess you want to run kmeans directly on the USigma output.
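The relationship between SSVD's -pca/-us options and the matrix you would hand to kmeans can be sketched outside Hadoop with numpy. This is a toy stand-in, not the Mahout implementation: the data, sizes and k below are made up, and numpy's in-memory SVD stands in for the distributed stochastic SVD.

```python
import numpy as np

# Toy stand-in for the vectorized input; in the real flow this lives in
# HDFS and SSVD computes the factors in a distributed fashion.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))

# "-pca true": re-center the input by its column means first.
A_centered = A - A.mean(axis=0)

# Rank-k SVD; "-us true" asks SSVD to emit U * Sigma directly.
k = 5
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
USigma = U[:, :k] * s[:k]  # one k-dimensional row per input row

# USigma is exactly the projection of the centered data onto the top-k
# right singular vectors, i.e. the PCA scores you would feed to kmeans.
assert np.allclose(USigma, A_centered @ Vt[:k].T)
```

Each row of USigma corresponds to one input row, reduced to k dimensions, which is why it can go straight into canopy/kmeans.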
On May 30, 2013 9:37 AM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:

> I believe this flow describes how to use Lanczos SVD in Mahout to arrive
> at the same reduction that ssvd already provides in one step with the -pca
> and USigma options. This flow is irrelevant when working with ssvd; it
> already does it all internally for you.
> On May 30, 2013 5:45 AM, "Rajesh Nikam" <rajeshni...@gmail.com> wrote:
>
>> Hi Suneel/Dmitriy,
>>
>> I got mahout-examples-0.8-SNAPSHOT-job.jar compiled from trunk.
>> Now I got the -us param you mentioned working for the input set.
>>
>> Steps followed are:
>>
>> mahout arff.vector --input /mnt/cluster/t/PE_EXE/input-set.arff --output
>> /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
>>
>> hadoop jar mahout-examples-0.8-SNAPSHOT-job.jar
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli --input
>> /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k
>> 50 --reduceTasks 2 -U true -V false -us true -ow
>>
>> I am not able to understand what needs to be provided as input to
>> cleansvd/transpose/matrixmult as mentioned on the following page, and
>> which of U/V/USigma needs to be used and how.
>>
>> Also, how do I tell which features ended up in the reduced matrix?
>>
>> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
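On the recurring question of which features are "kept": SVD does not select a subset of the original features; each reduced dimension is a weighted combination of all of them, and those weights sit in V. A hypothetical numpy sketch (toy data, illustrative sizes) of inspecting the loadings to see which original columns dominate a singular direction:

```python
import numpy as np

# Hypothetical data where one original feature carries most of the variance.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 8))
A[:, 2] *= 10.0  # make feature 2 dominate

U, s, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)

# Rows of Vt are the singular directions; their entries (loadings) say how
# strongly each original feature contributes to each reduced dimension.
top_loadings = np.abs(Vt[0])
assert top_loadings.argmax() == 2  # the dominant direction points at feature 2
```

So rather than a list of retained features, you get per-direction loadings; large-magnitude entries of a row of V flag the original features that matter most for that reduced dimension.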
>>
>> At a high level, the steps we're going to perform are:
>>
>> bin/mahout svd (original -> svdOut)
>> bin/mahout cleansvd ...
>> bin/mahout transpose svdOut -> svdT
>> bin/mahout transpose original -> originalT
>> bin/mahout matrixmult originalT svdT -> newMatrix
>> bin/mahout kmeans newMatrix
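Algebraically, the transpose/matrixmult steps in that flow project the original matrix onto the right singular vectors, which is the same U * Sigma product that ssvd's -us option writes out directly. A small numpy sketch of that equivalence (toy matrix and k; numpy stands in for the distributed jobs):

```python
import numpy as np

# Toy stand-in for the corpus matrix; the real flow operates on DRMs in HDFS.
rng = np.random.default_rng(1)
A = rng.normal(size=(60, 15))

k = 4
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k].T  # the (cleaned) right singular vectors from the svd step

# Net effect of the transpose/matrixmult steps: project A onto V_k ...
newMatrix = A @ V_k  # rows of this are what kmeans would cluster

# ... which equals the U * Sigma output SSVD can emit in one step.
assert np.allclose(newMatrix, U[:, :k] * s[:k])
```

This is why the multi-step Lanczos recipe and SSVD with -us land on the same reduced representation.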
>>
>> Thanks,
>>
>> Rajesh
>>
>>
>>
>> On Mon, May 27, 2013 at 11:31 AM, Suneel Marthi <suneel_mar...@yahoo.com
>> >wrote:
>>
>> > Ahha, I see your problem now.
>> >
>> > The additional line in trunk was added as part of Mahout-1097 (long
>> > after the Mahout-0.7 release) and hence you wouldn't see the change in
>> > mahout-examples-0.7-job.jar that you are working off of. This fix is
>> > presently available in trunk (and will be part of Mahout-0.8).
>> >
>> > I would recommend working off of trunk for now, and you should be good.
>> >
>> >
>> >
>> >
>> > ________________________________
>> >  From: Rajesh Nikam <rajeshni...@gmail.com>
>> > To: user@mahout.apache.org
>> > Sent: Monday, May 27, 2013 1:52 AM
>> > Subject: Re: Fwd: Re: convert input for SVD
>> >
>> >
>> > Hi Dmitriy / Suneel,
>> >
>> > You are pointing me to the correct solution. However, I see different
>> > options in the source code downloaded from mahout-trunk.zip and in
>> > mahout-examples-0.7-job.jar.
>> >
>> > Could you please verify the same at your end.
>> >
>> > ==>> from mahout-trunk.zip <<==
>> >
>> >     addOption("uHalfSigma",
>> >               "uhs",
>> >               "Compute U * Sigma^0.5",
>> >               String.valueOf(false));
>> > *    addOption("uSigma", "us", "Compute U * Sigma",
>> > String.valueOf(false));*
>> >     addOption("computeV", "V", "compute V (true/false)",
>> > String.valueOf(true));
>> >
>> >
>> > ==>> mahout-examples-0.7-job.jar <<==
>> >
>> >     addOption("uHalfSigma", "uhs", "Compute U as UHat=U x pow(Sigma,0.5)",
>> >               String.valueOf(false));
>> >
>> >     addOption("computeV", "V", "compute V (true/false)",
>> > String.valueOf(true));
>> >     addOption("vHalfSigma", "vhs", "compute V as VHat= V x pow(Sigma,0.5)",
>> >               String.valueOf(false));
>> >
>> >
>> > Thanks,
>> > Rajesh
>> >
>> >
>> > On Fri, May 24, 2013 at 10:48 PM, Dmitriy Lyubimov <dlie...@gmail.com
>> > >wrote:
>> >
>> > > "ssvd -us true...." should do this. Suneel says it still works on trunk.
>> > >
>> > >
>> > > On Fri, May 24, 2013 at 9:38 AM, Rajesh Nikam <rajeshni...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Dmitriy & Suneel for the comments. As you suggested, I need to
>> > > > use U * Sigma.
>> > > >
>> > > > That means I need to compute the product of these matrices.
>> > > >
>> > > > Which Mahout tools should I use for this?
>> > > >
>> > > > The other question was: how do I get the features that are selected in U?
>> > > > On May 24, 2013 8:45 PM, "Suneel Marthi" <suneel_mar...@yahoo.com>
>> > > wrote:
>> > > >
>> > > > > Rajesh,
>> > > > >
>> > > > > I am working off of trunk and this works fine.
>> > > > >
>> > > > > As Dmitriy says, you do need USigma.
>> > > > >
>> > > > > It would help to paste the entire stacktrace you are seeing with
>> > > > > MatrixColumnMeansJob.
>> > > > >
>> > > > > If you are still seeing an issue, I would suggest that you work off
>> > > > > of trunk.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > ________________________________
>> > > > >  From: Dmitriy Lyubimov <dlie...@gmail.com>
>> > > > > To: user@mahout.apache.org
>> > > > > Sent: Friday, May 24, 2013 9:52 AM
>> > > > > Subject: Re: Fwd: Re: convert input for SVD
>> > > > >
>> > > > >
>> > > > > I think the last time I verified this flow was as of
>> > > > > https://issues.apache.org/jira/browse/MAHOUT-1097. It was working
>> > > > > then. I have not looked at it since.
>> > > > > On May 24, 2013 6:42 AM, "Dmitriy Lyubimov" <dlie...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Rajesh, you will get more help if you stay on the list.
>> > > > > >
>> > > > > > You do need the U * Sigma output. There is no substitute.
>> > > > > >
>> > > > > > If this option is indeed no longer there, I have no knowledge of
>> > > > > > it. Maybe there was some work committed that broke it, but at the
>> > > > > > moment I have no time to look at it. Obviously it was there at the
>> > > > > > time the documentation was written. I guess you may obtain an
>> > > > > > earlier snapshot as an interim solution if that is indeed the case.
>> > > > > >
>> > > > > > ---------- Forwarded message ----------
>> > > > > > From: "Rajesh Nikam" <rajeshni...@gmail.com>
>> > > > > > Date: May 24, 2013 3:20 AM
>> > > > > > Subject: Re: convert input for SVD
>> > > > > > To: <user@mahout.apache.org>
>> > > > > > Cc:
>> > > > > >
>> > > > > > > Hello Dmitriy,
>> > > > > > >
>> > > > > > > Thanks for reply.
>> > > > > > >
>> > > > > > > I see a similar discussion at the following link, where I see
>> > > > > > > your reply.
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://www.searchworkings.org/forum/-/message_boards/view_message/517870#_19_message_519704
>> > > > > > >
>> > > > > > > I have the same problem: I need to apply dimensionality
>> > > > > > > reduction and use a clustering algorithm on the reduced features.
>> > > > > > >
>> > > > > > > It seems the parameters for ssvd have changed from those
>> > > > > > > mentioned in SSVD-CLI.pdf. It no longer shows *-us* as a
>> > > > > > > parameter.
>> > > > > > >
>> > > > > > > I am using mahout-examples-0.7-job.jar
>> > > > > > >
>> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output
>> > > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -pca true
>> > > > > > > -U true -V false *-us true* -ow -q 1
>> > > > > > >
>> > > > > > > Giving the option "*-pca true*" gives an error:
>> > > > > > >
>> > > > > > >         at
>> > > > > > > org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>> > > > > > >         at
>> > > > > > > org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>> > > > > > >
>> > > > > > > So I removed it.
>> > > > > > >
>> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output
>> > > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -U true
>> > > > > > > -V false *-us true* -ow -q 1
>> > > > > > >
>> > > > > > > With the above command: *Unexpected -us* while processing
>> > > > > > > Job-Specific Options.
>> > > > > > >
>> > > > > > > I tried with "-U false -V false -uhs true"; it just generated
>> > > > > > > the sigma file as expected, however no "Usigma".
>> > > > > > >
>> > > > > > > hadoop fs -lsr /user/hadoop/t/PE_EXE/input-set-svd/
>> > > > > > >
>> > > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:34
>> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/sigma
>> > > > > > >
>> > > > > > > Then with *"-U true -V false -uhs true"* the output dir U is
>> > > > > > > created.
>> > > > > > >
>> > > > > > > drwxr-xr-x   - hadoop supergroup          0 2013-05-24 15:39
>> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/U
>> > > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:39
>> > > > > > > /user/hadoop/t/PE_EXE/input-set-svd/sigma
>> > > > > > >
>> > > > > > > My problem is how to use these U/V/sigma files as input to
>> > > > > > > canopy/kmeans?
>> > > > > > >
>> > > > > > > How do I identify which important features from U/Sigma are
>> > > > > > > retained in the dimensionality reduction?
>> > > > > > >
>> > > > > > > Thanks in Advance !
>> > > > > > > Rajesh
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, May 24, 2013 at 7:01 AM, Dmitriy Lyubimov <
>> > > dlie...@gmail.com
>> > > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17&modificationDate=1349999085000
>> > > > > > > > :
>> > > > > > > >
>> > > > > > > > "In most cases where you might be looking to reduce
>> > > > > > > > dimensionality while retaining variance, you probably need
>> > > > > > > > the combination of options -pca true -U false -V false -us true.
>> > > > > > > >
>> > > > > > > > See §3 for details"
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, May 23, 2013 at 6:24 PM, Dmitriy Lyubimov <
>> > > > dlie...@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Also, for the dimensionality reduction it is important among
>> > > > > > > > > other things to re-center your input first, which is why you
>> > > > > > > > > also want "-pca true".
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, May 23, 2013 at 6:23 PM, Dmitriy Lyubimov <
>> > > > > dlie...@gmail.com
>> > > > > > > > >wrote:
>> > > > > > > > >
>> > > > > > > > >> Did you specify the -us option? SSVD by default produces
>> > > > > > > > >> only U, V and Sigma, but it can produce more, e.g. U*Sigma,
>> > > > > > > > >> U*sqrt(Sigma) etc., if you ask for it. And, alternatively,
>> > > > > > > > >> you can suppress any of U, V (you can't suppress sigma, but
>> > > > > > > > >> that doesn't cost anything in space anyway).
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Thu, May 23, 2013 at 6:20 PM, Rajesh Nikam <
>> > > > > > rajeshni...@gmail.com
>> > > > > > > > >wrote:
>> > > > > > > > >>
>> > > > > > > > >>> I got all three U, V & sigma from ssvd; however, which to
>> > > > > > > > >>> use as input to canopy?
>> > > > > > > > >>> On May 24, 2013 6:47 AM, "Dmitriy Lyubimov" <
>> > > dlie...@gmail.com
>> > > > >
>> > > > > > wrote:
>> > > > > > > > >>>
>> > > > > > > > >>> > I think you want U*Sigma
>> > > > > > > > >>> >
>> > > > > > > > >>> > What you want is ssvd ... -pca true ... -us true ...
>> > > > > > > > >>> > see the manual
>> > > > > > > > >>> >
>> > > > > > > > >>> >
>> > > > > > > > >>> >
>> > > > > > > > >>> >
>> > > > > > > > >>> > On Thu, May 23, 2013 at 6:07 PM, Rajesh Nikam <
>> > > > > > rajeshni...@gmail.com
>> > > > > > > > >
>> > > > > > > > >>> > wrote:
>> > > > > > > > >>> >
>> > > > > > > > >>> > > Sorry for the confusion. Here the number of clusters
>> > > > > > > > >>> > > is decided by canopy. With this data it has 60 to 70
>> > > > > > > > >>> > > clusters.
>> > > > > > > > >>> > >
>> > > > > > > > >>> > > My question is: which part of the ssvd output (U, V,
>> > > > > > > > >>> > > Sigma) should be used as input to canopy?
>> > > > > > > > >>> > >  On May 24, 2013 3:56 AM, "Ted Dunning" <
>> > > > > ted.dunn...@gmail.com
>> > > > > > >
>> > > > > > > > >>> wrote:
>> > > > > > > > >>> > >
>> > > > > > > > >>> > > > Rajesh,
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > This is very confusing.
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > You have 1500 things that you are clustering into
>> > > > > > > > >>> > > > more than 1400 clusters.
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > There is no way for most of these clusters to have >1
>> > > > > > > > >>> > > > member, just because there aren't enough clusters
>> > > > > > > > >>> > > > compared to the items.
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > Is there a typo here?
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > On Thu, May 23, 2013 at 5:34 AM, Rajesh Nikam <
>> > > > > > > > >>> rajeshni...@gmail.com>
>> > > > > > > > >>> > > > wrote:
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > > Hi,
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > I have an input test set of 1500 instances with
>> > > > > > > > >>> > > > > 1000+ features. I want to use SVD to reduce the
>> > > > > > > > >>> > > > > features. I have followed the steps below, which
>> > > > > > > > >>> > > > > generate 1400+ clusters; 99% of the clusters
>> > > > > > > > >>> > > > > contain 1 instance :(
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > Please let me know what is wrong in the steps below -
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > mahout arff.vector --input /mnt/cluster/t/input-set.arff --output /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -ow
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > mahout canopy -i */user/hadoop/t/input-set-svd/U* -o /user/hadoop/t/input-set-canopy-centroids -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure *-t1 0.001 -t2 0.002*
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > mahout kmeans -i */user/hadoop/t/input-set-svd/U* -c /user/hadoop/t/input-set-canopy-centroids/clusters-0-final -cl -o /user/hadoop/t/input-set-kmeans-clusters -ow -x 10 -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > mahout clusterdump -dt sequencefile -i /user/hadoop/t/input-set-kmeans-clusters/clusters-1-final/ -n 20 -b 100 -o /mnt/cluster/t/cdump-input-set.txt -p /user/hadoop/t/input-set-kmeans-clusters/clusteredPoints/ --evaluate
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > Thanks in advance !
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > Rajesh
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > On Wed, May 22, 2013 at 2:18 AM, Dmitriy
>> Lyubimov <
>> > > > > > > > >>> dlie...@gmail.com
>> > > > > > > > >>> > >
>> > > > > > > > >>> > > > > wrote:
>> > > > > > > > >>> > > > >
>> > > > > > > > >>> > > > > > PPS As far as the tool for arff, I am frankly
>> > > > > > > > >>> > > > > > not sure, but it sounds like you've already
>> > > > > > > > >>> > > > > > solved this.
>> > > > > > > > >>> > > > > >
>> > > > > > > > >>> > > > > >
>> > > > > > > > >>> > > > > > On Tue, May 21, 2013 at 1:41 PM, Dmitriy
>> > Lyubimov <
>> > > > > > > > >>> > dlie...@gmail.com
>> > > > > > > > >>> > > >
>> > > > > > > > >>> > > > > > wrote:
>> > > > > > > > >>> > > > > >
>> > > > > > > > >>> > > > > > > ps as far as U, V data "close to zero", yes,
>> > > > > > > > >>> > > > > > > that's what you'd expect.
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > > Here, "close to zero" still means much bigger
>> > > > > > > > >>> > > > > > > than a rounding error, of course. E.g. 1E-12
>> > > > > > > > >>> > > > > > > is indeed a small number, and 1E-16 to 1E-18
>> > > > > > > > >>> > > > > > > would indeed be "close to zero" for the
>> > > > > > > > >>> > > > > > > purposes of singularity. 1E-2..1E-5 are
>> > > > > > > > >>> > > > > > > actually quite "sizeable" numbers by the scale
>> > > > > > > > >>> > > > > > > of IEEE 754 arithmetic.
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > > U and V are orthonormal (which means their
>> > > > > > > > >>> > > > > > > column vectors have euclidean norm of 1). Note
>> > > > > > > > >>> > > > > > > that for large m and n (large inputs) they are
>> > > > > > > > >>> > > > > > > also extremely skinny. The larger the input
>> > > > > > > > >>> > > > > > > is, the smaller the elements of U and/or V are
>> > > > > > > > >>> > > > > > > going to be.
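The point about shrinking entries can be checked numerically: unit-norm columns force the typical entry of U toward 1/sqrt(m) as the row count m grows. A toy numpy sketch (random data, sizes illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
mean_abs = {}
for m in (100, 10_000):
    A = rng.normal(size=(m, 10))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of U are unit-norm (orthonormal) ...
    assert np.allclose(np.linalg.norm(U, axis=0), 1.0)
    # ... so the typical |entry| scales like 1/sqrt(m).
    mean_abs[m] = np.abs(U).mean()

# Entries shrink as the input grows taller.
assert mean_abs[10_000] < mean_abs[100]
```

So seeing entries of U around 1E-2..1E-5 on a large corpus is expected behavior, not a sign that something went wrong.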
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > > On Tue, May 21, 2013 at 8:48 AM, Dmitriy
>> > > Lyubimov <
>> > > > > > > > >>> > > dlie...@gmail.com
>> > > > > > > > >>> > > > > > >wrote:
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > >> Sounds like dimensionality reduction to me.
>> > > > > > > > >>> > > > > > >> You may want to use ssvd -pca
>> > > > > > > > >>> > > > > > >>
>> > > > > > > > >>> > > > > > >> Apologies for brevity. Sent from my Android
>> > > phone.
>> > > > > > > > >>> > > > > > >> -Dmitriy
>> > > > > > > > >>> > > > > > >> On May 21, 2013 6:27 AM, "Rajesh Nikam" <
>> > > > > > > > >>> rajeshni...@gmail.com>
>> > > > > > > > >>> > > > > wrote:
>> > > > > > > > >>> > > > > > >>
>> > > > > > > > >>> > > > > > >>> Hello Ted,
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> Thanks for reply.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> I have started exploring SVD based on the
>> > > > > > > > >>> > > > > > >>> mention that it could help to drop features
>> > > > > > > > >>> > > > > > >>> which are not relevant for clustering.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> My objective is to reduce the number of
>> > > > > > > > >>> > > > > > >>> features before passing them to clustering,
>> > > > > > > > >>> > > > > > >>> and just keep the important features.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> arff/csv ==> ssvd (for dimensionality reduction) ==> clustering
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> Could you please illustrate the Mahout steps
>> > > > > > > > >>> > > > > > >>> to join the above pipeline?
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> I think Lanczos SVD needs to be used for an
>> > > > > > > > >>> > > > > > >>> mxm matrix.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> I have tried to check ssvd. I used
>> > > > > > > > >>> > > > > > >>> arff.vector to convert the arff/csv to a
>> > > > > > > > >>> > > > > > >>> vector file, which is then given as input to
>> > > > > > > > >>> > > > > > >>> ssvd, and then dumped U, V and sigma using
>> > > > > > > > >>> > > > > > >>> vectordump.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> I see most of the values dumped are near 0.
>> > > > > > > > >>> > > > > > >>> I don't understand whether this is correct
>> > > > > > > > >>> > > > > > >>> or not.
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>>
>> {0:0.01066724825049657,1:0.016715498597386844,2:2.0187750952311708E-4,3:3.401020567221039E-4,4:-1.2388403347280688E-4,5:6.41502463540719E-5,6:-1.359187582538833E-4,7:6.329813140445419E-5,8:1.670015585746444E-4,9:3.5415113034592744E-4,10:7.108868213280763E-4,11:0.020553517552052456,12:-0.015118680942548916,13:0.007981746711271956,14:-0.003251236468768259,15:0.0038075014396303053,16:-0.0010925318534013683,17:-0.0026943024876179833,18:-0.001744794617721648,19:-0.0024528466548735714}
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>>
>> {0:0.029978614322360833,1:-0.01431521245087889,2:1.3318592088199427E-4,3:1.495356283071516E-4,4:8.762709213918985E-5,5:1.2765191352425177E-
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> Thanks,
>> > > > > > > > >>> > > > > > >>> Rajesh
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> On Tue, May 21, 2013 at 11:35 AM, Ted
>> > Dunning <
>> > > > > > > > >>> > > > ted.dunn...@gmail.com
>> > > > > > > > >>> > > > > >
>> > > > > > > > >>> > > > > > >>> wrote:
>> > > > > > > > >>> > > > > > >>>
>> > > > > > > > >>> > > > > > >>> > Are you using Lanczos instead of SSVD for a reason?
>> > > > > > > > >>> > > > > > >>> >
>> > > > > > > > >>> > > > > > >>> >
>> > > > > > > > >>> > > > > > >>> >
>> > > > > > > > >>> > > > > > >>> >
>> > > > > > > > >>> > > > > > >>> > On Mon, May 20, 2013 at 4:13 AM, Rajesh
>> > > Nikam <
>> > > > > > > > >>> > > > > rajeshni...@gmail.com
>> > > > > > > > >>> > > > > > >
>> > > > > > > > >>> > > > > > >>> > wrote:
>> > > > > > > > >>> > > > > > >>> >
>> > > > > > > > >>> > > > > > >>> > > Hello,
>> > > > > > > > >>> > > > > > >>> > >
>> > > > > > > > >>> > > > > > >>> > > I have an arff / csv file containing
>> > > > > > > > >>> > > > > > >>> > > input data that I want to pass to svd:
>> > > > > > > > >>> > > > > > >>> > > Lanczos Singular Value Decomposition.
>> > > > > > > > >>> > > > > > >>> > >
>> > > > > > > > >>> > > > > > >>> > > Which tool should I use to convert it to
>> > > > > > > > >>> > > > > > >>> > > the required format?
>> > > > > > > > >>> > > > > > >>> > >
>> > > > > > > > >>> > > > > > >>> > > Thanks in Advance !
>> > > > > > > > >>> > > > > > >>> > >
>> > > > > > > > >>> > > > > > >>> > > Thanks,
>> > > > > > > > >>> > > > > > >>> > > Rajesh
>> > > > > > > > >>> > > > > > >>> > >
