Agree with Dmitriy.



________________________________
 From: Dmitriy Lyubimov <dlie...@gmail.com>
To: user@mahout.apache.org 
Cc: Suneel Marthi <suneel_mar...@yahoo.com> 
Sent: Thursday, May 30, 2013 12:39 PM
Subject: Re: Fwd: Re: convert input for SVD
 


I.e., I guess you want to run k-means directly on the USigma output.
On May 30, 2013 9:37 AM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:

I believe this flow describes how to use Lanczos SVD in Mahout to arrive at the
same reduction that SSVD already provides in one step with the PCA and USigma
options. This flow is irrelevant when working with SSVD, which already does it
all internally for you.
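
To illustrate why k-means should be run on U*Sigma rather than on U alone, here is a minimal Python sketch. It does not use Mahout: U, sigma and V are a hand-made toy factorization (assumptions for illustration, not SSVD output). It compares row-to-row distances in the original matrix A = U * Sigma * V^T, in U*Sigma, and in U.

```python
import math

# Hypothetical toy SVD factors (hand-made for illustration, not Mahout output).
U = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]  # orthonormal columns
sigma = [10.0, 0.1]                                        # singular values
c, s = math.cos(0.3), math.sin(0.3)
V = [[c, -s], [s, c]]                                      # orthonormal 2x2


def dist(x, y):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# U * Sigma: scale column k of U by sigma[k].
US = [[u * sv for u, sv in zip(row, sigma)] for row in U]

# A = (U * Sigma) * V^T, standing in for the original input matrix.
A = [[sum(row[k] * V[j][k] for k in range(2)) for j in range(2)] for row in US]

# Print distances for two row pairs: in A, in U*Sigma, and in U alone.
for i, j in [(0, 1), (0, 2)]:
    print(i, j, dist(A[i], A[j]), dist(US[i], US[j]), dist(U[i], U[j]))
```

In this toy case rows 0 and 1 are about 0.1 apart in both A and U*Sigma, and rows 0 and 2 are about 10 apart, while in U alone both pairs are exactly 1.0 apart. Clustering on U therefore loses the scale carried by the singular values.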
>On May 30, 2013 5:45 AM, "Rajesh Nikam" <rajeshni...@gmail.com> wrote:
>
>Hi Suneel/Dmitriy,
>>
>>I got mahout-examples-0.8-SNAPSHOT-job.jar compiled from trunk.
>>Now I got the -us param working for the input set, as you mentioned.
>>
>>Steps followed are:
>>
>>mahout arff.vector --input /mnt/cluster/t/PE_EXE/input-set.arff --output
>>/user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
>>
>>hadoop jar mahout-examples-0.8-SNAPSHOT-job.jar
>>org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli --input
>>/user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k
>>50 --reduceTasks 2 -U true -V false -us true -ow
>>
>>I am not able to understand what needs to be provided as input to
>>cleansvd/transpose/matrixmult as mentioned on the following page, and which of
>>U/V/USigma needs to be used and how.
>>
>>Also, how can I tell which features ended up in the reduced matrix?
>>
>>https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>>
>>At a high level, the steps we're going to perform are:
>>
>>bin/mahout svd (original -> svdOut)
>>bin/mahout cleansvd ...
>>bin/mahout transpose svdOut -> svdT
>>bin/mahout transpose original -> originalT
>>bin/mahout matrixmult originalT svdT -> newMatrix
>>bin/mahout kmeans newMatrix
>>
>>Thanks,
>>
>>Rajesh
>>
>>
>>
>>On Mon, May 27, 2013 at 11:31 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>>
>>> Ahha, I see your problem now.
>>>
>>> The additional line in trunk was added as part of Mahout-1097 (long after
>>> Mahout-0.7 release) and hence you wouldn't see the change in
>>> mahout-examples-0.7-job.jar that you are working off of.  This fix is
>>> presently available in trunk (and will be part of Mahout-0.8).
>>>
>>> I would recommend working off of trunk for now, and you should be good.
>>>
>>>
>>>
>>>
>>> ________________________________
>>>  From: Rajesh Nikam <rajeshni...@gmail.com>
>>> To: user@mahout.apache.org
>>> Sent: Monday, May 27, 2013 1:52 AM
>>> Subject: Re: Fwd: Re: convert input for SVD
>>>
>>>
>>> Hi Dmitriy / Suneel,
>>>
>>> You are pointing me to the correct solution. However, I see different
>>> options in the source code downloaded from trunk (mahout-trunk.zip) and in
>>> mahout-examples-0.7-job.jar.
>>>
>>> Could you please verify the same at your end?
>>>
>>> ==>> from mahout-trunk.zip <<==
>>>
>>>     addOption("uHalfSigma", "uhs", "Compute U * Sigma^0.5", String.valueOf(false));
>>> *    addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));*
>>>     addOption("computeV", "V", "compute V (true/false)", String.valueOf(true));
>>>
>>>
>>> ==>> mahout-examples-0.7-job.jar <<==
>>>
>>>     addOption("uHalfSigma", "uhs", "Compute U as UHat=U x pow(Sigma,0.5)", String.valueOf(false));
>>>
>>>     addOption("computeV", "V", "compute V (true/false)", String.valueOf(true));
>>>     addOption("vHalfSigma", "vhs", "compute V as VHat= V x pow(Sigma,0.5)", String.valueOf(false));
>>>
>>>
>>> Thanks,
>>> Rajesh
>>>
>>>
>>> On Fri, May 24, 2013 at 10:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>> > "ssvd -us true...." should do this. Suneel says it still works on trunk.
>>> >
>>> >
>>> > On Fri, May 24, 2013 at 9:38 AM, Rajesh Nikam <rajeshni...@gmail.com> wrote:
>>> >
>>> > > Thanks Dmitriy & Suneel for the comments. As you suggested, I need to use
>>> > > U * Sigma.
>>> > >
>>> > > That means I need to multiply these matrices.
>>> > >
>>> > > Which Mahout props should I use for this?
>>> > >
>>> > > My other question was how to find out which features are selected in U.
>>> > > On May 24, 2013 8:45 PM, "Suneel Marthi" <suneel_mar...@yahoo.com> wrote:
>>> > >
>>> > > > Rajesh,
>>> > > >
>>> > > > I am working off of trunk and this works fine.
>>> > > >
>>> > > > As Dmitriy says, you do need USigma.
>>> > > >
>>> > > > It would help to paste the entire stacktrace you are seeing with
>>> > > > MatrixColumnMeansJob.
>>> > > >
>>> > > > If you are still seeing an issue, I would suggest that you work off of
>>> > > > trunk.
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > ________________________________
>>> > > >  From: Dmitriy Lyubimov <dlie...@gmail.com>
>>> > > > To: user@mahout.apache.org
>>> > > > Sent: Friday, May 24, 2013 9:52 AM
>>> > > > Subject: Re: Fwd: Re: convert input for SVD
>>> > > >
>>> > > >
>>> > > > I think the last time I verified this flow was as of
>>> > > > https://issues.apache.org/jira/browse/MAHOUT-1097. It was working then.
>>> > > > I have not looked at it since.
>>> > > > On May 24, 2013 6:42 AM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:
>>> > > >
>>> > > > > Rajesh, you will get more help if you stay on the list.
>>> > > > >
>>> > > > > You do need the U * Sigma output. There is no substitute.
>>> > > > >
>>> > > > > If this option is indeed no longer there, I have no knowledge of it.
>>> > > > > Maybe some work was committed that broke it, but at the moment I have
>>> > > > > no time to look at it. Obviously it was there at the time the
>>> > > > > documentation was written. I guess you may obtain an earlier snapshot
>>> > > > > as an interim solution if that is indeed the case.
>>> > > > >
>>> > > > > ---------- Forwarded message ----------
>>> > > > > From: "Rajesh Nikam" <rajeshni...@gmail.com>
>>> > > > > Date: May 24, 2013 3:20 AM
>>> > > > > Subject: Re: convert input for SVD
>>> > > > > To: <user@mahout.apache.org>
>>> > > > > Cc:
>>> > > > >
>>> > > > > > Hello Dmitriy,
>>> > > > > >
>>> > > > > > Thanks for reply.
>>> > > > > >
>>> > > > > > I see a similar discussion at the following link, where I see your reply.
>>> > > > > >
>>> > > > > > http://www.searchworkings.org/forum/-/message_boards/view_message/517870#_19_message_519704
>>> > > > > >
>>> > > > > > I have the same problem: I need to apply dimensionality reduction and
>>> > > > > > use a clustering algorithm on the reduced features.
>>> > > > > >
>>> > > > > > It seems the parameters for ssvd have changed from those mentioned in
>>> > > > > > SSVD-CLI.pdf. It no longer shows *-us* as a parameter.
>>> > > > > >
>>> > > > > > I am using mahout-examples-0.7-job.jar
>>> > > > > >
>>> > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output
>>> > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -pca true -U true
>>> > > > > > -V false *-us true* -ow -q 1
>>> > > > > > Giving the option "*-pca true*" produces this error:
>>> > > > > >
>>> > > > > > at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>>> > > > > >         at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>>> > > > > >
>>> > > > > > So I removed it.
>>> > > > > >
>>> > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output
>>> > > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -U true -V false
>>> > > > > > *-us true* -ow -q 1
>>> > > > > >
>>> > > > > > With the above command I get: *Unexpected -us* while processing
>>> > > > > > Job-Specific Options.
>>> > > > > >
>>> > > > > > I tried "-U false -V false -uhs true"; it generated just the sigma file,
>>> > > > > > as expected, but no "Usigma".
>>> > > > > >
>>> > > > > > hadoop fs -lsr /user/hadoop/t/PE_EXE/input-set-svd/
>>> > > > > >
>>> > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:34 /user/hadoop/t/PE_EXE/input-set-svd/sigma
>>> > > > > >
>>> > > > > > Then with *"-U true -V false -uhs true"* the output dir U is created.
>>> > > > > >
>>> > > > > > drwxr-xr-x   - hadoop supergroup          0 2013-05-24 15:39 /user/hadoop/t/PE_EXE/input-set-svd/U
>>> > > > > > -rw-r--r--   2 hadoop supergroup       1712 2013-05-24 15:39 /user/hadoop/t/PE_EXE/input-set-svd/sigma
>>> > > > > >
>>> > > > > > My problem is how to use these U/V/sigma files as input to canopy/kmeans.
>>> > > > > >
>>> > > > > > How do I identify which important features from U/Sigma are retained in
>>> > > > > > the dimensionality reduction?
>>> > > > > >
>>> > > > > > Thanks in Advance !
>>> > > > > > Rajesh
>>> > > > > >
>>> > > > > >
>>> > > > > > On Fri, May 24, 2013 at 7:01 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17&modificationDate=1349999085000
>>> > > > > > > :
>>> > > > > > >
>>> > > > > > > "In most cases where you might be looking to reduce
>>> > > > > > > dimensionality while retaining variance, you probably need combination of
>>> > > > > > > options -pca true -U false -V false -us true.
>>> > > > > > >
>>> > > > > > > See §3 for details"
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Thu, May 23, 2013 at 6:24 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Also, for dimensionality reduction it is important, among other things,
>>> > > > > > > > to re-center your input first, which is why you also want "-pca true".
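
As a rough illustration of what re-centering means here, the sketch below subtracts each column's mean from a small matrix in plain Python. It is not Mahout code and the data values are made up. The assumption is that "-pca true" performs this kind of column-mean subtraction as part of the SSVD run, which the MatrixColumnMeansJob in the stack trace earlier in this thread suggests.

```python
def center_columns(rows):
    """Subtract each column's mean from every entry of that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # made-up 3x2 input
centered, means = center_columns(data)
print(means)     # per-column means
print(centered)  # every column of the result sums to zero
```

Without this step the leading singular vector tends to describe the mean of the data rather than the directions in which it varies.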
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Thu, May 23, 2013 at 6:23 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> > > > > > > >
>>> > > > > > > >> Did you specify the -us option? By default SSVD produces only U, V and
>>> > > > > > > >> Sigma, but it can produce more, e.g. U*Sigma, U*sqrt(Sigma) etc., if you
>>> > > > > > > >> ask for it. Alternatively, you can suppress either of U and V (you can't
>>> > > > > > > >> suppress Sigma, but that doesn't cost anything in space anyway).
>>> > > > > > > >>
>>> > > > > > > >>
>>> > > > > > > >> On Thu, May 23, 2013 at 6:20 PM, Rajesh Nikam <rajeshni...@gmail.com> wrote:
>>> > > > > > > >>
>>> > > > > > > >>> I got all three of U, V & sigma from ssvd; however, which should I use as
>>> > > > > > > >>> input to canopy?
>>> > > > > > > >>> On May 24, 2013 6:47 AM, "Dmitriy Lyubimov" <dlie...@gmail.com> wrote:
>>> > > > > > > >>>
>>> > > > > > > >>> > I think you want U*Sigma
>>> > > > > > > >>> >
>>> > > > > > > >>> > What you want is ssvd ... -pca true ... -us true ... see the manual
>>> > > > > > > >>> >
>>> > > > > > > >>> >
>>> > > > > > > >>> >
>>> > > > > > > >>> >
>>> > > > > > > >>> > On Thu, May 23, 2013 at 6:07 PM, Rajesh Nikam <rajeshni...@gmail.com>
>>> > > > > > > >>> > wrote:
>>> > > > > > > >>> >
>>> > > > > > > >>> > > Sorry for the confusion. Here the number of clusters is decided by canopy.
>>> > > > > > > >>> > > The data itself has 60 to 70 clusters.
>>> > > > > > > >>> > >
>>> > > > > > > >>> > > My question is: which part of the ssvd output (U, V, Sigma) should be
>>> > > > > > > >>> > > used as input to canopy?
>>> > > > > > > >>> > > On May 24, 2013 3:56 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>> > > > > > > >>> > >
>>> > > > > > > >>> > > > Rajesh,
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > This is very confusing.
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > You have 1500 things that you are clustering into more than 1400
>>> > > > > > > >>> > > > clusters.
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > There is no way for most of these clusters to have >1 member, simply
>>> > > > > > > >>> > > > because there aren't enough items compared to the clusters.
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > Is there a typo here?
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > On Thu, May 23, 2013 at 5:34 AM, Rajesh Nikam <rajeshni...@gmail.com>
>>> > > > > > > >>> > > > wrote:
>>> > > > > > > >>> > > >
>>> > > > > > > >>> > > > > Hi,
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > I have an input test set of 1500 instances with 1000+ features. I want
>>> > > > > > > >>> > > > > to use SVD to reduce the features. I followed the steps below, which
>>> > > > > > > >>> > > > > generate 1400+ clusters; 99% of the clusters contain 1 instance :(
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > Please let me know what is wrong in the steps below -
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > mahout arff.vector --input /mnt/cluster/t/input-set.arff --output
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -ow
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > mahout canopy -i */user/hadoop/t/input-set-svd/U* -o
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-canopy-centroids -dm
>>> > > > > > > >>> > > > > org.apache.mahout.common.distance.TanimotoDistanceMeasure *-t1 0.001 -t2
>>> > > > > > > >>> > > > > 0.002*
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > mahout kmeans -i */user/hadoop/t/input-set-svd/U* -c
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-canopy-centroids/clusters-0-final -cl -o
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-kmeans-clusters -ow -x 10 -dm
>>> > > > > > > >>> > > > > org.apache.mahout.common.distance.TanimotoDistanceMeasure
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > mahout clusterdump -dt sequencefile -i
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-kmeans-clusters/clusters-1-final/ -n 20 -b 100 -o
>>> > > > > > > >>> > > > > /mnt/cluster/t/cdump-input-set.txt -p
>>> > > > > > > >>> > > > > /user/hadoop/t/input-set-kmeans-clusters/clusteredPoints/ --evaluate
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > Thanks in advance !
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > Rajesh
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > On Wed, May 22, 2013 at 2:18 AM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> > > > > > > >>> > > > > wrote:
>>> > > > > > > >>> > > > >
>>> > > > > > > >>> > > > > > PPS As far as the tool for arff goes, I am frankly not sure, but it
>>> > > > > > > >>> > > > > > sounds like you've already solved this.
>>> > > > > > > >>> > > > > >
>>> > > > > > > >>> > > > > >
>>> > > > > > > >>> > > > > > On Tue, May 21, 2013 at 1:41 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> > > > > > > >>> > > > > > wrote:
>>> > > > > > > >>> > > > > >
>>> > > > > > > >>> > > > > > > PS As far as U, V data being "close to zero", yes, that's what you'd
>>> > > > > > > >>> > > > > > > expect.
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > > Here, "close to zero" still means much bigger than a rounding error,
>>> > > > > > > >>> > > > > > > of course. E.g. 1E-12 is indeed a small number, and 1E-16 to 1E-18
>>> > > > > > > >>> > > > > > > would indeed be "close to zero" for the purposes of singularity.
>>> > > > > > > >>> > > > > > > 1E-2..1E-5 are actually quite "sizeable" numbers by the scale of
>>> > > > > > > >>> > > > > > > IEEE 754 arithmetic.
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > > U and V are orthonormal (which means their column vectors have a
>>> > > > > > > >>> > > > > > > Euclidean norm of 1). Note that for large m and n (large inputs) they
>>> > > > > > > >>> > > > > > > are also extremely skinny. The larger the input is, the smaller the
>>> > > > > > > >>> > > > > > > elements of U and/or V are going to be.
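
The point about unit-norm columns can be checked with a small sketch in plain Python (not Mahout). It assumes the simplest possible unit-norm column, one with m equal entries, purely as an illustration. Real columns of U or V vary entry by entry, but for any unit-norm column of length m the root-mean-square entry is exactly 1/sqrt(m).

```python
import math


def rms_entry(column):
    """Root-mean-square entry of a vector."""
    return math.sqrt(sum(x * x for x in column) / len(column))


for m in (100, 1500, 1000000):
    column = [1.0 / math.sqrt(m)] * m  # hypothetical unit-norm column
    norm = math.sqrt(sum(x * x for x in column))
    # The norm stays at 1 while the typical entry shrinks as m grows.
    print(m, norm, rms_entry(column))
```

For a column with 1500 entries, as in this thread's input, the typical entry is roughly 0.026, so dumped values around 1E-2 and smaller are not surprising.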
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > > On Tue, May 21, 2013 at 8:48 AM, Dmitriy Lyubimov <dlie...@gmail.com>
>>> > > > > > > >>> > > > > > > wrote:
>>> > > > > > > >>> > > > > > >
>>> > > > > > > >>> > > > > > >> Sounds like dimensionality reduction to me. You may want to use ssvd
>>> > > > > > > >>> > > > > > >> -pca
>>> > > > > > > >>> > > > > > >>
>>> > > > > > > >>> > > > > > >> Apologies for brevity. Sent from my Android phone.
>>> > > > > > > >>> > > > > > >> -Dmitriy
>>> > > > > > > >>> > > > > > >> On May 21, 2013 6:27 AM, "Rajesh Nikam" <rajeshni...@gmail.com> wrote:
>>> > > > > > > >>> > > > > > >>
>>> > > > > > > >>> > > > > > >>> Hello Ted,
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> Thanks for reply.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> I have started exploring SVD based on a mention that it could help to
>>> > > > > > > >>> > > > > > >>> drop features which are not relevant for clustering.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> My objective is to reduce the number of features before passing them
>>> > > > > > > >>> > > > > > >>> to clustering, and to keep just the important features.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> arff/csv ==> ssvd (for dimensionality reduction) ==> clustering
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> Could you please illustrate the mahout props to join the above
>>> > > > > > > >>> > > > > > >>> pipeline?
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> I think Lanczos SVD needs to be used for an m x m matrix.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> I have tried ssvd. I used arff.vector to convert the arff/csv to a
>>> > > > > > > >>> > > > > > >>> vector file, which is then given as input to ssvd, and then dumped U,
>>> > > > > > > >>> > > > > > >>> V and sigma using vectordump.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> I see most of the values dumped are near 0. I don't understand whether
>>> > > > > > > >>> > > > > > >>> this is correct or not.
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>>
>>> {0:0.01066724825049657,1:0.016715498597386844,2:2.0187750952311708E-4,3:3.401020567221039E-4,4:-1.2388403347280688E-4,5:6.41502463540719E-5,6:-1.359187582538833E-4,7:6.329813140445419E-5,8:1.670015585746444E-4,9:3.5415113034592744E-4,10:7.108868213280763E-4,11:0.020553517552052456,12:-0.015118680942548916,13:0.007981746711271956,14:-0.003251236468768259,15:0.0038075014396303053,16:-0.0010925318534013683,17:-0.0026943024876179833,18:-0.001744794617721648,19:-0.0024528466548735714}
>>> > > > > > > >>> > > > > > >>>
>>> {0:0.029978614322360833,1:-0.01431521245087889,2:1.3318592088199427E-4,3:1.495356283071516E-4,4:8.762709213918985E-5,5:1.2765191352425177E-
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> Thanks,
>>> > > > > > > >>> > > > > > >>> Rajesh
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> On Tue, May 21, 2013 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> > > > > > > >>> > > > > > >>> wrote:
>>> > > > > > > >>> > > > > > >>>
>>> > > > > > > >>> > > > > > >>> > Are you using Lanczos instead of SSVD for a reason?
>>> > > > > > > >>> > > > > > >>> >
>>> > > > > > > >>> > > > > > >>> >
>>> > > > > > > >>> > > > > > >>> >
>>> > > > > > > >>> > > > > > >>> >
>>> > > > > > > >>> > > > > > >>> > On Mon, May 20, 2013 at 4:13 AM, Rajesh Nikam <rajeshni...@gmail.com>
>>> > > > > > > >>> > > > > > >>> > wrote:
>>> > > > > > > >>> > > > > > >>> >
>>> > > > > > > >>> > > > > > >>> > > Hello,
>>> > > > > > > >>> > > > > > >>> > >
>>> > > > > > > >>> > > > > > >>> > > I have an arff / csv file containing input data that I want to pass
>>> > > > > > > >>> > > > > > >>> > > to svd: Lanczos Singular Value Decomposition.
>>> > > > > > > >>> > > > > > >>> > >
>>> > > > > > > >>> > > > > > >>> > > Which tool should I use to convert it to the required format?
>>> > > > > > > >>> > > > > > >>> > >
>>> > > > > > > >>> > > > > > >>> > > Thanks in Advance !
>>> > > > > > > >>> > > > > > >>> > >
>>> > > > > > > >>> > > > > > >>> > > Thanks,
>>> > > > > > > >>> > > > > > >>> > > Rajesh
>>> > > > > > > >>> > > > > > >>> > >
