P.S. As for the U and V values being "close to zero": yes, that's what you'd expect.

Here "close to zero" still means much bigger than rounding error, of
course. 1E-12 is indeed a small number, and 1E-16 to 1E-18 would indeed be
"close to zero" for the purposes of singularity. But 1E-2..1E-5 are
actually quite "sizeable" numbers on the scale of IEEE 754 arithmetic.
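
To put those magnitudes in perspective, here is a minimal Python sketch that compares them against machine epsilon. It assumes IEEE 754 double precision (what Python floats use) and values that live on a scale of about 1, as the entries of unit-norm vectors do; the helper name relative_to_eps is made up for illustration.

```python
import sys

# Machine epsilon for IEEE 754 doubles: the gap between 1.0 and the next
# representable number (about 2.22e-16).
eps = sys.float_info.epsilon

def relative_to_eps(x):
    """How many machine epsilons the magnitude of x amounts to (hypothetical helper)."""
    return abs(x) / eps

for x in (1e-2, 1e-5, 1e-12, 1e-16, 1e-18):
    print(f"{x:.0e} is {relative_to_eps(x):.3g} x machine epsilon")
```

Anything that comes out below 1 here is at or under rounding-error level for quantities of order 1; values like 1E-5 are many orders of magnitude above it.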

U and V are orthonormal (which means their column vectors are mutually
orthogonal and have a Euclidean norm of 1). Note that for large m and n
(large inputs) they are also extremely tall and skinny. Since the squared
entries of each column must sum to 1, the larger the input is, the smaller
the individual elements of U and/or V are going to be.
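
The scaling argument depends only on the columns having unit norm, so it can be checked without running an SVD at all. The sketch below is not ssvd: as a stand-in it orthonormalizes a few random vectors with modified Gram-Schmidt (function names are made up for illustration) and shows that the root-mean-square entry comes out as 1/sqrt(m) for columns of length m.

```python
import math
import random

def orthonormal_columns(m, k, seed=0):
    """Return k orthonormal vectors of length m (modified Gram-Schmidt on random data)."""
    rnd = random.Random(seed)
    cols = []
    for _ in range(k):
        v = [rnd.gauss(0.0, 1.0) for _ in range(m)]
        # Remove the components along the columns found so far.
        for q in cols:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        cols.append([a / norm for a in v])
    return cols

def rms_entry(cols):
    """Root-mean-square magnitude over all entries of all columns."""
    total = sum(a * a for col in cols for a in col)
    count = sum(len(col) for col in cols)
    return math.sqrt(total / count)

for m in (100, 10_000):
    q = orthonormal_columns(m, k=5)
    print(m, rms_entry(q), 1 / math.sqrt(m))
```

So with, say, a million rows, a typical entry of U would be on the order of 1E-3 even when nothing is wrong.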


On Tue, May 21, 2013 at 8:48 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Sounds like dimensionality reduction to me. You may want to use ssvd -pca
>
> Apologies for brevity. Sent from my Android phone.
> -Dmitriy
> On May 21, 2013 6:27 AM, "Rajesh Nikam" <rajeshni...@gmail.com> wrote:
>
>> Hello Ted,
>>
>> Thanks for reply.
>>
>> I started exploring SVD because it was mentioned that it could help drop
>> features which are not relevant for clustering.
>>
>> My objective is to reduce the number of features before passing them to
>> clustering, and to keep just the important features.
>>
>> arff/csv==> ssvd (for dimensionality reduction) ==> clustering
>>
>> Could you please illustrate mahout props to join above pipeline.
>>
>> I think, Lanczos SVD needs to be used for mxm matrix.
>>
>> I have tried ssvd: I used arff.vector to convert the arff/csv to a
>> vector file, which is then given as input to ssvd, and then dumped U, V and
>> sigma using vectordump.
>>
>> I see most of the values dumped are near 0. I don't understand whether this
>> is correct or not.
>>
>>
>> {0:0.01066724825049657,1:0.016715498597386844,2:2.0187750952311708E-4,3:3.401020567221039E-4,4:-1.2388403347280688E-4,5:6.41502463540719E-5,6:-1.359187582538833E-4,7:6.329813140445419E-5,8:1.670015585746444E-4,9:3.5415113034592744E-4,10:7.108868213280763E-4,11:0.020553517552052456,12:-0.015118680942548916,13:0.007981746711271956,14:-0.003251236468768259,15:0.0038075014396303053,16:-0.0010925318534013683,17:-0.0026943024876179833,18:-0.001744794617721648,19:-0.0024528466548735714}
>>
>> {0:0.029978614322360833,1:-0.01431521245087889,2:1.3318592088199427E-4,3:1.495356283071516E-4,4:8.762709213918985E-5,5:1.2765191352425177E-
>>
>> Thanks,
>> Rajesh
>>
>>
>>
>> On Tue, May 21, 2013 at 11:35 AM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>>
>> > Are you using Lanczos instead of SSVD for a reason?
>> >
>> >
>> >
>> >
>> > On Mon, May 20, 2013 at 4:13 AM, Rajesh Nikam <rajeshni...@gmail.com>
>> > wrote:
>> >
>> > > Hello,
>> > >
>> > > I have arff / csv file containing input data that I want to pass to svd :
>> > > Lanczos Singular Value Decomposition.
>> > >
>> > > Which tool to use to convert it to required format ?
>> > >
>> > > Thanks in Advance !
>> > >
>> > > Thanks,
>> > > Rajesh
>> > >
>> >
>>
>
