Hi Jake,

Thank you for your reply. Good to know that we can use Lanczos. I will have to look at the SSVD algorithm more closely to figure out whether the information loss is worth the gain in speed (and computational efficiency). I guess we will have to run more tests to see which works best before deciding which path to take.
Esh

On Jun 3, 2011, at 6:23 PM, Jake Mannix wrote:

> With 50k columns, you're well within the "sweet spot" for traditional SVD
> via Lanczos, so give it a try.
>
> SSVD will probably run faster, but you lose some information on what the
> singular vectors "mean". If you don't need this information, SSVD may be
> better for you.
>
> What would be awesome for *us* is if you tried both and told us what you
> found, in terms of performance and relevance. :)
>
> -jake
>
> On Jun 3, 2011 4:49 PM, "Eshwaran Vijaya Kumar" wrote:
>
> > Hello all,
> > We are trying to build a clustering system which will have an SVD
> > component. I believe Mahout has two SVD solvers: DistributedLanczosSolver
> > and SSVD. Could someone give me some tips on which would be the better
> > solver, given that the data will be roughly 100 million rows with each
> > row having roughly 50K dimensions (100 million × 50,000)? We will be
> > working with text data, so the resulting matrix should be relatively
> > sparse to begin with.
> >
> > Thanks
> > Eshwaran
