With 50k columns, you're well within the "sweet spot" for traditional SVD via Lanczos, so give it a try.
SSVD will probably run faster, but you lose some information on what the singular vectors "mean". If you don't need this information, SSVD may be better for you.

What would be awesome for *us* is if you tried both and told us what you found, in terms of performance and relevance. :)

-jake

On Jun 3, 2011 4:49 PM, "Eshwaran Vijaya Kumar" <[email protected]> wrote:

Hello all,

We are trying to build a clustering system which will have an SVD component. I believe Mahout has two SVD solvers: DistributedLanczosSolver and SSVD. Could someone give me some tips on which would be the better choice of solver, given that the data will be roughly 100 million rows, with each row having roughly 50K dimensions (100 million x 50,000)? We will be working with text data, so the resulting matrix should be relatively sparse to begin with.

Thanks,
Eshwaran
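
For anyone who wants a feel for the trade-off on a toy problem before running either solver on a cluster, here is a minimal NumPy sketch of the random-projection idea behind stochastic SVD. It is not Mahout code and makes no claim to match what SSVD does internally. The function name `randomized_svd`, its parameters (`oversample`, `power_iters`, `seed`), and the synthetic matrix are all assumptions made for illustration. An exact dense SVD stands in as the reference, since a small matrix does not need a Lanczos solver.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate rank-k SVD via random projection (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Sample the range of A with a random Gaussian test matrix.
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations sharpen the spectrum; re-orthonormalize each step for stability.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Solve the small projected problem exactly, then map back to the original space.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Hypothetical toy data: a rank-20 signal plus a little noise.
rng = np.random.default_rng(42)
A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 200))
A += 0.01 * rng.standard_normal((500, 200))

k = 10
U, s, Vt = randomized_svd(A, k)

# Exact singular values as the reference to compare against.
s_exact = np.linalg.svd(A, compute_uv=False)[:k]
rel_err = np.max(np.abs(s - s_exact) / s_exact)
print("max relative error in top-%d singular values: %.2e" % (k, rel_err))
```

On real data the interesting part is the comparison itself: time both approaches for the same rank `k`, then check whether the downstream clustering changes when the approximate vectors are swapped in.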
