Hello all,

We are building a clustering system that will include an SVD component. I believe Mahout has two SVD solvers: DistributedLanczosSolver and SSVD. Could someone give me some tips on which would be the better choice, given that the data will be roughly 100 million rows with about 50K dimensions each (100 million x 50,000)? We will be working with text data, so the resulting matrix should be relatively sparse to begin with.

Thanks,
Eshwaran
