Hi Jake,
  Thank you for your reply. Good to know that we can use Lanczos. I will have
to look more closely at the SSVD algorithm to figure out whether the information
loss is worth the gain in speed (and computational efficiency). I guess we will
have to run more tests with both solvers to see which works best before deciding
which path to take.
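
For a first sanity check on a small sample, something like the rough
single-machine sketch below might help. It is plain NumPy, not Mahout code, and
the function name and parameters (randomized_svd, oversample, power_iters) are
made up for illustration. It only shows the random-projection idea that, as far
as I understand, stochastic SVD is built on, compared against an exact SVD.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate the top-k SVD of A by random projection (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]
    # Project A onto a random (k + oversample)-dimensional subspace.
    omega = rng.standard_normal((n_cols, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations (with re-orthonormalization) sharpen the subspace
    # when the singular values decay slowly.
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Solve the small projected problem exactly, then map back to the
    # original row space.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]


if __name__ == "__main__":
    # Small synthetic low-rank matrix standing in for a data sample.
    rng = np.random.default_rng(42)
    A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 100))
    _, s_approx, _ = randomized_svd(A, k=10)
    s_exact = np.linalg.svd(A, compute_uv=False)[:10]
    rel_err = np.max(np.abs(s_approx - s_exact) / s_exact)
    print("max relative error in top-10 singular values:", rel_err)
```

This says nothing about cluster run time, only about how much accuracy the
projection step gives up on a sample where the exact answer is still cheap to
compute.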


Esh

On Jun 3, 2011, at 6:23 PM, Jake Mannix wrote:

> With 50k columns, you're well within the "sweet spot" for traditional SVD
> via Lanczos, so give it a try.
> 
> SSVD will probably run faster, but you lose some information on what the
> singular vectors "mean".  If you don't need this information, SSVD may be
> better for you.
> 
> What would be awesome for *us* is if you tried both and told us what you
> found, in terms of performance and relevance.  :)
> 
>  -jake
> 
> On Jun 3, 2011 4:49 PM, "Eshwaran Vijaya Kumar" <[email protected]>
> wrote:
> 
> Hello all,
> We are trying to build a clustering system which will have an SVD
> component. I believe Mahout has two SVD solvers: DistributedLanczosSolver
> and SSVD. Could someone give me some tips on which would be a better choice
> of a solver given that the size of the data will be roughly 100 million rows
> with each row having roughly 50K dimensions (100 million x 50,000). We will
> be working with text data so the resultant matrix should be relatively
> sparse to begin with.
> 
> Thanks
> Eshwaran
