Jake, since we are on the topic, roughly what running time should I expect from Lanczos on about 1 GB of sequence-file input?
On Wed, Apr 6, 2011 at 11:11 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
>
> On Thu, Mar 24, 2011 at 11:03 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>>
>> you can certainly try to write it out into a DRM (distributed row
>> matrix) and run stochastic SVD on hadoop (off the trunk now). see
>> MAHOUT-593. This is suitable if you have a good decay of singular
>> values (but if you don't it probably just means you have so much noise
>> that it masks the problem you are trying to solve in your data).
>
> You don't need to run it as stochastic, either. The regular LanczosSolver
> will work on this data, if it lives as a DRM.
>
> -jake
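For anyone following along, here is a toy sketch of the idea behind the regular Lanczos approach Jake mentions, in pure Python. It is not Mahout's LanczosSolver API and says nothing about running time on a real DRM; the helper names (`gram`, `lanczos`, `count_below`, `largest_eigenvalue`) are hypothetical. It assumes a small dense matrix M held in memory, runs Lanczos on A = M^T M, and estimates the top singular value of M as the square root of the largest eigenvalue of the resulting tridiagonal matrix T. There is no reorthogonalization, so it is only meant for small k.

```python
import math
import random


def matvec(a, x):
    """Dense matrix-vector product; `a` is a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gram(m):
    """Return A = M^T M; its eigenvalues are the squared singular values of M."""
    n = len(m[0])
    return [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]


def lanczos(a, k, seed=0):
    """Run up to k Lanczos steps on the symmetric matrix `a` (hypothetical helper).

    Returns (alphas, betas): the diagonal and off-diagonal of the tridiagonal T.
    No reorthogonalization is done, so this is only a toy for small k.
    """
    n = len(a)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]  # normalize to unit length
    v_prev = [0.0] * n
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = matvec(a, v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        # Three-term recurrence: remove components along v and the previous v.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if j == k - 1 or beta < 1e-12:  # done, or Krylov space exhausted
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T strictly below x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        b2 = betas[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d  # pivot of the LDL^T factorization of T - xI
        if d == 0.0:
            d = -1e-30  # nudge off zero to avoid dividing by zero
        if d < 0.0:
            count += 1
    return count


def largest_eigenvalue(alphas, betas, tol=1e-12):
    """Bisection on the Sturm count, starting from Gershgorin bounds."""
    m = len(alphas)
    radius = [0.0] * m
    for i, b in enumerate(betas):
        radius[i] += abs(b)
        radius[i + 1] += abs(b)
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) == m:
            hi = mid  # every eigenvalue lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For M = [[1, 2], [3, 4], [5, 6]], `math.sqrt(largest_eigenvalue(*lanczos(gram(M), 2)))` comes out near 9.5255, the top singular value of M. The real solver works on a distributed matrix and only needs repeated matrix-vector products, which is why it can run directly on a DRM.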