On Sat, Oct 8, 2011 at 12:43 PM, Dan Brickley <dan...@danbri.org> wrote:
> ...
> ...and I get as expected, a few less than 100 due to the cleaning (88). Each
> of these has 27683 values, which is the number of topic codes in my data.
>
> I'm reading this (correct me if I have this backwards) as if my topics are
> now data points positioned in a new compressed version of a 'book space'.
> What I was after was instead 100000 books in a new lower-dimensioned 'topic
> space' (can I say this as: I want left singular vectors but I'm getting
> right singular vectors?). Hence the attempt to transpose and rerun Lanczos;
> I thought this the conceptually simplest if not most efficient way to get
> there. I understand there are other routes but expected this one to work.
> I don't know the options to Lanczos as well as I should.

Your reading is correct: you are getting the right singular vectors only, and the idea of transposing the matrix in order to get the left vectors is sound. BUT I think there is an option to get the left vectors as well as the right ones without transposing.

Also, while you are at it, I think the code in MAHOUT-792 might be able to do these decompositions at your scale much, much faster, since it uses an in-memory algorithm on a single machine and avoids all the kerfuffle of running a map-reduce.
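For the left-vs-right vector question, the underlying linear algebra is easy to check on a toy matrix. This is just an illustrative NumPy sketch (not Mahout code): the left singular vectors of A^T are the right singular vectors of A, and the "books in topic space" you're after are the left vectors scaled by the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # toy stand-in for the books x topics matrix

# SVD of A and of its transpose
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U2, s2, V2t = np.linalg.svd(A.T, full_matrices=False)

# Singular values are identical either way
assert np.allclose(s, s2)

# Left singular vectors of A^T coincide (up to sign) with right
# singular vectors of A -- transposing swaps the two sets
assert np.allclose(np.abs(U2.T @ Vt.T), np.eye(4), atol=1e-10)

# Rows (books) projected into the reduced topic space: U * s,
# which is the same as A @ V
books_in_topic_space = U * s
assert np.allclose(books_in_topic_space, A @ Vt.T)
```

So whichever route produces the decomposition, the projection you want is just the original matrix times the right singular vectors (or, equivalently, the left vectors scaled by the singular values).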