On Sat, Oct 8, 2011 at 12:43 PM, Dan Brickley <dan...@danbri.org> wrote:

> ...
> ...and I get, as expected, a few fewer than 100 due to the cleaning (88).
> Each of these has 27683 values, which is the number of topic codes in my
> data.
>
> I'm reading this (correct me if I have it backwards) as meaning my topics
> are now data points positioned in a new, compressed version of a 'book
> space'. What I was after instead was 100000 books in a new,
> lower-dimensional 'topic space' (can I phrase this as: I want the left
> singular vectors but am getting the right singular vectors?). Hence the
> attempt to transpose the matrix and rerun Lanczos; I thought this was the
> conceptually simplest, if not the most efficient, way to get there. I
> understand there are other routes but expected this one to work.
>

I don't know the Lanczos options as well as I should. You are correct that you
are getting only the right singular vectors, and the idea of transposing the
matrix in order to get the left vectors is sound.
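
For concreteness, the relation the transpose trick relies on (plain SVD
notation, nothing Mahout-specific) is:

    If   A = U * S * V^T    (A is books x topics, here 100000 x 27683),
    then A^T = V * S * U^T.

So the right singular vectors of A^T are exactly the left singular vectors of
A, which is why running Lanczos on the transpose hands you the book-space
embedding you were after.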

BUT

I think that there is an option to get the left vectors as well as the right
ones.
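
If that option turns out not to exist, you can also avoid a second Lanczos run
entirely by reconstructing the left vectors from what you already have, since
U = A * V * S^{-1}. A minimal in-core sketch of that in mahout-math terms (the
class and method names below are my reading of that API, so treat it as
illustrative rather than tested code):

    import org.apache.mahout.math.DenseMatrix;
    import org.apache.mahout.math.Matrix;

    // Given A (books x topics), the right singular vectors V (topics x k) and
    // the k singular values from Lanczos, recover the left singular vectors
    // as U = A * V * inv(S).  Each row of U is one book in k-dim topic space.
    public class LeftFromRight {
      public static Matrix leftSingularVectors(Matrix a, Matrix v, double[] sigma) {
        int k = sigma.length;
        Matrix sInverse = new DenseMatrix(k, k);   // diagonal inverse of S
        for (int i = 0; i < k; i++) {
          sInverse.set(i, i, 1.0 / sigma[i]);      // assumes sigma[i] > 0
        }
        return a.times(v).times(sInverse);         // result is books x k
      }
    }

For 100000 x 27683 that is just two matrix multiplies against a (presumably
sparse) A, which is a lot less machinery than another full Lanczos pass.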

Also, while you are at it, I think that the code in MAHOUT-792 might be able
to do these decompositions at your scale much, much faster, since it uses an
in-memory algorithm on a single machine and avoids all the kerfuffle of
running a map-reduce.
