I will finish adding an option for the Cholesky decomposition route to SSVD some time early in Q1 2012.
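For concreteness, here is a minimal sequential sketch of the kind of Cholesky shortcut Ted describes in the quoted message below: instead of a QR of the tall n x k matrix, factor its small k x k Gram matrix. This is plain numpy and purely illustrative, not the Mahout/SSVD code; the names Y, n and k are placeholders.

    import numpy as np

    def cholesky_qr(Y):
        # Y is tall and skinny (n x k, n >> k). Only the small k x k Gram
        # matrix is ever factored, which is the point of the trick.
        G = Y.T @ Y                      # k x k Gram matrix
        R = np.linalg.cholesky(G).T      # G = R'R, with R upper triangular
        Q = np.linalg.solve(R.T, Y.T).T  # Q = Y inv(R), so that Y = Q R
        return Q, R

    Y = np.random.rand(100000, 20)       # stand-in for the tall matrix SSVD factors
    Q, R = cholesky_qr(Y)
    print(np.allclose(Q @ R, Y), np.allclose(Q.T @ Q, np.eye(20)))

Only the Gram-matrix accumulation touches all of the rows, which is what makes the trick interesting in a distributed setting, although, as Ted notes below, it is not obviously a net win there.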
RE: distributed QR: right now this option is architecturally "melted" into
SSVD, because the MR steps that do the distributed QR also do extra work
(I would have to use more MR jobs to do the same thing if they did not).
Although the QR-related math is modelled as a nice, standalone set of
"pipeline preprocessors" (in other words, MR collector decorators) that
could be wired up into a standalone MR QR solver, that wiring does not
exist at the moment (although it would not be difficult to wire one up).
However, it seems to me that QR as a completely isolated job would have
little value in machine learning applications.

Thanks.
-d

On Sun, Nov 6, 2011 at 3:26 PM, Ted Dunning <[email protected]> wrote:
> The tridiagonal is much smaller than you would need if you wanted all the
> eigenvalues. Since you only want a small number, you only have a
> tri-diagonal matrix that is some multiple of that size. In-memory
> decomposition makes total sense.
>
> Regarding QR decomposition, Dmitriy has already built a parallel version.
> A large QR is required at one point in the SSVD algorithm. I have shown a
> way to avoid this large QR using a much smaller Cholesky decomposition,
> but it isn't entirely clear that this is a net win. It is a huge win with
> the sequential version.
>
> On Sun, Nov 6, 2011 at 3:17 PM, Sean Owen <[email protected]> wrote:
>
>> Yeah OK, that makes sense. That's a pretty helpful paper.
>>
>> I can get through the Lanczos algorithm without much trouble, I think.
>> The piece I'm looking for pointers on at the moment is the eigen
>> decomposition of the tridiagonal matrix.
>>
>> Jake, am I right that this is done in memory right now? I am sure your
>> previous comment actually answers this next question, but I haven't
>> connected the dots yet. If we're getting eigenvectors of AT * A, and
>> that matrix is huge, then isn't the tridiagonal matrix still huge,
>> albeit with only 3m entries rather than m^2? It seems like the code
>> just calls EigenDecomposer, which treats it as a dense m x m matrix.
>>
>> (Is it worth me figuring out the QR decomposition enough to build a
>> distributed version of it?)
>>
>> On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[email protected]> wrote:
>>> On 06.11.2011 18:52, Sean Owen wrote:
>>>> Following up on this very old thread.
>>>>
>>>> I understood all this bit, thanks, that greatly clarified things.
>>>>
>>>> You multiply a new user vector by V to project it into the new
>>>> "pseudo-item", reduced-dimension space. And to get back to real items,
>>>> you multiply by V's inverse, which is its transpose. And so you are
>>>> really multiplying the user vector by V VT, which is not a no-op,
>>>> since those are truncated matrices and aren't actually exact
>>>> inverses (?)
>>>>
>>>> The original paper talks about cosine similarities between users or
>>>> items in the reduced-dimension space, but can anyone shine light on
>>>> the point of that?
>>>
>>> I think this interpretation comes from LSI, where you project a query
>>> onto the document feature space and use the cosine to find the most
>>> similar documents (which can be done by simple matrix-vector
>>> multiplication: the singular vectors are orthonormal, so computing dot
>>> products with the projected query amounts to computing cosines).
>>>
>>> Something similar could be done to find the most similar items or
>>> users; the bad thing is that, AFAIK, you have to look at every other
>>> user or item, as U and V are dense.
>>>
>>> Maybe this paper can help to shine light on that:
>>>
>>> "Using Linear Algebra for Intelligent Information Retrieval"
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf
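To make the two points in the quoted exchange above concrete, here is a toy numpy sketch: with a truncated SVD, V V' is a rank-k projection rather than the identity (so Sean's "not a no-op" reading is right), and once vectors are projected into the reduced space and normalized, dot products are cosines. The matrix A, the choice k = 2, and all sizes are made up for illustration.

    import numpy as np

    A = np.random.rand(50, 30)   # made-up users x items rating matrix
    k = 2                        # made-up number of retained dimensions

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                # items x k

    a_u = A[0]                   # one user's ratings, as an item-space vector
    p_u = a_u @ Vk               # project into the k-dim "pseudo-item" space
    back = p_u @ Vk.T            # map back to item space, i.e. a_u (Vk Vk')

    # Vk Vk' is a rank-k projection, not the identity, so this is not a no-op:
    print(np.allclose(back, a_u))    # False for k < rank(A)

    # In the reduced space, cosine similarity between users is just a dot
    # product of the normalized projected vectors.
    P = A @ Vk                                        # all users, projected
    P = P / np.linalg.norm(P, axis=1, keepdims=True)  # unit-length rows
    print(P @ P[0])                                   # cosine of every user to user 0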
>>>> From the paper also, it seems like they say the predictions are just
>>>> computed as vector products, as above.
>>>>
>>>> Finally, and separately, I'm trying to understand the Lanczos method
>>>> as part of computing an SVD. Lanczos operates on a real symmetric
>>>> matrix, right? And am I right that it comes into play when you are
>>>> computing an SVD...
>>>>
>>>> A = U * S * VT
>>>>
>>>> ... because U is actually the eigenvectors of (symmetric) A * AT and
>>>> V is the eigenvectors of AT * A? And so Lanczos is used to answer
>>>> those questions to complete the SVD.
>>>>
>>>> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]> wrote:
>>>>> You are correct. The paper has an appalling treatment of the
>>>>> folding-in approach.
>>>>>
>>>>> In fact, the procedure is dead simple.
>>>>>
>>>>> The basic idea is to leave the coordinate system derived in the
>>>>> original SVD intact and simply project the new users into that space.
>>>>>
>>>>> The easiest way to see what is happening is to start again with the
>>>>> original rating matrix A as decomposed:
>>>>>
>>>>> A = U S V'
>>>>>
>>>>> where A is users x items. If we multiply on the right by V, we get
>>>>>
>>>>> A V = U S V' V = U S
>>>>>
>>>>> (because V' V = I, by definition). This result is (users x items) x
>>>>> (items x k) = users x k, that is, it gives a k-dimensional vector for
>>>>> each user. Similarly, multiplication on the left by U' gives a
>>>>> k x items matrix which, when transposed, gives a k-dimensional vector
>>>>> for each item.
>>>>>
>>>>> This implies that if we augment U with new user row vectors U_new, we
>>>>> should be able to simply compute new k-dimensional vectors for the
>>>>> new users and adjoin these new vectors to the previous vectors.
>>>>> Concisely put,
>>>>>
>>>>>   (   A   )        (   A V   )
>>>>>   (       )  V  =  (         )
>>>>>   ( A_new )        ( A_new V )
>>>>>
>>>>> This isn't really magical. It just says that we can compute new user
>>>>> vectors at any time by multiplying the new users' ratings by V.
>>>>>
>>>>> The diagram in figure one is hideously confusing because it looks
>>>>> like a picture of some kind of multiplication, whereas it is really
>>>>> depicting some odd kind of flow diagram.
>>>>>
>>>>> Does this solve the problem?
>>>>>
>>>>> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[email protected]> wrote:
>>>>>> Section 3 is hard to understand.
>>>>>>
>>>>>> - Ak and P are defined, but not used later
>>>>>> - The definition of P has UTk x Nu as a computation. UTk is a k x m
>>>>>>   matrix, and Nu is "t" x 1. t is not defined.
>>>>>> - This only makes sense if t = m. But m is the number of users, and
>>>>>>   Nu is a user vector, so it should have a number of elements equal
>>>>>>   to n, the number of items.
>>>>>> - Sk * VTk is described as a k x "d" matrix, but d is undefined.
>>>>>> - The diagram suggests that VTk is multiplied by all the Nu, which
>>>>>>   makes more sense, but only if Nu is multiplied by VTk, not the
>>>>>>   other way. And the diagram depicts neither of those.
>>>>>> - Conceptually I would understand Nu x VTk, but then P is defined by
>>>>>>   an additional product with Uk.
>>>>>>
>>>>>> In short... what?
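A toy numpy sketch tying the two quoted points together: V and the squared singular values come from the eigendecomposition of the symmetric matrix A'A (the matrix Lanczos is applied to; a dense eigensolver stands in for Lanczos here), and folding in new users is just A_new V, as in Ted's derivation above. The matrices and sizes are made up; this is not the Mahout code.

    import numpy as np

    A = np.random.rand(40, 25)       # existing users x items ratings (made-up sizes)
    A_new = np.random.rand(3, 25)    # ratings for 3 new users

    # V and the squared singular values are the eigenvectors / eigenvalues
    # of the symmetric matrix A'A.
    evals, evecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(evals)[::-1]  # eigh returns ascending eigenvalues
    k = 5
    Vk = evecs[:, order[:k]]         # items x k
    Sk = np.sqrt(evals[order[:k]])   # singular values are the square roots
    Uk = (A @ Vk) / Sk               # completes A ~ Uk diag(Sk) Vk'

    # Sanity check: the singular values agree with numpy's direct SVD of A.
    print(np.allclose(Sk, np.linalg.svd(A, compute_uv=False)[:k]))

    # Folding in new users (Ted's derivation above): their k-dimensional
    # vectors are simply A_new Vk, i.e. new rows of U S.
    existing_user_vectors = A @ Vk    # same as Uk with columns scaled by Sk
    new_user_vectors = A_new @ Vk
    print(new_user_vectors.shape)     # (3, 5)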
>>>>>>
>>>>>> On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[email protected]> wrote:
>>>>>>> Fire away.
>>>>>>>
>>>>>>> On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[email protected]> wrote:
>>>>>>>> Is anyone out there familiar enough with this to a) discuss this
>>>>>>>> paper with me or b) point me to another writeup on the approach?
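Coming back to the tridiagonal question near the top of the thread: the matrix that gets eigendecomposed in memory is only m x m, where m is the number of Lanczos steps (some small multiple of the desired rank), regardless of how large A'A is. A rough numpy sketch, with made-up sizes and without the reorthogonalization a real Lanczos implementation needs:

    import numpy as np

    def lanczos_tridiag(C, m, seed=0):
        # Plain Lanczos recurrence on a symmetric matrix C, without
        # reorthogonalization or breakdown handling. It returns only the
        # m x m tridiagonal matrix T, whose size depends on the number of
        # steps m, not on the size of C.
        rng = np.random.default_rng(seed)
        q = rng.standard_normal(C.shape[0])
        q /= np.linalg.norm(q)
        q_prev = np.zeros_like(q)
        alphas, betas, beta = [], [], 0.0
        for _ in range(m):
            w = C @ q - beta * q_prev
            alpha = q @ w
            w -= alpha * q
            beta = np.linalg.norm(w)
            alphas.append(alpha)
            betas.append(beta)
            q_prev, q = q, w / beta
        return np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)

    A = np.random.rand(2000, 500)
    C = A.T @ A                       # 500 x 500 symmetric matrix, stand-in for A'A
    T = lanczos_tridiag(C, m=20)      # 20 x 20, trivially decomposed in memory
    print(np.linalg.eigvalsh(T)[-1])  # largest Ritz value ...
    print(np.linalg.eigvalsh(C)[-1])  # ... matches the largest eigenvalue of C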
