I think one crucial point is missing from this discussion: in Collaborative Filtering, the matrix to decompose is only partially defined. A missing entry cannot simply be substituted with 0; it is unknown. Therefore, you cannot use standard SVD techniques the way you would for LSI, for example.
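To make this concrete: in ALS, the per-user solve only ever touches the
observed entries. Below is a minimal NumPy sketch (illustration only, not
Mahout's code; the toy data and the regularization weight lam are made up)
of the regularized least-squares update Sean describes below, restricted
to the items a user has actually rated:

import numpy as np

def update_user(ratings, rated_items, M, lam):
    # Solve the regularized least-squares problem for one user's
    # feature vector, using only that user's observed ratings.
    # ratings:     observed rating values, shape (n_obs,)
    # rated_items: indices of the rated items, shape (n_obs,)
    # M:           item-feature matrix, shape (n_items, k)
    M_obs = M[rated_items]                 # rows for rated items only
    k = M.shape[1]
    # Normal equations: (M_obs' M_obs + lam I) u = M_obs' r
    A = M_obs.T @ M_obs + lam * np.eye(k)
    b = M_obs.T @ ratings
    return np.linalg.solve(A, b)

# Toy example: 4 items, 2 features; the user rated items 0 and 2.
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 2))
u = update_user(np.array([5.0, 3.0]), np.array([0, 2]), M, lam=0.1)

Note how the unrated items never enter the normal equations. Zero-filling
them, as a plain SVD would require, would instead pull the reconstruction
towards 0 for every unknown entry.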
On 25.03.2013 11:46, Sean Owen wrote:
> Points from across several e-mails --
>
> The initial item-feature matrix can be just random unit vectors too. I
> have slightly better results with that.
>
> You are finding the least-squares solution of A = U M' for U given A
> and M. Yes, you can derive that analytically as the zero of the
> derivative of the error function.
>
> With m users and n items, and k features, where k = n, I suppose you
> don't need any iterations at all, since there is a trivial solution:
> U = A, M = I(n) (the n x n identity matrix). You would not find this
> on the first iteration, however, if you followed the algorithm,
> because you would be starting from some random starting point. But if
> you initialized M to the identity matrix, yes, you'd find the exact
> solution immediately.
>
> Yes, it is an iterative algorithm, and you have to define a
> convergence criterion. I use the average absolute difference in
> (U M') entries from one iteration to the next. (Well, a sample.) It's
> possible that you reach your criterion in 1 iteration, or not. It
> depends on the criterion. Usually, when you restart ALS on updated
> data, you use the previous M matrix as a starting point. If the change
> in the data is small, one iteration should usually leave you still
> "converged", actually. But from a random starting point -- not
> typical.
>
> ALS is similar to the SVD only in broad terms. The SVD is not always
> used to make a low-rank factorization, and its outputs carry more
> guarantees -- they are orthonormal bases, because it has factored the
> scaling factors out into the third matrix, Sigma. I think of ALS as
> more simplistic and therefore possibly faster. With k features, I
> believe (?) the SVD is necessarily at least a k-iteration process,
> whereas ALS might be of use after 1 iteration. The SVD is not a
> "shortcut" for ALS. If you go to the trouble of a full SVD, you can
> certainly use that factorization as-is, though.
>
> You need regularization.
>
> It should be pointed out that the "ALS" often spoken of here is not
> one that factors the input matrix A. There is a variant, which I have
> had good results with, for 'implicit' feedback. There, you are
> actually factoring the matrix P = (1 where A != 0, 0 where A == 0),
> and using the values in A as weights in the loss function. You are
> reconstructing "interacted or not" and using the input value as a
> confidence measure. This "works" for ratings, although the
> interpretation in this variant doesn't line up with the nature of
> ratings. It works quite nicely for things like clicks, etc.
>
> (Much more can be said on this point.)
>
> On Mon, Mar 25, 2013 at 2:19 AM, Dominik Huebner <cont...@dhuebner.com> wrote:
>> It's quite hard for me to grasp the mathematical concepts behind the
>> ALS recommenders. It would be great if someone could help me figure
>> out the details. This is my current status:
>>
>> 1. The item-feature matrix (M) is initialized using the average
>> ratings and random values (explicit case).
>>
>> 2. The user-feature matrix (U) is solved using the partial derivative
>> of the error function with respect to u_i (the columns or row vectors
>> of U).
>>
>> Suppose we use as many features as there are known items, and the
>> error function does not use any regularization. Would U be solved
>> within the first iteration? If not, I do not understand why more than
>> one iteration is needed.
>> Furthermore, I believe I have understood that using fewer features
>> than items, and also applying regularization, does not allow U to be
>> solved in a way that meets the stopping criterion after only one
>> iteration. Thus, iteration is required to converge gradually towards
>> the stopping criterion.
>>
>> I hope I have pointed out my problems clearly enough.
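P.S. Since the implicit-feedback variant comes up above, here is a rough
NumPy sketch of one user update in that setting, again illustration only.
I am assuming the common c = 1 + alpha * a confidence weighting; alpha,
lam and the toy data are placeholders, not anything prescribed by Mahout.

import numpy as np

def update_user_implicit(a_row, M, lam, alpha=40.0):
    # Implicit-feedback variant: reconstruct P = (a != 0), using the
    # raw values in a as confidence weights in the loss.
    # a_row: this user's interaction counts, shape (n_items,)
    # M:     item-feature matrix, shape (n_items, k)
    k = M.shape[1]
    p = (a_row != 0).astype(float)         # binary "interacted or not"
    c = 1.0 + alpha * a_row                # per-item confidence
    # Weighted normal equations: (M' C M + lam I) u = M' C p
    A = M.T @ (c[:, None] * M) + lam * np.eye(k)
    b = M.T @ (c * p)
    return np.linalg.solve(A, b)

# Toy example: a user who clicked item 1 twice and item 3 once.
rng = np.random.default_rng(1)
M = rng.normal(size=(5, 3))
u = update_user_implicit(np.array([0.0, 2.0, 0.0, 1.0, 0.0]), M, lam=0.1)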