Even more in-line. On Mon, Mar 25, 2013 at 11:46 AM, Sean Owen <sro...@gmail.com> wrote:
> Points from across several e-mails --
>
> The initial item-feature matrix can be just random unit vectors too. I
> have slightly better results with that.
>
> You are finding the least-squares solution of A = U M' for U given A
> and M. Yes you can derive that analytically as the zero of the
> derivative of the error function.
>
> With m users and n items, and k features, where k=n, then I suppose
> you don't need any iterations at all since there is a trivial
> solution: U = A, M = I(n) (the nxn identity matrix). You would not
> find this on the first iteration, however, if you followed the
> algorithm, because you would be starting from some random starting
> point. But if you initialized M to the identity matrix, yes you'd find
> the exact solution immediately.
>
> Yes it is an iterative algorithm and you have to define a convergence
> criterion. I use the average absolute difference in (U M') entries from
> one iteration to the next. (Well, a sample.) It's possible that you
> reach your criterion in 1 iteration, or not. It depends on the
> criterion. Usually when you restart ALS on updated data, you use the
> previous M matrix as a starting point. If the change in data is small,
> one iteration should usually leave you still "converged" actually.
> But, from a random starting point -- not typical.
>
> ALS is similar to SVD only in broad terms. The SVD is not always used
> to make a low-rank factorization, and its outputs carry more
> guarantees -- they are orthonormal bases because it has factored out
> the scaling factors into the third matrix Sigma. I think of ALS as more
> simplistic and therefore possibly faster.

Possibly. Or not. The stochastic projection method gives you an SVD pretty quickly.

> With k features I believe
> (?) the SVD is necessarily a k-iteration process at least, whereas ALS
> might be of use after 1 iteration.

SVD need not be iterative at all. The SSVD code uses roughly 5 map-reduces to give you a high-quality SVD approximation.
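To make the stochastic-projection point concrete, here is a minimal dense NumPy sketch of a randomized SVD. This is illustrative only -- the function name and parameters are mine, not SSVD's; the real SSVD code does the equivalent steps as map-reduce passes over a distributed matrix.

```python
import numpy as np

def randomized_svd(a, k, oversample=10, n_power_iters=1, seed=0):
    """Sketch of a stochastic-projection (randomized) SVD.

    `a` is m x n; returns rank-k factors u, s, vt with a ~= u @ diag(s) @ vt.
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Multiply by a random test matrix; the range of y captures the
    # dominant subspace of a with high probability.
    omega = rng.standard_normal((n, k + oversample))
    y = a @ omega
    # Optional extra (power) iterations sharpen the subspace estimate;
    # it is rare to need more than one.
    for _ in range(n_power_iters):
        y = a @ (a.T @ y)
    q, _ = np.linalg.qr(y)
    # Project a into the small subspace and take an exact SVD there.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small
    return u[:, :k], s[:k], vt[:k, :]
```

The "extra iterations" mentioned below are the power iterations in this sketch; everything else is a fixed, non-iterative pipeline.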
There is the option to add 0, 1 or more extra iterations, but it is rare to need more than 1. ALS could well be of use after less work. This is especially true for incremental solutions.

> The SVD is not a "shortcut" for
> ALS. If you go to the trouble of a full SVD, you can certainly use
> that factorization as-is though.
>
> You need regularization.
>
> It should be pointed out that the "ALS" often spoken of here is not
> one that factors the input matrix A. There's a variant, that I have
> had good results with, for 'implicit' feedback. There, you are
> actually factoring the matrix P = (1 : A != 0, 0 : A == 0), and using
> the values in A as weights in the loss function. You're reconstructing
> "interacted or not" and using the input value as a confidence measure.
> This "works" for ratings although the interpretation in this variant
> doesn't line up with the nature of ratings. It works quite nicely for
> things like clicks, etc.
>
> (Much more can be said on this point.)
>
> On Mon, Mar 25, 2013 at 2:19 AM, Dominik Huebner <cont...@dhuebner.com>
> wrote:
> > It's quite hard for me to get the mathematical concepts of the ALS
> > recommenders. It would be great if someone could help me to figure out
> > the details. This is my current status:
> >
> > 1. The item-feature (M) matrix is initialized using the average ratings
> > and random values (explicit case)
> >
> > 2. The user-feature (U) matrix is solved using the partial derivative of
> > the error function with respect to u_i (the columns or row-vectors of U)
> >
> > Suppose we use as many features as there are items and the error
> > function does not use any regularization. Would U be solved within the
> > first iteration? If not, I do not understand why more than one iteration
> > is needed.
> > Furthermore, I believe I have understood that using fewer features than
> > items, and also applying regularization, does not allow U to be solved in
> > a way that meets the stopping criterion after only one iteration.
> > Thus, iteration is required to gradually converge to the stopping
> > criterion.
> >
> > I hope I have pointed out my problems clearly enough.
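For what it's worth, the alternating process being asked about fits in a few lines of dense NumPy. This is a sketch under my own naming and defaults, not Mahout's implementation -- real code solves row-by-row over sparse data -- but it shows the closed-form half-steps and the convergence criterion (average absolute change in the U M' entries) described above.

```python
import numpy as np

def als(a, k, lam=0.1, n_iters=20, tol=1e-4, seed=0):
    """Minimal dense ALS sketch: factor A ~= U @ M.T with L2 regularization.

    Each half-step is the closed-form least-squares solution, i.e. the
    zero of the derivative of the regularized squared error.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = a.shape
    m = rng.standard_normal((n_items, k))  # random starting point
    eye = np.eye(k)
    prev = None
    for _ in range(n_iters):
        # Solve U given M (fixed), then M given U (fixed).
        u = a @ m @ np.linalg.inv(m.T @ m + lam * eye)
        m = a.T @ u @ np.linalg.inv(u.T @ u + lam * eye)
        recon = u @ m.T
        # Stop when the reconstruction barely changes between iterations.
        if prev is not None and np.abs(recon - prev).mean() < tol:
            break
        prev = recon
    return u, m
```

From a random starting point the first U solve is only optimal for the *random* M, which is why more than one iteration is needed even when k equals the number of items.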