I think one crucial point is missing from this discussion. In
Collaborative Filtering, the matrix to decompose is only partially
defined. A missing entry cannot simply be replaced by 0; it is
unknown. Therefore, you cannot use standard SVD techniques as you
would for LSI, for example.
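
To make this concrete, here is a minimal numpy sketch (the toy matrix
and lambda are mine) of the loss that ALS actually minimizes: the
squared error is summed only over the observed entries, unlike an SVD
of a zero-filled matrix.

    import numpy as np

    # Toy ratings matrix; 0 marks "unknown", not an observed rating of 0.
    A = np.array([[5.0, 3.0, 0.0],
                  [4.0, 0.0, 1.0]])
    observed = A != 0  # mask of the known entries

    def masked_loss(U, M, lam=0.02):
        # Squared error over observed entries only, plus L2 regularization.
        err = (A - U @ M.T)[observed]
        return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(M ** 2))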

On 25.03.2013 11:46, Sean Owen wrote:
> Points from across several e-mails --
> 
> The initial item-feature matrix can be just random unit vectors too. I
> have slightly better results with that.
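> 
> For instance, a minimal numpy sketch of that initialization (the
> function name and seed are mine):
> 
>     import numpy as np
> 
>     def random_unit_rows(n, k, seed=0):
>         # One random unit-length feature vector per item.
>         M = np.random.default_rng(seed).standard_normal((n, k))
>         return M / np.linalg.norm(M, axis=1, keepdims=True)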
> 
> You are finding the least-squares solution of A = U M' for U, given A
> and M. Yes, you can derive that analytically as the zero of the
> derivative of the error function.
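> 
> As a sketch of that solve (dense numpy; lambda and the names are
> mine), each user's row uses only the rows of M for the items that
> user has rated:
> 
>     import numpy as np
> 
>     def solve_U(A, M, lam=0.02):
>         # Least-squares solution of A = U M' for U, one user row at a time.
>         m_users, k = A.shape[0], M.shape[1]
>         U = np.zeros((m_users, k))
>         for i in range(m_users):
>             rated = A[i] != 0              # items user i has rated
>             M_i = M[rated]                 # their item-feature rows
>             # Zero of the derivative: (M_i' M_i + lam I) u_i = M_i' a_i
>             lhs = M_i.T @ M_i + lam * np.eye(k)
>             rhs = M_i.T @ A[i, rated]
>             U[i] = np.linalg.solve(lhs, rhs)
>         return U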
> 
> With m users and n items, and k features where k = n, I suppose you
> don't need any iterations at all, since there is a trivial solution:
> U = A, M = I(n) (the n x n identity matrix). You would not find this
> on the first iteration, however, if you followed the algorithm,
> because you would be starting from some random starting point. But if
> you initialized M to the identity matrix, yes, you'd find the exact
> solution immediately.
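> 
> A two-line check of that trivial factorization (the toy sizes are
> mine):
> 
>     import numpy as np
> 
>     A = np.random.rand(4, 3)        # m = 4 users, n = 3 items
>     M = np.eye(3)                   # k = n: the identity item-feature matrix
>     assert np.allclose(A @ M.T, A)  # U = A reconstructs A exactly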
> 
> Yes, it is an iterative algorithm, and you have to define a
> convergence criterion. I use the average absolute difference in
> (U M') entries from one iteration to the next (well, over a sample).
> It's possible that you reach your criterion in one iteration, or not;
> it depends on the criterion. Usually, when you restart ALS on updated
> data, you use the previous M matrix as a starting point. If the
> change in the data is small, one iteration should usually leave you
> still "converged". But from a random starting point, that's not
> typical.
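> 
> A sketch of that criterion (the sample size and tolerance are mine):
> 
>     import numpy as np
> 
>     def converged(U_prev, M_prev, U, M, tol=0.001, sample=1000, seed=0):
>         # Average absolute difference in sampled (U M') entries
>         # between the previous and current iteration.
>         rng = np.random.default_rng(seed)
>         rows = rng.integers(0, U.shape[0], sample)
>         cols = rng.integers(0, M.shape[0], sample)
>         prev = np.einsum('ij,ij->i', U_prev[rows], M_prev[cols])
>         curr = np.einsum('ij,ij->i', U[rows], M[cols])
>         return np.mean(np.abs(curr - prev)) < tol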
> 
> ALS is similar to the SVD only in broad terms. The SVD is not always
> used to make a low-rank factorization, and its outputs carry more
> guarantees: they are orthonormal bases, because the scaling factors
> have been factored out into the third matrix, Sigma. I think of ALS
> as more simplistic and therefore possibly faster. With k features, I
> believe (?) the SVD is necessarily at least a k-iteration process,
> whereas ALS might be of use after one iteration. The SVD is not a
> "shortcut" for ALS. If you go to the trouble of a full SVD, though,
> you can certainly use that factorization as-is.
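> 
> Concretely, if you already have a rank-k SVD, one common way to use
> it as-is is to split Sigma between the two factors (a sketch; the
> even square-root split is one choice of mine):
> 
>     import numpy as np
> 
>     A = np.random.rand(5, 4)
>     k = 2
>     u, s, vt = np.linalg.svd(A, full_matrices=False)
>     # Keep the top k singular values and fold sqrt(Sigma) into each side.
>     U = u[:, :k] * np.sqrt(s[:k])
>     M = vt[:k].T * np.sqrt(s[:k])
>     approx = U @ M.T   # the same rank-k approximation as the truncated SVD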
> 
> You need regularization.
> 
> 
> It should be pointed out that the "ALS" often spoken of here is not
> one that factors the input matrix A. There's a variant, which I have
> had good results with, for 'implicit' feedback. There, you are
> actually factoring the matrix P, where P_ij = 1 if A_ij != 0 and
> P_ij = 0 otherwise, and using the values in A as weights in the loss
> function. You're reconstructing "interacted or not" and using the
> input value as a confidence measure. This "works" for ratings,
> although the interpretation in this variant doesn't line up with the
> nature of ratings. It works quite nicely for things like clicks, etc.
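> 
> For reference, the per-user solve in that variant looks like this
> sketch (following the usual confidence weighting c = 1 + alpha * a;
> alpha, lambda, and the names are mine):
> 
>     import numpy as np
> 
>     def solve_user_implicit(a, M, alpha=40.0, lam=0.02):
>         # a: one user's row of A (clicks, counts, ...). Reconstruct
>         # p = (a != 0) with confidence weights c = 1 + alpha * a.
>         n, k = M.shape
>         p = (a != 0).astype(float)
>         c = 1.0 + alpha * a
>         # Weighted normal equations: (M' C M + lam I) u = M' C p
>         lhs = M.T @ (c[:, None] * M) + lam * np.eye(k)
>         rhs = M.T @ (c * p)
>         return np.linalg.solve(lhs, rhs)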
> 
> (Much more can be said on this point.)
> 
> 
> 
> On Mon, Mar 25, 2013 at 2:19 AM, Dominik Huebner <cont...@dhuebner.com> wrote:
>> It's quite hard for me to grasp the mathematical concepts behind
>> the ALS recommenders. It would be great if someone could help me
>> figure out the details. This is my current status:
>>
>> 1. The item-feature (M) matrix is initialized using the average
>> ratings and random values (explicit case; see the sketch after this
>> list)
>>
>> 2. The user-feature (U) matrix is solved using the partial
>> derivative of the error function with respect to u_i (the row
>> vectors of U)
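>>
>> A sketch of step 1 as I understand it (putting each item's average
>> rating in the first feature and small random values elsewhere; the
>> names and ranges are my guesses):
>>
>>     import numpy as np
>>
>>     def init_M(A, k, seed=0):
>>         n = A.shape[1]
>>         M = np.random.default_rng(seed).uniform(0.0, 0.1, (n, k))
>>         # First feature: each item's average observed rating.
>>         counts = np.maximum((A != 0).sum(axis=0), 1)
>>         M[:, 0] = A.sum(axis=0) / counts
>>         return M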
>>
>> Suppose we use as many features as there are items, and the error
>> function does not use any regularization. Would U be solved within
>> the first iteration? If not, I do not understand why more than one
>> iteration is needed.
>> Furthermore, I believe I have understood that using fewer features
>> than items, and also applying regularization, does not allow solving
>> U in a way that the stopping criterion can be met after only one
>> iteration. Thus, iteration is required to converge gradually toward
>> the stopping criterion.
>>
>> I hope I have pointed out my problems clearly enough.
>>
