Also: my personal understanding is that using straightforward SVD in recommenders produces oversmoothed (or over-regularized) results.
The intuition behind this is as follows. Consider the Netflix example (users x movies). As a user, I have probably rated 40 or so movies out of the, say, 100,000 that Netflix has. The typical strategy for populating the input rating matrix (in order to keep it sparse, among other things) is to compute my average rating and subtract it from all rated positions, leaving unrated positions at 0.

Now, say for the sake of this discussion that my average rating is 3, but I lean heavily toward chick flicks, so out of my 40 rated movies, about 10 are chick flicks that I rated with the highest score. Any reasonable inference should conclude that if I rated 10 out of 10 chick flicks at 5, I am probably into that stuff. But the computation doesn't see it that way. Netflix may have 10,000 chick flicks, and to the computation it basically looks as if I rated 10 of them at 5 and the remaining ~10,000 at 3 (my average, which is what the zero fill stands for after centering). Ten 5s against roughly 10,000 implied 3s still rate my interest in chick flicks pretty low, nearly back at my average. In other words, the effective regularization of the training is way too strong, and oversmoothing over sparse data becomes a problem.

Other algorithms deal with this either by not taking unrated entries into account at all (SGD-based factorization methods) or by scaling the regularization by the number of rated samples (I think that's how ALS-WR copes with it); a toy sketch contrasting the two approaches is below.

Am I making any sense, or am I fundamentally wrong in my understanding of how basic SVD recommendation works?
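To make the contrast concrete, here is a minimal sketch (not Netflix's or any library's actual code; all data, dimensions, and hyperparameters below are made up) comparing a zero-filled, mean-centered truncated SVD with a plain SGD factorization trained only on observed ratings. It is meant to illustrate the setup, not to benchmark anything: the zero fill in the SVD case effectively asserts thousands of "average" ratings for the user, which tends to pull the reconstruction of the 5-star items back toward the user's mean, while the SGD model never sees the unrated cells at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 1000, 10

# Toy rating matrix: 0 means "unrated", ratings run 1..5, ~2% density.
R = np.zeros((n_users, n_items))
mask = rng.random((n_users, n_items)) < 0.02
R[mask] = rng.integers(1, 6, size=mask.sum())

# User 0: ten "chick flicks" (items 0..9) rated 5, thirty other items rated 3.
R[0, :] = 0
R[0, :10] = 5
R[0, rng.choice(np.arange(10, n_items), size=30, replace=False)] = 3

rated = R > 0
user_mean = R.sum(1) / np.maximum(rated.sum(1), 1)

# (a) Straightforward SVD on a mean-centered matrix with unrated cells left at 0.
C = np.where(rated, R - user_mean[:, None], 0.0)
U, s, Vt = np.linalg.svd(C, full_matrices=False)
svd_pred = (U[:, :k] * s[:k]) @ Vt[:k] + user_mean[:, None]

# (b) SGD factorization that only ever touches the observed entries.
P = 0.1 * rng.standard_normal((n_users, k))
Q = 0.1 * rng.standard_normal((n_items, k))
lr, reg = 0.01, 0.05
rows, cols = np.nonzero(rated)
for _ in range(50):                      # a few sweeps are plenty for a toy set
    for i, j in zip(rows, cols):
        err = R[i, j] - P[i] @ Q[j]
        p_old = P[i].copy()
        P[i] += lr * (err * Q[j] - reg * P[i])
        Q[j] += lr * (err * p_old - reg * Q[j])
sgd_pred = P @ Q.T

print("user 0 mean rating                      :", round(user_mean[0], 2))
print("SVD reconstruction, items 0..9 (rated 5):", round(svd_pred[0, :10].mean(), 2))
print("SGD reconstruction, items 0..9 (rated 5):", round(sgd_pred[0, :10].mean(), 2))
```

The ALS-WR-style answer would be different again: keep the least-squares objective over observed entries but weight each user's/item's regularization term by how many ratings they have, so sparse rows aren't shrunk as aggressively.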
