Also:

My personal understanding is that using straightforward SVD in
recommenders produces oversmoothed (or over-regularized) results.

The intuition behind this is as follows: consider the Netflix example (users x movies).

If I am a user, I have probably rated 40 or so movies out of, say, the
100,000 that Netflix has.

The typical strategy for populating the input rating matrix (in order
to keep it sparse, among other things) is to compute the average of my
ratings and subtract it from all rated positions (leaving unrated
positions at 0).
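
A minimal sketch of that mean-centering step, just to make the setup concrete (the catalog size, item indices, and ratings are made up; this is not Netflix's actual pipeline):

    import numpy as np
    from scipy.sparse import csr_matrix

    n_items = 100_000                         # hypothetical catalog size
    rated_items = np.array([3, 17, 42, 99])   # indices of the items this user rated
    ratings = np.array([5.0, 3.0, 2.0, 5.0])

    user_mean = ratings.mean()                # this user's average rating
    centered = ratings - user_mean            # subtract the mean from rated positions only

    # Unrated positions stay implicitly 0 in the sparse row, which a plain SVD
    # then treats as "exactly average preference" rather than "unknown".
    row = csr_matrix((centered, (np.zeros_like(rated_items), rated_items)),
                     shape=(1, n_items))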

Now, say, for the sake of this discussion, my average rating is 3, but
somehow I lean heavily toward chick flicks. Out of all 40 movies, about
10 were chick flicks that I rated with the highest rating. Any
reasonable inference should conclude that if I rated 10 out of 10 chick
flicks at the highest rating, I am probably into that stuff.

But the computation wouldn't see it that way. Netflix may have 10,000
chick flick movies, and to the computation it would basically look like
I rated 10 of them at 5 and the remaining ~10,000 at 3 (my average). So
10 '5's among 10,000 '3's would still rate my interest in chick flicks
pretty low (nearly back to whatever happens to be my average). In other
words, the effective regularization of the training is way up.
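
To put a number on that dilution, here is the back-of-the-envelope arithmetic (using the hypothetical counts above):

    n_chick_flicks = 10_000        # hypothetical number of chick flicks in the catalog
    rated_high = 10                # chick flicks this user rated at 5
    user_mean = 3.0

    # After centering, each rated chick flick contributes (5 - 3) = 2, and each
    # unrated one contributes 0, so the average signal over the genre is tiny.
    avg_centered_score = rated_high * (5.0 - user_mean) / n_chick_flicks
    print(avg_centered_score)      # 0.002 -> predicted rating ~3.002, barely above the mean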

So oversmoothing over sparse data is a problem here. Other algorithms
deal with it either by not taking unrated examples into account at all
(SGD-based factorization methods) or by weighting the regularization
based on the number of rated samples (I think that's ALS-WR's way of
coping with this).
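
For illustration, here is a toy sketch of the first approach: an SGD factorization loop that updates only on observed ratings, so the zero-filled cells never pull the factors back toward the user mean. This is my own example code, not Mahout's or any particular library's implementation, and the function name and parameters are made up; ALS-WR instead keeps all terms but scales the regularization by each user's/item's rating count (lambda * n_u).

    import numpy as np

    def sgd_factorize(observed, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=20):
        """observed: list of (user, item, centered_rating) triples."""
        rng = np.random.default_rng(0)
        P = 0.1 * rng.standard_normal((n_users, k))   # user factors
        Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
        for _ in range(epochs):
            for u, i, r in observed:                  # only rated cells, never the zeros
                err = r - P[u] @ Q[i]
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
        return P, Q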

Am I making any sense, or am I fundamentally wrong in my understanding
of how basic SVD recommendation works?
