The issue isn't how to handle sparsity carefully when subtracting
means, although that's a valuable point in itself. The question is
what the mean is supposed to be.

You can't treat missing ratings as 0 in general, and the example
here shows why: you'd be acting as if most movies are hated. Instead,
missing ratings are excluded from the computation entirely.

m_x should be 4.5 in the example here. That's consistent with the
literature and with the other implementations earlier in this project.
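
To make the two readings of the mean concrete, here is a minimal sketch
in plain Python. The ratings are hypothetical, picked only so that the
two results come out as 4.5 and 3; the function names are mine and this
is not Mahout code.

```python
# Hypothetical ratings for one user; None marks a movie with no rating.
ratings_x = [4.0, 5.0, None]

def mean_over_rated(ratings):
    """Mean over the ratings that exist; missing entries are skipped."""
    rated = [r for r in ratings if r is not None]
    return sum(rated) / len(rated)

def mean_missing_as_zero(ratings):
    """Mean that counts every missing rating as a 0."""
    return sum(0.0 if r is None else r for r in ratings) / len(ratings)

print(mean_over_rated(ratings_x))       # 4.5
print(mean_missing_as_zero(ratings_x))  # 3.0
```

The second function is the behavior I'm arguing against: one unrated
movie is enough to pull the mean from 4.5 down to 3.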

I don't know the Hadoop implementation well enough, and wasn't sure
from the comments above, whether it does end up behaving as if it's
"4.5" or "3". If it's not 4.5 I would call that a bug. Items that
aren't co-rated can't meaningfully be included in this computation.
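
As a rough illustration of what restricting to co-rated items means,
here is a sketch in plain Python. It is not the Hadoop or Mahout code,
the data and names are made up, and taking the means over the co-rated
items only is an assumption of this sketch.

```python
from math import sqrt

def pearson_co_rated(x, y):
    """Pearson correlation over items rated by both users.

    x and y map item id -> rating. Returns None when fewer than two
    items are co-rated or when either user's co-rated ratings have
    zero variance, since the correlation is undefined there.
    """
    common = sorted(set(x) & set(y))
    n = len(common)
    if n < 2:
        return None
    mean_x = sum(x[i] for i in common) / n
    mean_y = sum(y[i] for i in common) / n
    cov = sum((x[i] - mean_x) * (y[i] - mean_y) for i in common)
    var_x = sum((x[i] - mean_x) ** 2 for i in common)
    var_y = sum((y[i] - mean_y) ** 2 for i in common)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

# Hypothetical users: only "a" and "b" are co-rated.
user_x = {"a": 4.0, "b": 5.0, "c": 1.0}
user_y = {"a": 2.0, "b": 3.0, "d": 5.0}
print(pearson_co_rated(user_x, user_y))  # 1.0
```

Items "c" and "d" never enter the sums, so mean_x is 4.5 here rather
than anything dragged down by ratings the other user doesn't have.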


On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Good point Amit.
>
> Not sure how much this matters.  It may be that
> PearsonCorrelationSimilarity is a bad name that should be
> PearsonInspiredCorrelationSimilarity.  My guess is that this implementation
> is lifted directly from the very early recommendation literature and is
> reflective of the way that it was used back then.
