On Mon, Mar 25, 2013 at 1:41 PM, Koobas <koo...@gmail.com> wrote:
>> But the assumption works nicely for click-like data. Better still when
>> you can "weakly" prefer to reconstruct the 0 for missing observations
>> and much more strongly prefer to reconstruct the "1" for observed
>> data.
>>
>
> This does seem intuitive.
> How does the benefit manifest itself?
> In lowering the RMSE of reconstructing the interaction matrix?
> Are there any indicators that it results in better recommendations?
> Koobas
In this approach you are no longer reconstructing the interaction matrix, so there is no RMSE against the interaction matrix. You're reconstructing a matrix of 0s and 1s. Because entries are weighted differently, you're not even minimizing plain RMSE over that matrix -- the point is to take some errors more seriously than others. You're minimizing a *weighted* RMSE, yes.

Yes, of course the goal is better recommendations. That broader idea is harder to measure. You can use mean average precision to measure the tendency to predict back interactions that were held out.

Is it better? Depends on better than *what*. Algorithms that treat the input like ratings don't work as well on click-like data. The main problem is that they tend to pay too much attention to large values. For example, if an item was clicked 1000 times, and you are trying to actually reconstruct that "1000", then a 10% error "costs" (0.1*1000)^2 = 10000. But a 10% error in reconstructing an item that was clicked once "costs" (0.1*1)^2 = 0.01. The former is treated as a million times more important, error-wise, than the latter, even though the intuition is that it's only 1000 times more important.

Better than algorithms that ignore the weight entirely? Yes, probably, if only because you are using more information. But as in all things, "it depends".
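For concreteness, here's a tiny Python sketch of that cost asymmetry. The confidence weighting c = 1 + alpha*count, and alpha = 40, are assumptions borrowed from the Hu/Koren/Volinsky implicit-feedback formulation, not something specified above:

# Illustrative sketch only. The confidence c = 1 + alpha*count (and
# alpha = 40) is an assumption taken from the Hu/Koren/Volinsky
# implicit-feedback paper, not defined earlier in this thread.

def raw_cost(count, rel_error=0.1):
    # Squared error if you reconstruct the raw count itself.
    return (rel_error * count) ** 2

def weighted_cost(count, rel_error=0.1, alpha=40.0):
    # Weighted squared error if you instead reconstruct a "1", with
    # confidence growing linearly in the observed count.
    confidence = 1.0 + alpha * count
    return confidence * (rel_error * 1.0) ** 2

for count in (1, 1000):
    print(count, raw_cost(count), weighted_cost(count))

# raw_cost:      0.01 vs 10000.0  -- a factor of a million
# weighted_cost: 0.41 vs 400.01   -- a factor of ~976, close to the
#                                    1000x the intuition suggests

The exact alpha just tunes how fast confidence grows; the point is that the weighted loss keeps the cost ratio near the ratio of the counts instead of its square.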