I think the model you're referring to can use explicit or implicit
feedback. It uses the values -- however they are derived -- as
weights in the loss function rather than as values to be approximated
directly. So you still use P even with implicit feedback.

Of course, you can also use ALS to factor R directly if you want.
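
To make that concrete, here's a rough numpy sketch of what I mean (the
names, toy data and alpha value are mine, not any particular
implementation's): R is the raw interaction data, P is the binary
preference matrix that actually gets approximated, and confidence
weights derived from R multiply the squared errors in the loss.

import numpy as np

# Toy interaction data; R holds the raw values (ratings or counts),
# P is the binary preference matrix that gets approximated.
R = np.array([[5.0, 0.0, 3.0],
              [0.0, 1.0, 0.0]])
P = (R > 0).astype(float)

# Confidence weights derived from R; alpha here is only illustrative.
alpha = 40.0
C = 1.0 + alpha * R

def weighted_loss(X, Y, lam):
    # X: user factors (users x k), Y: item factors (items x k).
    # C weights the squared errors; it is not itself approximated.
    err = P - X @ Y.T
    return np.sum(C * err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))

X = np.random.rand(2, 2)
Y = np.random.rand(3, 2)
print(weighted_loss(X, Y, lam=0.1))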

Overfitting is as much an issue as in any ML algorithm. It's hard to
quantify more than that, but you certainly don't want to use lambda
= 0.

The right value of lambda depends on the data -- and even more on
what you mean by lambda! There are different usages in different
papers. More data means you need less lambda. In my experience the
effective weight on the overfitting / Tikhonov terms should be about
1 -- these terms should be weighted roughly like the loss function
terms. But that can mean using values of lambda much smaller than 1,
since lambda is just one of several multipliers of those terms in
many formulations.
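
For example, here are two common conventions, sketched in numpy
(helper names are mine): one where lambda directly multiplies the
Tikhonov terms, and a "weighted-lambda" style where each term is also
scaled by a per-user / per-item observation count, so a much smaller
lambda gives a similar effective weight.

import numpy as np

def reg_plain(X, Y, lam):
    # lambda directly multiplies the squared factor norms
    return lam * (np.sum(X ** 2) + np.sum(Y ** 2))

def reg_weighted(X, Y, lam, n_u, n_i):
    # "weighted-lambda" style: each user's / item's term is also scaled
    # by its observation count (n_u, n_i are 1-D arrays of counts)
    return lam * (np.sum(n_u[:, None] * X ** 2) + np.sum(n_i[:, None] * Y ** 2))

X = np.ones((2, 3)); Y = np.ones((4, 3))
n_u = np.array([100.0, 50.0]); n_i = np.array([40.0, 30.0, 20.0, 10.0])
print(reg_plain(X, Y, 1.0), reg_weighted(X, Y, 0.01, n_u, n_i))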

The rank has to be greater than the effective rank of the data (of
course). It's also something you have to fit to the data
experimentally. For normal-ish data sets of normal-ish size, the right
number of features is probably 20 - 100. I'd test in that range to
start.

More features tends to let the model overfit more, so in theory you
need more lambda with more features, all else equal.

It's *really* something you just have to fit to representative sample
data. The optimal answer is way too dependent on the nature,
distribution and size of the data to say more than the above.
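
Concretely, the kind of sweep I mean looks roughly like this --
train_als() and score() are placeholders for whatever ALS
implementation and holdout metric you actually use, and train_data /
holdout_data are a split of your representative sample:

import itertools

ranks   = [20, 40, 60, 80, 100]
lambdas = [0.01, 0.1, 1.0]

best = None
for k, lam in itertools.product(ranks, lambdas):
    model = train_als(train_data, rank=k, lam=lam)   # placeholder call
    s = score(model, holdout_data)                   # placeholder call
    if best is None or s < best[0]:
        best = (s, k, lam)

print("best score %.4f at rank=%d, lambda=%g" % best)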


On Tue, Jan 8, 2013 at 8:54 PM, Koobas <koo...@gmail.com> wrote:
> Okay, I got a little bit further in my understanding.
> The matrix of ratings R is replaced with the binary matrix P.
> Then R is used again in regularization.
> I get it.
> This takes care of the situations when you have user-item interactions,
> but you don't have the rating.
> So, it can handle explicit feedback, implicit feedback, and mixed (partial
> / missing feedback).
> If I have implicit feedback, I just drop R altogether, right?
>
> Now the only remaining "trick" is Tikhonov regularization,
> which leads to a couple of questions:
> 1) How much of a problem is overfitting?
> 2) How do I pick lambda?
> 3) How do I pick the rank of the approximation in the first place?
>     How does the overfitting problem depend on the rank of the
> approximation?
