PS It might even be the case that hyperactive users are simply not as
informative as users who buy fewer items, or vice versa, which may have
some explanation in terms of the information entropy of each such
observation set. But if that is really why it works, it has nothing to do
with a desire to fight overfitting per se.


On Mon, Jun 16, 2014 at 12:50 PM, Dmitriy Lyubimov <dlie...@gmail.com>
wrote:

> Probably a question for Sebastian.
>
> As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use
> slightly different loss functions.
>
> Zhou et al. are fairly unique in that they additionally multiply the
> norm of each U, V vector by its number of observed interactions.
>
> The paper doesn't explain why this works, except for saying something
> along the lines of "we tried several regularization matrices, and this
> one worked better in our case".
>
> I tried to figure out why that is, and I'm still not sure why it would
> be better. So basically we are saying that, by giving smaller
> observation sets smaller regularization values, it is OK for smaller
> observation sets to overfit slightly more than larger ones.
>
> This seems counterintuitive. Intuition tells us that smaller sets would
> actually tend to overfit more, not less, and therefore might call for a
> larger regularization rate, not a smaller one. Sebastian, what's your
> take on weighting the regularization in ALS-WR?
>
> thanks.
> -d
>
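
For concreteness, here is a rough sketch (plain NumPy, not Mahout code;
the function name and arguments are made up for illustration) of the
per-user least-squares solve under the two regularization schemes being
compared. Only the explicit-feedback ALS case is shown, ignoring the
confidence weighting that Hu et al. use. Zhou et al.'s weighted-lambda
term scales the ridge by n_u, the user's observation count, while the
uniform variant keeps a single lambda for everyone.

import numpy as np

def solve_user_factor(Y_rated, r_u, lam, weighted=True):
    """ALS update for a single user's factor vector.

    Y_rated : (n_u, k) item factors for the items this user interacted with
    r_u     : (n_u,)   the user's observed ratings
    lam     : base regularization constant
    weighted: True  -> Zhou et al. style ridge, lam * n_u
              False -> uniform ridge, lam
    """
    n_u, k = Y_rated.shape
    ridge = lam * (n_u if weighted else 1.0)
    # Normal equations: (Y^T Y + ridge * I) x_u = Y^T r_u
    A = Y_rated.T @ Y_rated + ridge * np.eye(k)
    b = Y_rated.T @ r_u
    return np.linalg.solve(A, b)

The only difference between the two schemes is the lam * n_u * I ridge
versus a flat lam * I in the normal equations, which is exactly the
weighting the question above is about.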
