There's been a great blog series on that on the RichRelevance engineering
blog [1]... but I have a vague feeling, based on what you're saying, that
it may all be old news to you.

[1] http://engineering.richrelevance.com/bandits-recommendation-systems/
and there's more in the series

On Sat, Sep 17, 2016 at 3:10 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I’ve been thinking about how one would implement an application that only
> shows recommendations. This is partly because people want to build such
> things.
>
> There are many problems with this, including cold start and overfit.
> However, multi-armed bandits (MABs) face these same problems and solve
> them with sampling schemes. So imagine that you have several models from
> which to draw recommendations: 1) a CF-based recommender, 2) random
> recommendations, 3) popular recs (by some measure). We can treat each
> individual as facing an MAB, with a sampling algorithm trained on that
> user's responses to pull recs from the 3 (or more) arms. This implies
> one MAB per user.
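>
> Here's a rough sketch of what I mean, in Python, with one Thompson-style
> bandit per user and a Beta posterior per arm. The arm names and the 0/1
> reward convention (1 = the user acted on a rec) are just assumptions for
> illustration, not a worked-out design:
>
>     import random
>
>     ARMS = ["cf", "popular", "random"]
>
>     class UserBandit:
>         """One bandit per user; each arm keeps a
>         Beta(successes + 1, failures + 1) posterior."""
>
>         def __init__(self):
>             self.successes = {arm: 0.0 for arm in ARMS}
>             self.failures = {arm: 0.0 for arm in ARMS}
>
>         def choose_arm(self):
>             # Thompson sampling: draw a plausible reward rate from each
>             # arm's posterior and play the arm with the highest draw.
>             samples = {
>                 arm: random.betavariate(self.successes[arm] + 1,
>                                         self.failures[arm] + 1)
>                 for arm in ARMS
>             }
>             return max(samples, key=samples.get)
>
>         def update(self, arm, reward):
>             # reward: 1 if the user acted on this arm's recs, else 0
>             if reward:
>                 self.successes[arm] += 1
>             else:
>                 self.failures[arm] += 1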
>
> The very first visit to the application would draw randomly from the
> arms. Since there is no user data yet, the CF recs engine would have to
> be able to respond anyway (perhaps with random recs), the same would be
> true of the popular model (returning random until usage data exists),
> and random is always happy. The problem with this is that none of the
> arms are completely independent, and the model driving each arm will
> change over time.
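>
> The fallback could be as dumb as the sketch below; get_cf_recs,
> get_popular_recs, and get_random_recs are hypothetical calls into
> whatever serves each model:
>
>     def recs_for_arm(arm, user_id, n=10):
>         # Every arm must answer, even before it has any data.
>         if arm == "cf":
>             recs = get_cf_recs(user_id, n)   # empty for a brand-new user
>             return recs if recs else get_random_recs(n)
>         if arm == "popular":
>             recs = get_popular_recs(n)       # empty before any usage data
>             return recs if recs else get_random_recs(n)
>         return get_random_recs(n)            # random is always happy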
>
> The first time a user visits, a new MAB is created for them and draws
> randomly from all arms, but it may get better responses from popular
> (with no user-specific data in the system yet for CF). So the sampling
> will start to favor popular while still exploring the other methods.
> When enough data has accumulated for the CF recommender to make good
> recs, it will start to outperform popular and will earn more of the
> user's reinforcement.
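>
> A toy run of the bandit sketch above shows the behavior I'd expect:
> flat priors mean the first visits are effectively random, then the
> sampler drifts toward whichever arm is paying off. The CTR numbers are
> invented purely to drive the simulation:
>
>     bandit = UserBandit()
>     true_ctr = {"cf": 0.02, "popular": 0.08, "random": 0.01}
>     for visit in range(200):
>         if visit == 100:
>             true_ctr["cf"] = 0.15  # pretend CF has matured for this user
>         arm = bandit.choose_arm()
>         reward = 1 if random.random() < true_ctr[arm] else 0
>         bandit.update(arm, reward)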
>
> This seems workable, with several unanswered questions and one problem
> to avoid: overfit. We would need a sampling method that never fully
> converges, or the user would never get a chance to show their
> expanding/changing preferences. The CF recommender will also overfit if
> non-CF items are not mixed in. Of the sampling methods I've seen for
> MABs, greedy will not work, but even with some form of Bayesian/Thompson
> sampling the question is how to parameterize the sampling. With too
> little convergence we get sub-optimal exploitation, but we get the same
> with too much convergence, which will also overfit the CF recs.
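>
> One possible (not the only) way to keep the sampler from fully
> converging would be to discount the posterior counts on every update, so
> the effective history stays bounded and some exploration always remains.
> The discount factor is exactly the knob I'm asking how to set; 0.99
> below is just a placeholder:
>
>     class DiscountedUserBandit(UserBandit):
>         def __init__(self, discount=0.99):
>             super().__init__()
>             self.discount = discount
>
>         def update(self, arm, reward):
>             # Decay all counts before recording the new observation, so
>             # old evidence fades and the posteriors never become spikes.
>             for a in ARMS:
>                 self.successes[a] *= self.discount
>                 self.failures[a] *= self.discount
>             super().update(arm, reward)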
>
> I imagine we could train a meta-model on the mature explore amount by
> trying different parameterizations and seeing whether there is one
> answer for all users, or we could resort to heuristic rules, even
> business rules.
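>
> The meta-level search might be as crude as replaying logged visits
> against a few candidate discount factors and keeping the winner.
> replay_reward is hypothetical here; it would be some off-policy
> evaluation over logged impressions rather than a live test:
>
>     def tune_discount(logged_visits, candidates=(0.9, 0.99, 0.999, 1.0)):
>         best, best_reward = None, float("-inf")
>         for d in candidates:
>             # replay_reward: hypothetical off-policy replay of the log
>             total = replay_reward(DiscountedUserBandit(discount=d),
>                                   logged_visits)
>             if total > best_reward:
>                 best, best_reward = d, total
>         return best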
>
> If anyone has read this far, any ideas or comments?
