There's been a great blog post on that on the RichRelevance engineering blog [1]... But I have a vague feeling, based on what you are saying, that it may all be old news to you...
[1] http://engineering.richrelevance.com/bandits-recommendation-systems/ and there's more in the series

On Sat, Sep 17, 2016 at 3:10 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I've been thinking about how one would implement an application that only
> shows recommendations. This is partly because people want to build such
> things.
>
> There are many problems with this, including cold start and overfit.
> However, these problems also face MABs and are solved with sampling
> schemes. So imagine that you have several models from which to draw
> recommendations: 1) a CF-based recommender, 2) random recommendations,
> 3) popular recs (by some measure). Treat each individual as facing an
> MAB, with a sampling algo trained by them to pull recs from the 3 (or
> more) arms. This implies an MAB per user.
>
> The very first visit to the application would draw randomly from the
> choices, and since there is no user data the recs engine would have to
> be able to respond (perhaps with random recs); the same would have to
> be true of the popular model (returning random), and random is always
> happy. The problem with this is that none of the arms are completely
> independent, and the model driving each arm will change over time.
>
> The first time a user visits, a new MAB is created for them; it will
> draw randomly from all arms but may get better responses from popular
> (with no user-specific data yet in the system for CF). So the sampling
> will start to favor popular but will still explore the other methods.
> When enough data has accumulated to start making good recs, the
> recommender will start to outperform popular and will get more of the
> user's reinforcement.
>
> This seems to work, with several unanswered questions and one problem
> to avoid: overfit. We would need a sampling method that never fully
> converges, or the user would never get a chance to show their
> expanding/changing preferences. The CF recommender will also overfit if
> non-CF items are not mixed in. Of the sampling methods I've seen for
> MABs, greedy will not work, and even with some form of
> Bayesian/Thompson sampling the question is how to parameterize it. With
> too little convergence we get sub-optimal exploitation, but too much
> convergence is just as sub-optimal and will also overfit the CF recs.
>
> I imagine we could train a meta-model on the mature explore amount by
> trying different parameterizations and finding whether there is one
> answer for all, or we could resort to heuristic rules, even business
> rules.
>
> If anyone has read this far, any ideas or comments?
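For anyone who wants to poke at the idea concretely: below is a minimal sketch, in Python, of the per-user MAB over the three arms Pat describes, using Beta-Bernoulli Thompson sampling. The arm names, the DECAY factor, and the MAX_EVIDENCE cap are illustrative assumptions of mine, not anything from the thread; the cap is one crude way to address the "never fully converge" requirement.

```python
import random

# The three recommendation sources described in the thread.
ARMS = ["cf", "popular", "random"]

# Cap on total evidence per arm: keeps the Beta posterior from
# concentrating completely, so the sampler never stops exploring.
# The value 50 is an illustrative assumption, not from the thread.
MAX_EVIDENCE = 50.0
DECAY = 0.99  # exponential forgetting of old feedback (also assumed)


class PerUserBandit:
    """One Thompson-sampled bandit per user, choosing among the arms."""

    def __init__(self):
        # Beta(1, 1) prior per arm: uniform, so the first visits
        # effectively draw an arm at random, as the thread suggests.
        self.alpha = {arm: 1.0 for arm in ARMS}
        self.beta = {arm: 1.0 for arm in ARMS}

    def choose(self):
        # Thompson sampling: draw one sample from each arm's
        # posterior and play the arm with the largest draw.
        draws = {arm: random.betavariate(self.alpha[arm], self.beta[arm])
                 for arm in ARMS}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Discount old evidence so changing user tastes and a
        # maturing CF model can shift the posteriors later on.
        for a in ARMS:
            self.alpha[a] = 1.0 + (self.alpha[a] - 1.0) * DECAY
            self.beta[a] = 1.0 + (self.beta[a] - 1.0) * DECAY
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
        # Cap total evidence: rescaling alpha and beta together keeps
        # the posterior mean but bounds how converged the arm can get.
        total = self.alpha[arm] + self.beta[arm]
        if total > MAX_EVIDENCE:
            scale = MAX_EVIDENCE / total
            self.alpha[arm] *= scale
            self.beta[arm] *= scale


# Usage: one bandit per user id, created lazily on first visit.
bandits = {}

def recommend_source(user_id):
    return bandits.setdefault(user_id, PerUserBandit()).choose()

def record_feedback(user_id, arm, clicked):
    bandits.setdefault(user_id, PerUserBandit()).update(
        arm, 1.0 if clicked else 0.0)
```

Capping alpha + beta bounds each arm's certainty without changing its estimated mean, so exploration never dies out completely, and the decay lets the bandit track a CF arm whose quality improves as data accumulates. Whether the cap and decay should be learned per application (the meta-model idea) or just set by a business rule is exactly the open question Pat raises.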