On Mon, Dec 27, 2010 at 4:24 PM, Sebastian Schelter <[email protected]> wrote:
> From my experience the best insights are found by A/B testing
> different algorithms against live users and measuring relevant actions
> you want to see triggered by your recommender system (the number of
> recommended items put into a shopping cart, for example).

Amen to this. I only addressed off-line evaluation, but on-line evaluation is far better if you have sufficient traffic. Generally, off-line testing is only usable to weed out totally useless options, and A/B testing is required for a more realistic assessment.

> > > On Mon, Dec 27, 2010 at 6:54 AM, Otis Gospodnetic
> > > <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > I was wondering how people evaluate the quality of recommendations other
> > > > than RMSE and such in the eval package.
> > >
> > Off-line evaluation is difficult. Your suggestion of MRR and related
> > measures is reasonable, but I prefer to count every presentation on the
> > first page as equivalent.
> >
> > The real problem is that historical data will only include presentations of
> > items from a single recommendation system. That means that any new system
> > that brings in new recommendations is at a disadvantage, at least in terms of
> > error bars around the estimated click-through rate.
> >
> > Another option is to compute a grouped AUC for clicked items relative to
> > unclicked items. To do this, iterate over users with clicks. Pick a random
> > clicked item and a random unclicked item. Score 1 if the clicked item has the
> > higher score, 0 otherwise. Ties can be broken at random, but I prefer to
> > score 0 or 0.5 for them. An average score near 1 is awesome.
> >
> > I don't find it all that helpful to use the exact rank. Rather, I like to
> > group all impressions that are shown in the same screenful together and then
> > ignore second and later pages. I also prefer to measure changes in behavior
> > that have business value rather than just ratings.
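For the archives, the grouped-AUC procedure quoted above can be sketched roughly as follows. This is a minimal illustration, not Mahout code: the per-user score dicts, the click sets, and the `grouped_auc` name are all assumptions made for the example, and ties are scored 0.5 (the stricter choice mentioned above would score them 0).

```python
import random

def grouped_auc(user_scores, user_clicks, trials=200, seed=42):
    """Estimate grouped AUC by per-user sampling.

    user_scores: {user: {item: recommender score}}
    user_clicks: {user: set of clicked items}

    For each user with at least one clicked and one unclicked item,
    repeatedly sample one of each; score 1 if the clicked item
    out-scores the unclicked one, 0.5 on a tie, 0 otherwise.
    Returns the average over all samples (near 1 is awesome).
    """
    rng = random.Random(seed)
    total, n = 0.0, 0
    for user, scores in user_scores.items():
        clicks = user_clicks.get(user, set())
        clicked = [i for i in scores if i in clicks]
        unclicked = [i for i in scores if i not in clicks]
        if not clicked or not unclicked:
            continue  # user contributes nothing to the grouped average
        for _ in range(trials):
            c = scores[rng.choice(clicked)]
            u = scores[rng.choice(unclicked)]
            total += 1.0 if c > u else (0.5 if c == u else 0.0)
            n += 1
    return total / n if n else float("nan")

# Toy usage with made-up data: u1's clicked item always wins (1.0),
# u2's clicked item always ties (0.5), so the average is 0.75.
scores = {"u1": {"a": 0.9, "b": 0.1}, "u2": {"x": 0.5, "y": 0.5}}
clicks = {"u1": {"a"}, "u2": {"x"}}
print(grouped_auc(scores, clicks))
```

Grouping per user before averaging keeps heavy clickers from dominating the estimate, which is the point of the "grouped" variant.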
