Well I’m happy to raise the bar for stupid questions…

thanks

Some more anti-flood ideas. The video browser/recommender demo site for the 
solr-recommender has user preference training data and genre metadata (it would 
be nice to have other things). It makes the following sets of recommendations 
for each user who has enough preference data:
1) user preference history based recs (the query is the user's thumbs-up or 
add-to-watchlist history)
2) user preference history based recs where recent item detail views are the 
query
3) user preference history based recs weighted by all preferred genres. The 
query is the user's history against the training data plus their preferred 
genres against the item genre metadata. This is not the same as filtering by 
category.
4) user preference history based but weighted by your favorite non-trivial 
genre.
5) user preference history based but weighted by your second most favorite 
genre.
6) metadata alone recs based on using your preferred genres as the query.

#2 and #6 are probably the least successful, and they are all just experiments. 
Many people filter by category/genre; here I chose instead to weight the query 
using Solr boosting on the category field. This produces fairly different sets 
of recs, but since it’s only me looking at them I’ll reserve judgement on 
quality.
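
To be concrete about the weighting, here is roughly how the boosted query gets 
built. A minimal sketch in Python; the field names (indicators, genres), the 
boost value, and the core URL are placeholders for whatever your schema uses:

    import requests  # assumes the requests HTTP library

    # Hypothetical user data: thumbs-up/watchlist history and preferred genres.
    history = ["video123", "video456", "video789"]
    preferred_genres = ["Drama", "Sci-Fi"]

    # The history is the main query against the indicator field; the genres
    # become boosted clauses, so they weight rather than filter the results.
    q = "indicators:(%s)" % " ".join(history)
    boosts = " ".join('genres:"%s"^2' % g for g in preferred_genres)

    params = {"q": "%s %s" % (q, boosts), "fl": "id,title",
              "rows": 20, "wt": "json"}
    recs = requests.get("http://localhost:8983/solr/videos/select",
                        params=params).json()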

On an item detail page you get the following sets:
1) item similarity based recs, taken directly from the training item-item 
similarity matrix
2) item similarity based recs using #1 as the query, subtly different from #1 
for various reasons
3) #2 weighted by the genres of the video being viewed
4) #2 from the first list above: click-path based recs
5) user preference history based recs weighted by the genre of the current 
video (similar to #4 in the top list)

Here the eyeball test says #3 and #5 look best, at least to me.
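
For reference, #3 is built along the same lines as the genre boosting above, 
except the query is the current video's row of the item-item similarity 
matrix. A sketch, assuming that row is indexed in an indicators field:

    # Hypothetical sketch of #3: the current video's similar items are the
    # query, boosted by the video's own genres.
    def detail_page_params(video, rows=20):
        # video is a dict like {"id": ..., "indicators": [...], "genres": [...]}
        q = "indicators:(%s)" % " ".join(video["indicators"])
        boosts = " ".join('genres:"%s"^2' % g for g in video["genres"])
        return {"q": "%s %s" % (q, boosts),
                "fq": "-id:%s" % video["id"],  # drop the video itself
                "rows": rows, "wt": "json"}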

I think I’ll leave dithering out until the site goes live, since leaving it 
out makes the eyeball test easier. I doubt all these experiments will survive.

On Jan 25, 2014, at 4:37 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

N(0, log \epsilon) => normal distribution with mean = 0 and variance = log(epsilon)

On Saturday, January 25, 2014 7:33 PM, Pat Ferrel <p...@occamsmachete.com> 
wrote:

For anti-flood, and in the vein of “UI”, you can build a recommender that 
recommends categories or genres and then get recommendations weighted or 
filtered by those categories. A simple version of this is to just look at 
preference frequency by category for the current user. This is a lot like 
what Amazon does on their front page.
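
The simple version can be as little as a frequency count. A sketch in Python 
(the names are made up; user_prefs is the user's preference history, 
item_genres the genre metadata):

    from collections import Counter

    def favorite_genres(user_prefs, item_genres, n=2):
        # Count how often the user's preferred items fall in each genre,
        # then keep the top n genres to weight or filter recs with.
        counts = Counter(g for item in user_prefs
                         for g in item_genres.get(item, []))
        return [genre for genre, _ in counts.most_common(n)]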

BTW can you explain your notation? s = log r + N(0,log \epsilon) 
N?, \epsilon? 

Showing my ignorance probably but why stop now?


On Jan 25, 2014, at 3:56 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Dithering is commonly done by re-ranking results using a noisy score.  Take
r to be the original rank (starting with 1).  Then compute a score as

     s = log r + N(0,log \epsilon)

and sort by this new score in ascending order.

Items will be shuffled by this method in such a way that the probability
that item 2k will appear before item k is nearly invariant with respect to
k.  Thus, item 3 will appear before item 1 about as often as item 30 will
appear before item 10.  The major effect here is to dredge deep results up
onto the first page (occasionally) so that the recommender gets broader
training data.
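
In code, the re-ranking might look like this minimal Python sketch (taking 
log \epsilon as the variance, so the Gaussian's sigma is its square root; 
the function name is mine):

    import math
    import random

    def dither(results, epsilon=1.5, rng=random):
        # epsilon > 1 controls the shuffle; epsilon near 1 means almost none.
        sigma = math.sqrt(math.log(epsilon))  # variance = log(epsilon)
        # s = log r + N(0, log epsilon), with ranks starting at 1
        scored = [(math.log(rank) + rng.gauss(0.0, sigma), item)
                  for rank, item in enumerate(results, start=1)]
        scored.sort(key=lambda pair: pair[0])  # ascending order
        return [item for _, item in scored]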

You can seed this with time in order to get the appearance of changing
recommendations even when no change in history is recorded.  Moreover, the
time varying seed can be held constant for a short period (a few minutes to
an hour or so) so that you also give the appearance of short-term
stability.  Both of these effects seem to entice users back to a
recommendation page.  Ironically, people seem more willing to return to the
first recommendation page than they are willing to click to the second page.
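
The seeding could sit on top of the dither() sketch above; the 15-minute 
period and the hash-based seed here are arbitrary choices:

    import random
    import time

    def dither_stable(results, user_id, epsilon=1.5, period_s=900):
        # The seed only changes every period_s seconds, so within a period
        # the same user sees the same shuffle; across periods it varies.
        bucket = int(time.time() // period_s)
        rng = random.Random(hash((user_id, bucket)))
        return dither(results, epsilon=epsilon, rng=rng)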

This addition of random noise obviously makes your best recommendation
results worse. The penalty is worthwhile to the extent that your
recommender learns enough to make results better tomorrow.  This has been
my universal experience for reasonable levels of dithering.

Anti-flood is quite a bit more heuristic and can be motivated by the idea
that recommenders are recommending individual items but users are being
shown an entire portfolio of items on the first page.  The probability of
making the user happy with any given page of recommendations is not
increased by showing items which are nearly identical: if they like one
item, they will very, very likely like the others, and if they don't like
one, they likely won't like the others.  On the other hand, if you were
to split the page between two groups of very distinctly different kinds
of items, then a miss on one group is not a guaranteed miss on the
second group; you have hedged your bets and will get better user
satisfaction.

How you accomplish this is largely a UI question.  You could cluster the
items and show the users 1-2 items from each cluster with an option for
seeing the full cluster.  You can also use a synthetic score approach where
you penalize items that are too similar to items higher in the results
list.  The meaning of “too similar” is typically hand-crafted to your
domain.  It might be a test for the same author, or the same genre, or
whatever you have handy.
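
As one simple variant of the synthetic-score approach, a Python sketch that 
walks the ranked list and demotes anything too similar to what has already 
been kept; the too_similar test is the hand-crafted, domain-specific part:

    def anti_flood(results, too_similar, max_similar=1):
        # Keep an item only if fewer than max_similar already-kept items
        # are too similar to it; otherwise push it to the bottom.
        kept, demoted = [], []
        for item in results:
            hits = sum(1 for k in kept if too_similar(item, k))
            (kept if hits < max_similar else demoted).append(item)
        return kept + demoted

    # e.g. too_similar = lambda a, b: a["author"] == b["author"]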

On Sat, Jan 25, 2014 at 1:42 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:

> Hi Ted,
> Could you explain what you mean by a "dithering step" and an
> "anti-flood step"?
> By dithering I guess you mean adding some sort of noise in order not
> to show the same results every time.
> But I have no clue about the anti-flood step.
> 
> Tevfik
> 
> On Sat, Jan 25, 2014 at 11:05 PM, Koobas <koo...@gmail.com> wrote:
>> On Sat, Jan 25, 2014 at 3:51 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>> 
>>> Case 1 is fine, in case 2, I don't think that a dot product (without
>>> normalization) will yield a meaningful distance measure. Cosine
>>> distance or a Pearson correlation would be better. The situation is
>>> similar to Latent Semantic Indexing, in which documents are represented
>>> by their low-rank approximations and the similarities between them (that
>>> is, between the approximations) are computed using cosine similarity.
>>> There is no need for any normalization in case 1, since the values
>>> in the feature vectors are formed to approximate the rating values.
>>> 
>>> That's exactly what I was thinking.
>> Thanks for your reply.
>> 
>> 
>>> On Sat, Jan 25, 2014 at 5:08 AM, Koobas <koo...@gmail.com> wrote:
>>>> A generic latent variable recommender question.
>>>> I passed the user-item matrix through a low-rank approximation,
>>>> with something like ALS or SVD, and now I have the feature
>>>> vectors for all users and all items.
>>>> 
>>>> Case 1:
>>>> I want to recommend items to a user.
>>>> I compute the dot product of the user’s feature vector with the
>>>> feature vectors of all the items.
>>>> I eliminate the ones that the user already has, and find the largest
>>> value
>>>> among the others, right?
>>>> 
>>>> Case 2:
>>>> I want to find similar items for an item.
>>>> Should I compute the dot product of the item’s feature vector against
>>>> the feature vectors of all the other items?
>>>>    OR
>>>> Should I compute the ANGLE between each pair of feature vectors?
>>>> I.e., compute the cosine similarity?
>>>> I.e., normalize the vectors before computing the dot products?
>>>> 
>>>> If “yes” for case 2, is that something I should also do for case 1?
>>> 
