A colleague of mine just built a MAP@k precision evaluator for the Mahout-based 
cooccurrence recommender we’ve been working on, and we ran it on some data scraped 
from rottentomatoes.com. They have “fresh” and “rotten” reviews tied to reviewer 
ids.
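
For reference, MAP@k here is mean average precision over the top k recommended 
items per user. A minimal sketch of the metric in Scala (the names and types are 
mine, not the evaluator’s actual code):

  // `recommended` is the ranked list the recommender returns for one user,
  // `relevant` is that user's held-out positive ("fresh") items.
  def averagePrecisionAtK(recommended: Seq[String], relevant: Set[String], k: Int): Double =
    if (relevant.isEmpty) 0.0
    else {
      // Ranks (0-based) within the top k at which we hit a relevant item.
      val hitRanks = recommended.take(k).zipWithIndex.collect {
        case (item, rank) if relevant.contains(item) => rank
      }
      // Precision at each hit is (hits so far) / (rank + 1).
      val precisions = hitRanks.zipWithIndex.map {
        case (rank, nthHit) => (nthHit + 1).toDouble / (rank + 1)
      }
      precisions.sum / math.min(relevant.size, k)
    }

  // MAP@k is just the mean of AP@k over all evaluated users.
  def meanAveragePrecisionAtK(users: Seq[(Seq[String], Set[String])], k: Int): Double =
    if (users.isEmpty) 0.0
    else users.map { case (recs, rel) => averagePrecisionAtK(recs, rel, k) }.sum / users.size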

A fair bit of discussion has gone on about how to use negative preferences. We 
have been saying that negative preferences might be predictive of positive 
preferences, and that the cross-cooccurrence code in the new 
SimilarityAnalysis.cooccurrence method can make that data usable.
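
To make that concrete, here is roughly what handing two actions to Samsara looks 
like. This is a sketch assuming the Mahout 0.10+ Scala API; the exact 
SimilarityAnalysis signature (and the drmBs parameter name) may differ in your 
version, so treat it as illustrative rather than the definitive call:

  import org.apache.mahout.math.cf.SimilarityAnalysis
  import org.apache.mahout.math.drm.DrmLike

  // drmFresh:  reviewers x movies, non-zero where the reviewer rated the movie
  //            "fresh" (primary action)
  // drmRotten: reviewers x movies, non-zero where the reviewer rated it "rotten"
  //            (secondary action)
  def indicators(drmFresh: DrmLike[Int], drmRotten: DrmLike[Int]) =
    // Returns one indicator matrix per input: first the "fresh" cooccurrence
    // indicator (roughly an LLR-filtered A'A), then the "rotten" cross-cooccurrence
    // indicator (roughly an LLR-filtered A'B).
    SimilarityAnalysis.cooccurrences(drmFresh, drmBs = Array(drmRotten))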

We took the RT data for two “actions”: “fresh” as the primary action, the best 
indicator of preference, and “rotten” as the secondary indicator. We found that 
MAP using only “fresh” improved by almost 20% when we included “rotten” as the 
secondary cross-cooccurrence action. For the sticklers out there: we did not 
directly isolate the two actions (that’s work remaining), so some of the lift 
might be due simply to having more data. Still, it’s a really good first step, 
because more data doesn’t always translate to better performance, and in any case 
it’s data you wouldn’t have otherwise.

This opens up a new way to evaluate all sorts of other user signals, some long 
considered to be unusable by recommenders. Gender, location, and category 
preferences are now fair game for testing.
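
To be concrete about what I mean: you would recast a profile field as just another 
secondary “action” by treating each attribute value as an “item” the user has 
“interacted” with, then feed those pairs in exactly like the “rotten” action. A 
toy sketch (the field names are hypothetical, not from any real schema):

  case class UserProfile(userId: String, gender: String, location: String)

  // Each location value becomes an "item" the user has "interacted" with; the
  // resulting (user, value) pairs can be built into another secondary input matrix.
  def locationAsAction(profiles: Seq[UserProfile]): Seq[(String, String)] =
    profiles.map(p => (p.userId, p.location)) // e.g. ("reviewer42", "portland")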

BTW we used this recommender, which is based on Mahout Samsara’s matrix math, 
cooccurrence, and LLR:
https://github.com/pferrel/scala-parallel-universal-recommendation 
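
For anyone who hasn’t seen it, the LLR scoring is Ted Dunning’s 2x2 
contingency-table test. A small standalone version of the formula for illustration 
(Mahout has its own implementation; this is just to show the idea):

  // k11 = users who did both things (e.g. rated both movies "fresh"),
  // k12/k21 = users who did only one of them, k22 = users who did neither.
  def xLogX(x: Long): Double = if (x == 0L) 0.0 else x * math.log(x.toDouble)

  def entropy(counts: Long*): Double = xLogX(counts.sum) - counts.map(xLogX).sum

  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy    = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to guard against tiny negative values from round-off.
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy))
  }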
