A colleague of mine just built a MAP@k precision evaluator for the Mahout-based cooccurrence recommender we’ve been working on, and we ran it on some data scraped from rottentomatoes.com. They have “fresh” and “rotten” reviews tied to reviewer ids.
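For anyone who hasn't worked with the metric, here is a minimal Python sketch of one common way to compute MAP@k. It is not our evaluator's code. The function names, the input shapes (dicts keyed by user id), and the choice to normalize by min(k, number of held-out items) are all assumptions made for illustration; other variants normalize differently.

```python
from typing import Dict, List, Sequence, Set


def average_precision_at_k(recommended: Sequence[str],
                           relevant: Set[str],
                           k: int) -> float:
    """AP@k for one user.

    Assumes `recommended` is ranked best-first and has no duplicates.
    Normalizing by min(k, len(relevant)) is one common convention,
    chosen here as an assumption.
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(relevant))


def map_at_k(recs: Dict[str, List[str]],
             held_out: Dict[str, Set[str]],
             k: int) -> float:
    """Mean of AP@k over users that have at least one held-out item."""
    users = [u for u, items in held_out.items() if items]
    if not users:
        return 0.0
    total = sum(average_precision_at_k(recs.get(u, []), held_out[u], k)
                for u in users)
    return total / len(users)


if __name__ == "__main__":
    # Hypothetical toy data: held-out "fresh" items per reviewer.
    recs = {"u1": ["a", "b", "c"], "u2": ["x"]}
    held_out = {"u1": {"a", "c"}, "u2": {"y"}}
    print(map_at_k(recs, held_out, k=3))
```

In the toy data, u1 gets hits at ranks 1 and 3 and u2 gets none, so the result is the mean of 5/6 and 0.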
A fair bit of discussion has gone on about how to use negative preferences. We have been saying that negative preferences might be predictive of positive preferences, and that the cross-cooccurrence code in the new SimilarityAnalysis.cooccurrence method can make the data usable. We took the RT data as two “actions”: “fresh” as the primary, the best indicator of preference, and “rotten” as the secondary indicator. We found that MAP using only “fresh” improved by almost 20% when we included “rotten” as the secondary cross-cooccurrence action.

For the strict out there: we did not directly isolate the two actions, which is work remaining, so some of the lift might be due to just having more data. But it’s a really good first step, because more data doesn’t always translate to better performance, and anyway it’s data you wouldn’t have otherwise.

This opens up a new way to compare all sorts of other user signals, some long considered to be unusable by recommenders. Gender, location, and category preferences are now fair game for testing.

BTW we used this recommender, which is based on Mahout Samsara’s matrix math, cooccurrence, and LLR: https://github.com/pferrel/scala-parallel-universal-recommendation
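To make the cross-cooccurrence idea concrete, here is a rough Python sketch that scores (“fresh” item, “rotten” item) pairs with the standard log-likelihood ratio for a 2x2 contingency table. It is an illustration only, not the Mahout or recommender code (which is Scala and works on distributed matrices). The function names and the dict-of-sets input format are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def _x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts: int) -> float:
    # Unnormalized entropy of a list of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 table of counts."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def _invert(actions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # user -> items becomes item -> users
    by_item: Dict[str, Set[str]] = {}
    for user, items in actions.items():
        for item in items:
            by_item.setdefault(item, set()).add(user)
    return by_item


def cross_cooccurrence_llr(primary: Dict[str, Set[str]],
                           secondary: Dict[str, Set[str]]
                           ) -> Dict[Tuple[str, str], float]:
    """Score each (primary item, secondary item) pair that cooccurs.

    `primary` and `secondary` map user id -> set of item ids, e.g.
    "fresh" reviews and "rotten" reviews (an assumed input format).
    """
    n_users = len(set(primary) | set(secondary))
    prim_users = _invert(primary)
    sec_users = _invert(secondary)
    scores: Dict[Tuple[str, str], float] = {}
    for i, users_i in prim_users.items():
        for j, users_j in sec_users.items():
            k11 = len(users_i & users_j)   # did both actions
            if k11 == 0:
                continue
            k12 = len(users_i) - k11       # primary only
            k21 = len(users_j) - k11       # secondary only
            k22 = n_users - k11 - k12 - k21  # neither
            scores[(i, j)] = llr(k11, k12, k21, k22)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data.
    fresh = {"u1": {"A"}, "u2": {"A"}, "u3": {"B"}, "u4": {"B"}}
    rotten = {"u1": {"X"}, "u2": {"X"}, "u3": {"Y"}, "u4": {"Y"}}
    for pair, score in sorted(cross_cooccurrence_llr(fresh, rotten).items()):
        print(pair, round(score, 3))
```

A high score here means that rating item j “rotten” and rating item i “fresh” occur together far more or less often than chance would predict, which is what lets a negative signal act as an indicator for a positive one. A real pipeline would also threshold or downsample the scores, which this sketch skips.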