It's the same idea, but yes, you'd have to re-implement it for Hadoop. Randomly select a subset of users. For each of those users, identify a small number of most-preferred items -- perhaps the video(s) watched most often. Hold these data points out as a test set, and run your process on all the rest of the data.
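The split above can be sketched in memory like this. This is only an illustration, not the Hadoop job itself; the function name `split_train_test`, the parameters, and the representation of views as (user, video) event pairs are all assumptions for the sketch:

```python
import random
from collections import Counter, defaultdict

def split_train_test(views, sample_frac=0.1, held_out=3, seed=42):
    """views: list of (user, video) view events.
    Returns (train, test): train is the remaining events, test maps each
    sampled user to a set of their most-watched (most-preferred) videos."""
    rng = random.Random(seed)
    # Count views per (user, video) to find each user's top items
    counts = defaultdict(Counter)
    for user, video in views:
        counts[user][video] += 1
    # Randomly select a subset of users to evaluate
    users = sorted(counts)
    sampled = rng.sample(users, max(1, int(len(users) * sample_frac)))
    # Hold out each sampled user's most-watched videos as the test set
    test = {}
    for user in sampled:
        test[user] = {v for v, _ in counts[user].most_common(held_out)}
    # Training data is everything except the held-out (user, video) pairs
    train = [(u, v) for u, v in views if not (u in test and v in test[u])]
    return train, test
```

At 100 million views per month the split itself would of course be another MapReduce pass over the view logs, but the logic is the same.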
Make recommendations for the selected users. Then just see how many items in each list were among the test data you held out. The percentage of recommendations that appear in the test list is precision, and the percentage of the test list that appears in the recommendations is recall. Precision and recall are not great tests, but they are among the only ones you can carry out in the lab. Slightly better are variations on these two metrics, like the F1 measure and normalized discounted cumulative gain. Also look up mean average precision.

On Sun, Aug 26, 2012 at 10:47 AM, Jonathan Hodges <hodg...@gmail.com> wrote:
> Hi,
>
> We have been tasked with producing video recommendations for our users. We
> get about 100 million video views per month and track users and the videos
> they watch, but currently we don’t collect rating value or preference.
> Later we plan on using implicit data like percentage of video watched to
> surmise preferences but for the first release we are stuck with Boolean
> viewing data. To that end we started by using Mahout’s distributed
> RecommenderJob with LoglikelihoodSimilarity algorithm to generate 50 video
> recommendations for each user. We would like to gauge how well we are doing
> by offline measuring precision and recall of these recommendations. We know
> we should divide the viewing data into training and test data, but not real
> sure what steps to take next. For the non-distributed approach we would
> leverage IRStatistics to get the precision and recall values, but it seems
> there isn’t as simple a solution within the Mahout framework for the Hadoop
> based calculations.
>
> Can someone please share/suggest their techniques for evaluating
> recommendation accuracy with Mahout’s Hadoop-based distributed algorithms?
>
> Thanks in advance,
>
> Jonathan
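The precision/recall computation described above is just set intersection over the held-out items. A minimal sketch, assuming recommendations and test data fit in memory as plain dicts (the function name `precision_recall` and the data shapes are mine, not Mahout's):

```python
def precision_recall(recs, test):
    """recs: dict mapping user -> list of recommended items.
    test: dict mapping user -> set of held-out (relevant) items.
    Returns mean precision, mean recall, and F1 over the test users."""
    precisions, recalls = [], []
    for user, relevant in test.items():
        recommended = recs.get(user, [])
        if not recommended or not relevant:
            continue
        # Hits: recommended items that were in the held-out test set
        hits = len(set(recommended) & relevant)
        precisions.append(hits / len(recommended))  # % of recs in test list
        recalls.append(hits / len(relevant))        # % of test list in recs
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

In the distributed setting the same arithmetic becomes a join of the recommendation output against the held-out pairs on user ID, with the per-user hit counts summed in a reducer.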