It's the same idea, but yes, you'd have to re-implement it for Hadoop.

Randomly select a subset of users. For each one, identify a small
number of most-preferred items, perhaps the video(s) watched most
often. Hold these data points out as a test set, and run your process
on all the rest.
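
As a rough illustration of that hold-out step, here is a minimal
single-machine sketch in Python. It is not Mahout code: the function
name split_holdout, its parameters, and the (user, video) event format
are all assumptions made for the example, and a Hadoop job would need
to do the same thing in a distributed way.

```python
import random
from collections import Counter

def split_holdout(views, user_fraction=0.1, per_user=2, seed=42):
    """views: list of (user_id, video_id) view events (hypothetical format).

    Returns (train_events, test), where test maps each sampled user to the
    set of videos held out for that user.
    """
    # Count how often each user watched each video.
    counts = {}
    for user, video in views:
        counts.setdefault(user, Counter())[video] += 1

    # Randomly pick a subset of users to evaluate on.
    rng = random.Random(seed)
    users = sorted(counts)
    n_test = max(1, int(len(users) * user_fraction))
    test_users = rng.sample(users, n_test)

    test = {}
    for user in test_users:
        # Skip users whose history is too short to leave any training data.
        if len(counts[user]) <= per_user:
            continue
        # Hold out the most-watched videos as the "most preferred" items.
        test[user] = {video for video, _ in counts[user].most_common(per_user)}

    # Everything that was not held out becomes training data.
    train = [(u, v) for u, v in views if v not in test.get(u, ())]
    return train, test
```

The training events are what you would feed to the recommender job;
the test map is kept aside for scoring afterwards.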

Make recommendations for the selected users, then see how many items
in each list were among the test data you held out. The percentage of
recs that were in the test list is precision, and the percentage of
the test list that appears in the recs is recall.
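
In code, the two definitions come down to counting hits. The sketch
below is plain Python with hypothetical names (precision_recall,
evaluate) and assumes recommendations and held-out items are available
as per-user collections of video IDs; it is an illustration, not part
of Mahout.

```python
def precision_recall(recommended, held_out):
    """recommended: list of recommended video IDs for one user.
    held_out: set of video IDs held out as test data for that user."""
    recs = list(recommended)
    hits = sum(1 for item in recs if item in held_out)
    precision = hits / len(recs) if recs else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall

def evaluate(recs_by_user, test):
    """Average precision and recall over all users in the test set."""
    if not test:
        return 0.0, 0.0
    total_p = total_r = 0.0
    for user, held_out in test.items():
        p, r = precision_recall(recs_by_user.get(user, []), held_out)
        total_p += p
        total_r += r
    return total_p / len(test), total_r / len(test)
```

Note that with 50 recommendations per user and only a couple of
held-out items, precision cannot exceed a few percent (2 hits out of
50 is 0.04), so compare runs against each other rather than against
1.0.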

Precision and recall are not good tests, but they are among the only
ones you can carry out in the lab. Slightly better are variations on
these two metrics, like the F1 measure and normalized discounted
cumulative gain. Also look up mean average precision.
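
For reference, here is a minimal Python sketch of those metrics for a
single user, assuming binary relevance (an item is relevant only if it
was held out) and a ranked list with no duplicates. The function names
are hypothetical. Average precision has more than one convention; this
version divides by the number of held-out items, and mean average
precision is then the mean of that value over all test users.

```python
import math

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def ndcg(recommended, held_out):
    """Normalized discounted cumulative gain with binary relevance."""
    # A hit at 0-based position `rank` is discounted by log2(rank + 2).
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended)
              if item in held_out)
    # Ideal ordering puts every held-out item at the top of the list.
    ideal_hits = min(len(held_out), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision(recommended, held_out):
    """Mean of precision-at-k taken at each rank k where a hit occurs."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in held_out:
            hits += 1
            total += hits / rank
    return total / len(held_out) if held_out else 0.0
```

Unlike plain precision and recall, the last two reward putting the
held-out items near the top of the list.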

On Sun, Aug 26, 2012 at 10:47 AM, Jonathan Hodges <hodg...@gmail.com> wrote:
> Hi,
>
> We have been tasked with producing video recommendations for our users. We
> get about 100 million video views per month and track users and the videos
> they watch, but currently we don’t collect rating value or preference.
> Later we plan on using implicit data like percentage of video watched to
> surmise preferences but for the first release we are stuck with Boolean
> viewing data. To that end we started by using Mahout’s distributed
> RecommenderJob with LoglikelihoodSimilarity algorithm to generate 50 video
> recommendations for each user. We would like to gauge how well we are doing
> by offline measuring precision and recall of these recommendations. We know
> we should divide the viewing data into training and test data, but not real
> sure what steps to take next. For the non-distributed approach we would
> leverage IRStatistics to get the precision and recall values, but it seems
> there isn’t as simple a solution within the Mahout framework for the Hadoop
> based calculations.
>
> Can someone please share/suggest their techniques for evaluating
> recommendation accuracy with Mahout’s Hadoop-based distributed algorithms?
>
> Thanks in advance,
>
> Jonathan
