(This was just posted to the list, but I believe it's a duplicate of a message from several days ago. See my previous response.)
On Wed, Jul 27, 2011 at 8:33 AM, MT <mael.tho...@telecom-bretagne.eu> wrote:
>
> I'm working on a common dataset that includes the user id, item id, and
> timestamp (the moment the user bought the item). As there are no
> preference values, I needed a binary item-based recommender, which I
> found in Mahout (GenericBooleanPrefItemBasedRecommender with the
> Tanimoto coefficient). Following the recommender documentation, I tried
> to evaluate it with GenericRecommenderIRStatsEvaluator, but I ran into a
> few problems.
>
> In fact, correct me if I'm wrong, but it seems to me that the evaluator
> will invariably give the same value for precision and recall. Since the
> items are all rated with the binary value 1.0, we give the recommender a
> threshold lower than 1; thus, for each user, "at" items are considered
> relevant and removed from the user's preferences, and "at"
> recommendations are then computed. Precision and recall are then
> computed from the two sets, relevant and retrieved items. Since both
> sets contain exactly "at" items, the two ratios share the same
> denominator, which leads (unless the recommender cannot compute "at"
> recommendations, I guess) to precision and recall being equal.
>
> Results are still useful though, since a precision of 0.2 tells us that
> among the "at" recommended items, 20% were actually bought by the user.
> Although one can wonder whether those items are the best
> recommendations, the least we can say is that they somehow correspond
> to the user's preferences.
>
> However, I had a few ideas to give more meaning to precision and recall
> that I wanted to share, to get some advice before implementing them.
>
> I read this topic and I fully understand that IRStatsEvaluator is
> different from classic evaluators (those giving the MAE, for example),
> but I feel that it makes sense to have a trainingPercentage parameter
> that divides each user's preferences into two subsets of items. The
> first (typically 20%) would be considered the relevant items, to be
> predicted using the second subset. This split is currently determined
> by "at", which often results in equal numbers of items in the relevant
> and retrieved subsets. The "at" value would remain a parameter defining
> the number of items retrieved. The evaluator could then be run while
> varying these two parameters to find the best compromise between
> precision and recall.
>
> Furthermore, should the dataset contain a timestamp for each purchase,
> would it not be logical to use the last items bought by the user as the
> test set? The evaluator would then mirror what happens in real usage.
>
> Finally, I believe the documentation page has some mistakes in the last
> code excerpt:
>
> evaluator.evaluate(builder, myModel, null, 3,
>     RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD, 1.0);
>
> should be:
>
> evaluator.evaluate(builder, null, myModel, null, 3,
>     GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>
> Thanks for your help!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Mahout-Binary-Recommender-Evaluation-tp3202743p3202743.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
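That said, for anyone who lands on this thread later, here is a minimal,
self-contained sketch of the boolean setup described above. The file name
(purchases.csv), the user ID, and the wrapper class are made up for
illustration; the Mahout classes are the ones named in the message:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BooleanRecommenderExample {
  public static void main(String[] args) throws Exception {
    // Each line of purchases.csv is assumed to be "userID,itemID" with no
    // preference value; FileDataModel treats value-less data as boolean.
    DataModel model = new FileDataModel(new File("purchases.csv"));
    // Tanimoto (Jaccard) similarity only uses co-occurrence, so it suits
    // data where every preference value is the same 1.0.
    ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    Recommender recommender =
        new GenericBooleanPrefItemBasedRecommender(model, similarity);
    // Top-3 recommendations for the (made-up) user 42
    List<RecommendedItem> top3 = recommender.recommend(42L, 3);
    for (RecommendedItem item : top3) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}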
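And here is the corrected evaluate() call in context, again as a sketch:
builder and myModel stand for whatever RecommenderBuilder and DataModel
you already have, and the surrounding method is assumed to throw
TasteException. The argument order is the one from the fix quoted above
(recommender builder, data model builder, data model, rescorer, at,
relevance threshold, evaluation percentage):

// Additional imports needed:
// org.apache.mahout.cf.taste.common.TasteException
// org.apache.mahout.cf.taste.eval.IRStatistics
// org.apache.mahout.cf.taste.eval.RecommenderBuilder
// org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator
// org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator

RecommenderBuilder builder = new RecommenderBuilder() {
  @Override
  public Recommender buildRecommender(DataModel model) throws TasteException {
    ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    return new GenericBooleanPrefItemBasedRecommender(model, similarity);
  }
};
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// null dataModelBuilder -> use the default; null -> no rescorer; at = 3;
// CHOOSE_THRESHOLD lets the evaluator pick the relevance threshold per
// user; 1.0 -> evaluate using 100% of the users.
IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
System.out.println("Precision: " + stats.getPrecision());
System.out.println("Recall: " + stats.getRecall());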