Hi guys, I wonder whether Mahout should have a "precision and recall" evaluator that builds the set of relevant items without consulting the relevance threshold. This would suit data sets with boolean preferences. In addition, the relevant items could be removed from the training data at random (always removing the first few preferred items wouldn't be a great idea).
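To illustrate the random removal I have in mind, here is a minimal sketch in plain Java. It does not use any Mahout classes: the class RandomRelevantItems and the helper pickRelevantAtRandom are hypothetical names made up for this example, and the item IDs are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomRelevantItems {

    // Hypothetical helper, not part of Mahout: picks up to `at` item IDs
    // uniformly at random from the items a user has a preference for.
    static List<Long> pickRelevantAtRandom(List<Long> preferredItemIDs, int at, Random random) {
        // Shuffle a copy so the caller's list is left untouched.
        List<Long> shuffled = new ArrayList<>(preferredItemIDs);
        Collections.shuffle(shuffled, random);
        // A user may have fewer than `at` preferences, so cap the count.
        int n = Math.min(at, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, n));
    }

    public static void main(String[] args) {
        // Invented boolean preferences for a single user.
        List<Long> preferred = Arrays.asList(101L, 102L, 103L, 104L, 105L, 106L);
        List<Long> relevant = pickRelevantAtRandom(preferred, 3, new Random());
        System.out.println("held out as relevant: " + relevant);
    }
}
```

Passing a seeded Random would make an evaluation run repeatable, while the default constructor gives a different hold-out set on each run.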
On the other hand, running RecommenderIRStatsEvaluator with the relevance threshold set to 1.0 removes exactly "at" items. Since the recommender also returns "at" items, precision and recall come out with the same value. Is this OK, or is it a bug? For reference:

precision = intersection / num_recommended_items (where num_recommended_items is almost always "at")
recall = intersection / num_relevant_items (also "at", because relevanceThreshold is 1.0, as mentioned above)

-- Marko Ćirić [email protected]
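P.S. To make the arithmetic concrete, here is a small plain-Java sketch of the two formulas above. It is not the evaluator's actual code: the class PrecisionRecallAtN, its methods, and the item IDs are hypothetical. Returning NaN for an empty denominator is just this sketch's choice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallAtN {

    // Number of recommended items that are also in the relevant set.
    static int intersectionSize(List<Long> recommended, Set<Long> relevant) {
        int hits = 0;
        for (Long itemID : recommended) {
            if (relevant.contains(itemID)) {
                hits++;
            }
        }
        return hits;
    }

    // precision = intersection / num_recommended_items
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return Double.NaN; // sketch's choice for an undefined ratio
        }
        return (double) intersectionSize(recommended, relevant) / recommended.size();
    }

    // recall = intersection / num_relevant_items
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) {
            return Double.NaN; // sketch's choice for an undefined ratio
        }
        return (double) intersectionSize(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // at = 3: three items held out as relevant, three items recommended.
        Set<Long> relevant = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        List<Long> recommended = Arrays.asList(2L, 3L, 9L);
        System.out.println("precision = " + precision(recommended, relevant));
        System.out.println("recall    = " + recall(recommended, relevant));
    }
}
```

With both denominators equal to "at", the two ratios are the same number by construction. They only differ when the recommender returns fewer than "at" items or fewer than "at" items are held out as relevant.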
