Time splits are fine, but they may contain anomalies that bias the data. If you are going to compare two recommenders based on time splits, make sure the data is exactly the same for each recommender. One time split we did to create a 90/10 training-to-test set had a split date of 12/24! Some form of random hold-out will be less prone to time-based systematic variation like seasonality, holidays, day of week, and the like. Stay with the same data when comparing, and at least the tests will vary together.
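To make the trade-off concrete, here is a minimal sketch of the two splitting strategies being discussed; all names and the toy data are illustrative, not from any particular recommender library:

```python
import random

# Toy interactions: (user, item, rating, timestamp). Illustrative only.
data = [("u%d" % (i % 50), "m%d" % (i % 200), random.randint(1, 5), i)
        for i in range(1000)]

def time_split(interactions, train_frac=0.9):
    """Chronological split: train on the earliest train_frac, test on the rest.
    Emulates 'training data is all from the past'."""
    ordered = sorted(interactions, key=lambda r: r[3])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def random_split(interactions, train_frac=0.9, seed=42):
    """Random hold-out: shuffles away time-based effects like seasonality,
    but lets the test set 'see' the future of each user."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_t, test_t = time_split(data)
train_r, test_r = random_split(data)
```

Whichever split you pick, feed the *same* train/test pair to both recommenders being compared, so systematic variation (a 12/24 split date, day-of-week effects) hits both equally.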
We still use time based splits, partly for the reasons Ted mentions, but knowing the limitations is always good.

On Feb 16, 2013, at 3:12 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

There are a variety of common time based effects which make time splits best in many practical cases. Having the training data all be from the past emulates this better than random splits.

For one thing, you can have the same user under different names in training and test. For another thing, in real life you get data from the past of the user under consideration. As a third consideration, topical events can influence all users in common.

These all mean that random training splits can have very large error in estimated performance.

Sent from my iPhone

On Feb 16, 2013, at 1:41 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:

> What I mean is you can choose ratings randomly and try to recommend
> the ones above the threshold
>
> On Sat, Feb 16, 2013 at 10:32 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> Sure, if you were predicting ratings for one movie given a set of ratings
>> for that movie and the ratings for many other movies. That isn't what the
>> recommender problem is. Here, the problem is to list N movies most likely
>> to be top-rated. The precision-recall test is, in turn, a test of top N
>> results, not a test over prediction accuracy. We aren't talking about RMSE
>> here or even any particular means of generating top N recommendations. You
>> don't even have to predict ratings to make a top N list.
>>
>> On Sat, Feb 16, 2013 at 9:28 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>>
>>> No, rating prediction is clearly a supervised ML problem
>>>
>>> On Sat, Feb 16, 2013 at 10:15 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> This is a good answer for evaluation of supervised ML, but this is
>>>> unsupervised. Choosing randomly is choosing the 'right answers' randomly,
>>>> and that's plainly problematic.
>>>> On Sat, Feb 16, 2013 at 8:53 PM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
>>>>
>>>>> I think it is better to choose ratings of the test user in a random
>>>>> fashion.
>>>>>
>>>>> On Sat, Feb 16, 2013 at 9:37 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> Yes. But: the test sample is small. Using 40% of your data to test is
>>>>>> probably quite too much.
>>>>>>
>>>>>> My point is that it may be the least-bad thing to do. What test are you
>>>>>> proposing instead, and why is it coherent with what you're testing?
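The precision-recall test Sean refers to can be sketched in a few lines; this is a generic precision@N over a top-N list, not the exact evaluator from any specific codebase, and the item IDs below are made up:

```python
def precision_at_n(recommended, relevant, n=10):
    """Fraction of the top-n recommended items that are in the relevant set.
    No rating prediction is involved, only the ranked list itself."""
    top = recommended[:n]
    hits = sum(1 for item in top if item in relevant)
    return hits / float(n)

# Illustrative: a recommender's top-5 list vs. held-out 'relevant' items.
recs = ["m1", "m7", "m3", "m9", "m2"]
liked = {"m3", "m2", "m8"}
score = precision_at_n(recs, liked, n=5)  # 2 of the 5 top items hit -> 0.4
```

This makes the point in the thread concrete: the test scores the ranked top-N list directly, so any method of producing that list (with or without predicted ratings) can be evaluated the same way.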