Ah, yes, that's right. If you have a lot of these values, the test really isn't valid. It may look 'better', but it isn't, for exactly this reason. You want to make sure the result doesn't include many of these cases, or else you should discard it. Look for log lines like "Unable to recommend in X cases".
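For reference, a minimal sketch of the kind of hold-out evaluation being discussed, using Mahout's Taste API. The file name, the similarity choice, and the neighborhood size of 2 are placeholders, not recommendations. As noted below, predictions the recommender cannot make come back as NaN, the evaluator skips them, and the skipped count is what shows up in those log lines.

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvalSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical ratings file in the usual userID,itemID,preference format.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // User-based recommender with a neighborhood of size 2, as in the thread below.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dm) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dm);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dm);
        return new GenericUserBasedRecommender(dm, neighborhood, similarity);
      }
    };

    // RMSE over a random 90%/10% train/test split of each user's preferences.
    // Test preferences the recommender cannot estimate (NaN) are left out of the score.
    RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
    double rmse = evaluator.evaluate(builder, null, model, 0.9, 1.0);
    System.out.println("RMSE = " + rmse);
  }
}

A low score from a run like this only means something if the log does not also report that a large share of the test cases could not be estimated.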
On Wed, May 8, 2013 at 8:00 PM, Zhongduo Lin <zhong...@gmail.com> wrote:
> This accounts for why a neighborhood size of 2 always gives me the best
> result. Thank you!
>
> Best Regards,
> Jimmy
>
> Zhongduo Lin (Jimmy)
> MASc candidate in the ECE department
> University of Toronto
>
> On 2013-05-08 2:40 PM, Alejandro Bellogin Kouki wrote:
>>
>> AFAIK, the recommender would predict a NaN, which will be ignored by the
>> evaluator.
>>
>> However, I am not sure if there is any way to know how many of these
>> were actually produced in the evaluation step, that is, something like
>> the count of predictions with a NaN value.
>>
>> Cheers,
>> Alex
>>
>> Zhongduo Lin wrote:
>>>
>>> Thank you for the quick response.
>>>
>>> I agree that a neighborhood size of 2 will make the predictions more
>>> sensible. But my concern is that a neighborhood size of 2 can only
>>> predict a very small proportion of preferences for each user. Let's
>>> take a look at the previous example: how can it predict item 4 if
>>> item 4 happens to be chosen for the test set? I think this is quite
>>> common in my case, as well as for Amazon or eBay, since the ratings
>>> are very sparse. So I just don't know how it can still be run.
>>>
>>> User 1 rated items 1, 2, 3, 4
>>> neighbour 1 of user 1 rated items 1, 2
>>> neighbour 2 of user 1 rated items 1, 3
>>>
>>> I wouldn't expect the root mean square error to perform differently
>>> from the absolute difference, since in that case most of the
>>> predictions are close to 1, resulting in a near-zero error whether I
>>> am using absolute difference or RMSE. How can I say "RMSE is worse
>>> relative to the variance of the data set" using Mahout? Unfortunately
>>> I got an error using the precision and recall evaluation method; I
>>> guess that's because the data are too sparse.
>>>
>>> Best Regards,
>>> Jimmy
>>>
>>> On 13-05-08 10:05 AM, Sean Owen wrote:
>>>>
>>>> It may be true that the results are best with a neighborhood size of
>>>> 2. Why is that surprising? Very similar people, by nature, rate
>>>> similar things, which makes the things you held out of a user's test
>>>> set likely to be found in the recommendations.
>>>>
>>>> The mapping you suggest is not that sensible, yes, since almost
>>>> everything maps to 1. Not surprisingly, most of your predictions are
>>>> near 1. That's "better" in an absolute sense, but RMSE is worse
>>>> relative to the variance of the data set. This is not a good mapping
>>>> -- or else, RMSE is not a very good metric, yes. So, don't do one of
>>>> those two things.
>>>>
>>>> Try mean average precision for a metric that is not directly related
>>>> to the prediction values.
>>>>
>>>> On Wed, May 8, 2013 at 2:45 PM, Zhongduo Lin <zhong...@gmail.com> wrote:
>>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> I think the evaluation process involves randomly choosing the
>>>>> evaluation proportion. The problem is that I always get the best
>>>>> result when I set neighbors to 2, which seems unreasonable to me,
>>>>> since there should be many test cases that the recommender system
>>>>> couldn't predict at all. So why did I still get a valid result? How
>>>>> does Mahout handle this case?
>>>>>
>>>>> Sorry I didn't make myself clear for the second question. Here is the
>>>>> problem: I have a set of inferred preferences ranging from 0 to 1000,
>>>>> but I want to map them to 1-5, and there can be many ways to do the
>>>>> mapping.
>>>>> Let's take a simple example: if the mapping rule is like the following:
>>>>>
>>>>> if (inferred_preference < 995) preference = 1;
>>>>> else preference = inferred_preference - 995;
>>>>>
>>>>> You can see that this is a really bad mapping algorithm, but if we
>>>>> feed the generated preferences to Mahout, it is going to give me a
>>>>> really nice result, because most of the preferences are 1. So is
>>>>> there any other metric to evaluate this?
>>>>>
>>>>> Any help will be highly appreciated.
>>>>>
>>>>> Best Regards,
>>>>> Jimmy
>>>>>
>>>>> Zhongduo Lin (Jimmy)
>>>>> MASc candidate in the ECE department
>>>>> University of Toronto
>>>>>
>>>>> On 2013-05-08 4:44 AM, Sean Owen wrote:
>>>>>>
>>>>>> It is true that a process based on user-user similarity only won't be
>>>>>> able to recommend item 4 in this example. This is a drawback of the
>>>>>> algorithm and not something that can be worked around. You could try
>>>>>> not to choose this item in the test set, but then that does not quite
>>>>>> reflect reality in the test.
>>>>>>
>>>>>> If you just mean that compressing the range of pref values improves
>>>>>> RMSE in absolute terms, yes it does of course. But not in relative
>>>>>> terms. There is nothing inherently better or worse about a small
>>>>>> range in this example.
>>>>>>
>>>>>> RMSE is a fine eval metric, but you can also consider mean average
>>>>>> precision.
>>>>>>
>>>>>> Sean
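To make the last suggestion concrete: as far as I know the Taste evaluators do not report mean average precision directly, but GenericRecommenderIRStatsEvaluator gives precision and recall at N, which is likewise driven by ranking rather than by the absolute prediction values, so it is not fooled by a mapping that squashes almost everything to 1. A rough sketch, reusing the builder and model from the sketch earlier in this message (the cutoff of 5 is arbitrary):

import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;

// Precision/recall at 5, letting Mahout derive the relevance threshold per user,
// evaluating 100% of the users in the model.
RecommenderIRStatsEvaluator irEvaluator = new GenericRecommenderIRStatsEvaluator();
IRStatistics stats = irEvaluator.evaluate(
    builder,                                               // same RecommenderBuilder as above
    null,                                                  // default DataModelBuilder
    model,
    null,                                                  // no IDRescorer
    5,                                                     // size of the recommendation list
    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,   // threshold chosen from the data
    1.0);
System.out.println("precision@5 = " + stats.getPrecision());
System.out.println("recall@5    = " + stats.getRecall());

If I recall correctly, this evaluator skips users who have too few preferences to hold out a meaningful set of "relevant" items, so on a very sparse data set it can have little or nothing to measure, which may be behind the error mentioned earlier in the thread.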