Hi Greg,

Can you give an example of two items that were similar in the non-distributed case but did not appear in the distributed version?
A small tip on the side: for implicit data, you should also include the "negative" ratings, as those still carry a lot of information about the user's taste and willingness to engage. No need to use only the 3+ ratings. (Two short sketches follow below the quoted mails.)

--sebastian

On 25.11.2011 09:27, Greg H wrote:
> Hi Sebastian,
>
> I converted the dataset by simply keeping all user/item pairs that had a
> rating of above 3. I'm also using GenericItemBasedRecommender's
> mostSimilarItems method instead of the recommend method to make
> recommendations.
>
> I'm certainly open to suggestions on better evaluation metrics. I'm just
> using the top 5 because it was easy to implement.
>
> Thanks,
> Greg
>
> On Fri, Nov 25, 2011 at 4:03 PM, Sebastian Schelter <[email protected]> wrote:
>
>> Hi Greg,
>>
>> You should get the same results. Can you describe exactly how you
>> converted the dataset? I'd like to try this myself, maybe you found some
>> subtle bug.
>>
>> I also have doubts whether taking the precision of the top 5 recommended
>> items is really a good quality measure.
>>
>> --sebastian
>>
>> On 25.11.2011 02:41, Greg H wrote:
>>> Thanks for the replies, Sebastian and Sean. I looked at the similarity
>>> values and they are the same, but ItemSimilarityJob is calculating fewer
>>> of them, so it must still be doing some sort of sampling. I thought that
>>> I could force it to use all of the data by setting maxPrefsPerUser
>>> sufficiently large. Could there be another reason for it not to
>>> calculate all of the similarity values?
>>>
>>> I also tried to use a smaller value of similarItemsPerItem, but this
>>> leads to worse results.
>>>
>>> Thanks again,
>>> Greg
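
PS: Here is roughly what I mean by keeping everything as boolean data, as a
sketch against the non-distributed Taste API. It also shows the
mostSimilarItems() call you described. The file name "ratings.csv" and the
item ID 42 are just placeholders for your data; I picked Tanimoto since the
rating values get dropped anyway:

    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    public class MostSimilarItemsSketch {
      public static void main(String[] args) throws Exception {
        // Load the raw ratings, then drop the rating values entirely:
        // every user/item pair becomes a boolean "interaction", so the
        // low ratings are kept as signal instead of being filtered out.
        DataModel ratings = new FileDataModel(new File("ratings.csv"));
        DataModel model = new GenericBooleanPrefDataModel(
            GenericBooleanPrefDataModel.toDataMap(ratings));

        // Tanimoto works directly on the boolean co-occurrence data.
        GenericItemBasedRecommender recommender =
            new GenericItemBasedRecommender(
                model, new TanimotoCoefficientSimilarity(model));

        // Neighbors of a single item (ID 42 is a placeholder), not
        // per-user recommendations.
        List<RecommendedItem> similar = recommender.mostSimilarItems(42L, 5);
        for (RecommendedItem item : similar) {
          System.out.println(item.getItemID() + "\t" + item.getValue());
        }
      }
    }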
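
On the evaluation side, instead of hand-rolling precision on the top 5 you
could let GenericRecommenderIRStatsEvaluator compute precision/recall at 5
over held-out per-user items. One caveat: it exercises recommend() rather
than mostSimilarItems(), so it measures the per-user recommendation path,
and it won't resolve my doubts about precision@5 as a quality measure, but
at least it is a standardized implementation. Again, "ratings.csv" is a
placeholder:

    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.IRStatistics;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    public class PrecisionAtFiveSketch {
      public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("ratings.csv"));

        RecommenderBuilder builder = new RecommenderBuilder() {
          @Override
          public Recommender buildRecommender(DataModel dataModel)
              throws TasteException {
            return new GenericItemBasedRecommender(
                dataModel, new TanimotoCoefficientSimilarity(dataModel));
          }
        };

        // Holds out each user's most-preferred items as the "relevant"
        // set, recommends 5 items from the rest, and averages
        // precision/recall over all users.
        GenericRecommenderIRStatsEvaluator evaluator =
            new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = evaluator.evaluate(builder, null, model, null, 5,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println("precision@5 = " + stats.getPrecision()
            + "  recall@5 = " + stats.getRecall());
      }
    }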
