Hi Sebastian,

Unfortunately, I still get the wrong data from the RecommenderJob after
cleaning everything. Check the following part of the recommendation result:

49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]

Now look at the input data for user 49 (listed below). Items 312611, 428914,
208617, 345206, 411909, 363683, 248872 and 494200 are wrong recommendations,
because user 49 has already given preferences for them. Nearly all of the
recommendations are wrong. I would like to send you the test data, but it is
50M+ in size. Can we discuss offline? Thank you very much.

49,409769,4
49,98795,4
49,262163,1
49,66009,4
49,414484,2
49,405329,3
49,312611,1
49,336441,4
49,136494,5
49,345206,3
49,479179,1
49,318960,4
49,52683,3
49,270840,3
49,264828,1
49,222390,4
49,456614,5
49,436207,5
49,306308,2
49,391582,5
49,494200,4
49,423328,3
49,112997,3
49,229347,5
49,474928,3
49,349350,1
49,208508,3
49,314397,2
49,14673,2
49,496041,4
49,301875,4
49,234234,1
49,325287,3
49,35756,5
49,365097,4
49,13376,4
49,333634,2
49,283494,5
49,208617,3
49,245390,1
49,221804,2
49,347821,3
49,138954,5
49,164206,5
49,72238,1
49,356632,1
49,452296,3
49,182288,5
49,499031,5
49,150727,4
49,240533,5
49,326081,4
49,220683,2
49,196527,2
49,177165,3
49,411709,5
49,360722,3
49,466310,1
49,160375,2
49,137203,5
49,32634,4
49,62134,5
49,96982,5
49,196951,1
49,304155,5
49,406109,4
49,244276,5
49,189552,1
49,442215,3
49,268806,2
49,364912,2
49,410896,5
49,450602,5
49,151703,1
49,248872,4
49,21684,1
49,41196,1
49,26614,2
49,369075,5
49,321916,1
49,325081,1
49,329877,4
49,344661,4
49,8429,3
49,69279,1
49,143695,1
49,229120,2
49,26298,4
49,54456,1
49,75937,4
49,87042,3
49,345383,5
49,363683,4
49,128047,3
49,234878,5
49,428914,3
49,353107,2
49,266850,4
49,421211,3
49,265739,4
49,303723,1
49,244575,4
49,303625,4
49,350481,5
49,63985,4
49,207327,3
49,397535,1
49,300916,5
49,358094,4
49,314919,5
49,309355,5
49,403169,5
49,90148,4
49,224056,4
49,359181,2
49,341927,5
49,436521,4
49,480682,4
49,315561,3
49,218647,5
49,245276,2
49,93189,1
49,204695,4
49,498350,5
49,155787,3
49,112730,3
49,416756,2
49,411909,4
49,253353,2
49,196663,5
49,40903,3
49,51873,2
49,66925,3
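
The overlap described above can be checked mechanically. The following is only a minimal sketch, not part of Mahout: the class name OverlapCheck and the method alreadyRated are hypothetical, and the input lines are a small subset of the user 49 data listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
public class OverlapCheck {

    // Returns the recommended item IDs that already appear in the user's input preferences.
    static List<Long> alreadyRated(List<Long> recommended, Set<Long> ratedItems) {
        List<Long> overlap = new ArrayList<>();
        for (long id : recommended) {
            if (ratedItems.contains(id)) {
                overlap.add(id);
            }
        }
        return overlap;
    }

    public static void main(String[] args) {
        // Item IDs from the recommendation result for user 49.
        List<Long> recommended = Arrays.asList(
            300420L, 312611L, 428914L, 208617L, 345206L,
            411909L, 363683L, 248872L, 93087L, 494200L);

        // A subset of the userID,itemID,preference input lines for user 49.
        String[] inputLines = {
            "49,409769,4", "49,312611,1", "49,345206,3", "49,494200,4",
            "49,208617,3", "49,248872,4", "49,363683,4", "49,428914,3",
            "49,411909,4"
        };

        Set<Long> ratedItems = new HashSet<>();
        for (String line : inputLines) {
            String[] fields = line.split(",");
            ratedItems.add(Long.parseLong(fields[1])); // second field is the item ID
        }

        System.out.println("Already rated: " + alreadyRated(recommended, ratedItems));
    }
}
```

Run against the full input file, a check like this would list every recommended item the user has already rated.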
> Date: Thu, 20 Oct 2011 18:40:38 +0200
> From: s...@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given 
> preference, is that correct?
> 
> To put it simplified:
> 
> The vector of recommendations is the sum of the similarity vectors for
> all preferred items. In each similarity vector for a preferred item the
> entry for that particular item is set to NaN.
> 
> That means that in the recommendation vector the entries for all
> preferred items will be NaN.
> 
> It's a neat trick that is unfortunately very hard to see in the code.
> 
> --sebastian
> 
> On 20.10.2011 18:36, WangRamon wrote:
> > 
> > Hi Sebastian
> > "But as the entry for the item itself is set to NaN in its similarityvector 
> > and NaN plus something stays always NaN, the predicted preferencefor an 
> > item that was already preferred is NaN. And the NaN entries aredropped 
> > later."
> > Wait a minute here, i can understand NaN plus something stays always NaN, 
> > but, how do you explain "the predicted preference for an item that was 
> > already preferred is NaN", where do you put the code to check an item that 
> > was already preferred? The only thing about NaN in 
> > SimilarityMatrixRowWrapperMapper is to say two item (A to A) has a 
> > similarity of NaN, am i right?
> > Thanks
> > Ramon
> >> Date: Thu, 20 Oct 2011 17:04:20 +0200
> >> From: s...@apache.org
> >> To: user@mahout.apache.org
> >> Subject: Re: Recommend result contains item which user has already given 
> >> preference, is that correct?
> >>
> >> On 20.10.2011 16:57, WangRamon wrote:
> >>>
> >>> Hi Sebastian and Sean 
> >>> Thanks for your help. 
> >>>
> >>> I re-read the code (setting up a debugging environment seems very
> >>> difficult for me) and found this line in
> >>> SimilarityMatrixRowWrapperMapper. I paste it below with its comment:
> >>>     /* remove self similarity */ 
> >>>     similarityMatrixRow.set(key.get(), Double.NaN); 
> >>> I think the meaning is to mark the similarity between Item X and Item X
> >>> (the identical one) as NaN, but that doesn't exclude Item X from the
> >>> recommendations. Then AggregateAndRecommendReducer uses
> >>> simColumn.times(prefValue) as part of the formula to calculate the
> >>> preferences for all items that are similar to Item i (it could be Item X
> >>> or some other item), and returns the top 10 (default) for a user.
> >>> During this process, I cannot see any code that excludes an item which
> >>> the user has already given a preference for from the recommendations.
> >>
> >> It's a little bit hidden :) For each preferred item, a vector of all its
> >> similarities is added:
> >>
> >>       numerators = numerators == null
> >>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
> >>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
> >>
> >> But as the entry for the item itself is set to NaN in its similarity
> >> vector and NaN plus something stays always NaN, the predicted preference
> >> for an item that was already preferred is NaN. And the NaN entries are
> >> dropped later.
> >>
> >> --sebastian
> >>
> >>
> >>> Correct me if I missed something, thank you guys.
> >>> Cheers Ramon
> >>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >>>> Subject: Re: Recommend result contains item which user has already given 
> >>>> preference, is that correct?
> >>>> From: sro...@gmail.com
> >>>> To: user@mahout.apache.org
> >>>>
> >>>> Ah OK, figured as much. WangRamon does that answer your question
> >>>> and/or can you debug to see if this is happening, not happening for
> >>>> you in your use case?
> >>>>
> >>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <s...@apache.org> 
> >>>> wrote:
> >>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> >>>>> unit test that checks whether a user is only recommended unknown items
> >>>>> which still works.
> >>>                                     
> >>
> >                                       
> 
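
The NaN trick described in the quoted replies can be illustrated with plain arrays. This is only a sketch under stated assumptions: the class name NaNTrickSketch, the method aggregate and the 3x3 similarity matrix are made up for illustration, plain double arrays stand in for Mahout's vectors, and only the summing of the numerators is shown.

```java
// Hypothetical illustration, not Mahout code.
public class NaNTrickSketch {

    // Sums the similarity rows of the preferred items, each scaled by its preference value.
    // The self-similarity entry of each row is set to NaN first, so the sum is NaN
    // for every item the user has already preferred.
    static double[] aggregate(double[][] sim, int[] preferredItems, double[] prefValues) {
        int n = sim.length;
        double[] numerators = new double[n];
        for (int k = 0; k < preferredItems.length; k++) {
            int item = preferredItems[k];
            double[] row = sim[item].clone();
            row[item] = Double.NaN; // remove self similarity
            for (int j = 0; j < n; j++) {
                numerators[j] += prefValues[k] * row[j]; // NaN plus anything stays NaN
            }
        }
        return numerators;
    }

    public static void main(String[] args) {
        // Made-up symmetric similarity matrix for three items (0, 1, 2).
        double[][] sim = {
            {1.0, 0.5, 0.2},
            {0.5, 1.0, 0.4},
            {0.2, 0.4, 1.0}
        };
        // The user has preferred items 0 and 1, with values 4.0 and 2.0.
        double[] result = aggregate(sim, new int[] {0, 1}, new double[] {4.0, 2.0});

        for (int j = 0; j < result.length; j++) {
            if (Double.isNaN(result[j])) {
                continue; // already preferred items are dropped here
            }
            System.out.println("candidate item " + j + " -> " + result[j]);
        }
    }
}
```

In this toy example only item 2 survives as a candidate, because the entries for items 0 and 1 each received a NaN from their own similarity row.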
                                          
