OK Sebastian, I will try Mahout 0.6 next week. I believe it's built from trunk,
right? Have a nice day/weekend!

Cheers,
Ramon

> Date: Fri, 21 Oct 2011 09:06:50 +0200
> From: s...@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given 
> preference, is that correct?
> 
> As I already said multiple times, please use Mahout 0.6. It contains bug
> fixes and performance improvements for this particular job.
> 
> --sebastian
> 
> On 21.10.2011 09:04, WangRamon wrote:
> > 
> > Hi Sebastian,
> >
> > I made the following changes locally to resolve the issue. They are against
> > Mahout 0.5. Maybe I am wrong, but the test result is correct:
> >
> > 1) I added an "int itemIdIndex" property with getter/setter methods to class
> >    PrefAndSimilarityColumnWritable. It holds the index of the item that the
> >    "prefValue" in this class belongs to.
> >
> > 2) I added the following at line 51 of class PartialMultiplyMapper to set
> >    the item index property created in step 1:
> >
> >       prefAndSimilarityColumn.setItemIdIndex(key.get());
> >
> > 3) In class AggregateAndRecommendReducer, I added the following code at
> >    line 147:
> >
> >       // item which user has already given preference
> >       int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
> >       // exclude item user has already given preference
> >       simColumn.set(itemIdIndex, Double.NaN);
> >
> >    This sets the entry at that index in the sim column to NaN for an item
> >    the user has already given a preference for. Any later plus or multiply
> >    on this vector also yields NaN at that index, so the items the user has
> >    already shown a preference for are excluded from the recommendations.
> >
> > 4) At line 173 of the same class, AggregateAndRecommendReducer, I added a
> >    check so that the prediction stays NaN for those items:
> >
> >       double prediction = Double.NaN;
> >       if (!Double.isNaN(element.get())) {
> >         prediction = element.get() / denominators.getQuick(itemIDIndex);
> >       }
> >
> > With these changes I get the correct recommendations. I have thought it
> > through carefully, but I may be wrong. I would be glad to hear your opinion,
> > and again, thank you very much.
> >
> > Cheers,
> > Ramon
> >
> >> From: ramon_w...@hotmail.com
> >> To: user@mahout.apache.org
> >> Subject: RE: Recommend result contains item which user has already given 
> >> preference, is that correct?
> >> Date: Fri, 21 Oct 2011 10:01:12 +0800
> >>
> >>
> >> Hi Sebastian,
> >>
> >> Unfortunately, I still get wrong data from the RecommenderJob after
> >> cleaning everything. Here is the relevant part of the recommendation
> >> result:
> >>
> >> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
> >>
> >> Now look at the input data for user 49 below. Items 312611, 428914, 208617,
> >> 345206, 411909, 363683, 248872 and 494200 are wrong recommendations, so
> >> nearly all of them are wrong. I would like to send you the test data, but
> >> it is 50M+ in size. Can we discuss offline? Thank you very much.
> >>
> >> 49,409769,4
> >> 49,98795,4
> >> 49,262163,1
> >> 49,66009,4
> >> 49,414484,2
> >> 49,405329,3
> >> 49,312611,1
> >> 49,336441,4
> >> 49,136494,5
> >> 49,345206,3
> >> 49,479179,1
> >> 49,318960,4
> >> 49,52683,3
> >> 49,270840,3
> >> 49,264828,1
> >> 49,222390,4
> >> 49,456614,5
> >> 49,436207,5
> >> 49,306308,2
> >> 49,391582,5
> >> 49,494200,4
> >> 49,423328,3
> >> 49,112997,3
> >> 49,229347,5
> >> 49,474928,3
> >> 49,349350,1
> >> 49,208508,3
> >> 49,314397,2
> >> 49,14673,2
> >> 49,496041,4
> >> 49,301875,4
> >> 49,234234,1
> >> 49,325287,3
> >> 49,35756,5
> >> 49,365097,4
> >> 49,13376,4
> >> 49,333634,2
> >> 49,283494,5
> >> 49,208617,3
> >> 49,245390,1
> >> 49,221804,2
> >> 49,347821,3
> >> 49,138954,5
> >> 49,164206,5
> >> 49,72238,1
> >> 49,356632,1
> >> 49,452296,3
> >> 49,182288,5
> >> 49,499031,5
> >> 49,150727,4
> >> 49,240533,5
> >> 49,326081,4
> >> 49,220683,2
> >> 49,196527,2
> >> 49,177165,3
> >> 49,411709,5
> >> 49,360722,3
> >> 49,466310,1
> >> 49,160375,2
> >> 49,137203,5
> >> 49,32634,4
> >> 49,62134,5
> >> 49,96982,5
> >> 49,196951,1
> >> 49,304155,5
> >> 49,406109,4
> >> 49,244276,5
> >> 49,189552,1
> >> 49,442215,3
> >> 49,268806,2
> >> 49,364912,2
> >> 49,410896,5
> >> 49,450602,5
> >> 49,151703,1
> >> 49,248872,4
> >> 49,21684,1
> >> 49,41196,1
> >> 49,26614,2
> >> 49,369075,5
> >> 49,321916,1
> >> 49,325081,1
> >> 49,329877,4
> >> 49,344661,4
> >> 49,8429,3
> >> 49,69279,1
> >> 49,143695,1
> >> 49,229120,2
> >> 49,26298,4
> >> 49,54456,1
> >> 49,75937,4
> >> 49,87042,3
> >> 49,345383,5
> >> 49,363683,4
> >> 49,128047,3
> >> 49,234878,5
> >> 49,428914,3
> >> 49,353107,2
> >> 49,266850,4
> >> 49,421211,3
> >> 49,265739,4
> >> 49,303723,1
> >> 49,244575,4
> >> 49,303625,4
> >> 49,350481,5
> >> 49,63985,4
> >> 49,207327,3
> >> 49,397535,1
> >> 49,300916,5
> >> 49,358094,4
> >> 49,314919,5
> >> 49,309355,5
> >> 49,403169,5
> >> 49,90148,4
> >> 49,224056,4
> >> 49,359181,2
> >> 49,341927,5
> >> 49,436521,4
> >> 49,480682,4
> >> 49,315561,3
> >> 49,218647,5
> >> 49,245276,2
> >> 49,93189,1
> >> 49,204695,4
> >> 49,498350,5
> >> 49,155787,3
> >> 49,112730,3
> >> 49,416756,2
> >> 49,411909,4
> >> 49,253353,2
> >> 49,196663,5
> >> 49,40903,3
> >> 49,51873,2
> >> 49,66925,3
> >>> Date: Thu, 20 Oct 2011 18:40:38 +0200
> >>> From: s...@apache.org
> >>> To: user@mahout.apache.org
> >>> Subject: Re: Recommend result contains item which user has already given 
> >>> preference, is that correct?
> >>>
> >>> To put it simply:
> >>>
> >>> The vector of recommendations is the sum of the similarity vectors for
> >>> all preferred items. In each similarity vector for a preferred item the
> >>> entry for that particular item is set to NaN.
> >>>
> >>> That means that in the recommendation vector the entries for all
> >>> preferred items will be NaN.
> >>>
> >>> It's a neat trick that is unfortunately very hard to see in the code.
> >>>
> >>> --sebastian
> >>>
> >>> On 20.10.2011 18:36, WangRamon wrote:
> >>>>
> >>>> Hi Sebastian,
> >>>>
> >>>> "But as the entry for the item itself is set to NaN in its similarity
> >>>> vector and NaN plus something stays always NaN, the predicted preference
> >>>> for an item that was already preferred is NaN. And the NaN entries are
> >>>> dropped later."
> >>>>
> >>>> Wait a minute. I can understand that NaN plus something always stays
> >>>> NaN, but how do you explain "the predicted preference for an item that
> >>>> was already preferred is NaN"? Where is the code that checks whether an
> >>>> item was already preferred? The only thing about NaN in
> >>>> SimilarityMatrixRowWrapperMapper says that the similarity of an item to
> >>>> itself (A to A) is NaN, am I right?
> >>>>
> >>>> Thanks,
> >>>> Ramon
> >>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
> >>>>> From: s...@apache.org
> >>>>> To: user@mahout.apache.org
> >>>>> Subject: Re: Recommend result contains item which user has already 
> >>>>> given preference, is that correct?
> >>>>>
> >>>>> On 20.10.2011 16:57, WangRamon wrote:
> >>>>>>
> >>>>>> Hi Sebastian and Sean,
> >>>>>> Thanks for your help.
> >>>>>>
> >>>>>> I re-read the code (setting up a debugging environment seems very
> >>>>>> difficult for me) and found this line in
> >>>>>> SimilarityMatrixRowWrapperMapper. I paste it below with its comment:
> >>>>>>     /* remove self similarity */
> >>>>>>     similarityMatrixRow.set(key.get(), Double.NaN);
> >>>>>> I think the meaning is to mark the similarity between Item X and
> >>>>>> itself as NaN, but that does not exclude Item X from the
> >>>>>> recommendations. Then, in AggregateAndRecommendReducer, it uses
> >>>>>> simColumn.times(prefValue) as part of the formula to calculate the
> >>>>>> preferences for all items that are similar to Item i (which could be
> >>>>>> Item X or some other item), and returns the top 10 (default) for a
> >>>>>> user. During this process, I cannot see any code that excludes an item
> >>>>>> the user has already given a preference for from the recommendations.
> >>>>>
> >>>>> It's a little bit hidden :) For each preferred item, a vector of all its
> >>>>> similarities is added:
> >>>>>
> >>>>>       numerators = numerators == null
> >>>>>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
> >>>>>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
> >>>>>
> >>>>> But as the entry for the item itself is set to NaN in its similarity
> >>>>> vector and NaN plus something stays always NaN, the predicted preference
> >>>>> for an item that was already preferred is NaN. And the NaN entries are
> >>>>> dropped later.
> >>>>>
> >>>>> --sebastian
> >>>>>
> >>>>>
> >>>>>> Correct me if I am missing something. Thank you, guys.
> >>>>>> Cheers, Ramon
> >>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >>>>>>> Subject: Re: Recommend result contains item which user has already 
> >>>>>>> given preference, is that correct?
> >>>>>>> From: sro...@gmail.com
> >>>>>>> To: user@mahout.apache.org
> >>>>>>>
> >>>>>>> Ah OK, figured as much. WangRamon does that answer your question
> >>>>>>> and/or can you debug to see if this is happening, not happening for
> >>>>>>> you in your use case?
> >>>>>>>
> >>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <s...@apache.org> wrote:
> >>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> >>>>>>>> unit test that checks whether a user is only recommended unknown items
> >>>>>>>> which still works.
> 
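
For readers following the quoted thread, the NaN trick Sebastian describes can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout's code: the class NanExclusionSketch, the methods predict and recommendable, and the 4x4 similarity matrix are all hypothetical, and plain arrays stand in for Mahout's vectors. It shows only why an entry that was set to NaN in one similarity column stays NaN through the weighted sum and the final division, so that already-preferred items can be dropped afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain arrays stand in for Mahout's vectors.
class NanExclusionSketch {

    // Sums the preference-weighted similarity columns of all preferred items
    // and divides by the summed absolute similarities.
    static double[] predict(double[][] similarity, int[] preferredItems, double[] prefValues) {
        int numItems = similarity.length;
        double[] numerators = new double[numItems];
        double[] denominators = new double[numItems];
        for (int p = 0; p < preferredItems.length; p++) {
            int item = preferredItems[p];
            double[] simColumn = similarity[item].clone();
            // "remove self similarity": mark the preferred item's own entry as NaN
            simColumn[item] = Double.NaN;
            for (int j = 0; j < numItems; j++) {
                // NaN times or plus anything stays NaN, so the mark survives the sum
                numerators[j] += simColumn[j] * prefValues[p];
                denominators[j] += Math.abs(simColumn[j]);
            }
        }
        double[] predictions = new double[numItems];
        for (int j = 0; j < numItems; j++) {
            predictions[j] = numerators[j] / denominators[j];
        }
        return predictions;
    }

    // Keeps only the item indexes whose prediction is not NaN.
    static List<Integer> recommendable(double[] predictions) {
        List<Integer> items = new ArrayList<>();
        for (int j = 0; j < predictions.length; j++) {
            if (!Double.isNaN(predictions[j])) {
                items.add(j);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // Made-up symmetric item-item similarities for four items.
        double[][] similarity = {
            {1.0, 0.5, 0.2, 0.0},
            {0.5, 1.0, 0.4, 0.3},
            {0.2, 0.4, 1.0, 0.6},
            {0.0, 0.3, 0.6, 1.0}
        };
        // The user has rated item 0 with 4 and item 2 with 2.
        double[] predictions = predict(similarity, new int[] {0, 2}, new double[] {4.0, 2.0});
        System.out.println(recommendable(predictions));
    }
}
```

In this sketch the user has preferences for items 0 and 2, so both of those predictions come out as NaN and only items 1 and 3 remain as candidates. Whether a given Mahout release actually behaves this way is exactly what the thread is about, so the sketch says nothing about the 0.5 behaviour Ramon reports.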
                                          
