Thanks Sebastian! Is there a particular reason for that?

On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.o...@googlemail.com> wrote:
> Hi Amit,
>
> You are right, the non-co-rated items are not filtered out in the
> distributed implementation.
>
> --sebastian
>
>
> On 26.11.2013 20:51, Amit Nithian wrote:
> > Hi all,
> >
> > Apologies if this is a repeat question, as I just joined the list, but I
> > have a question about the way that metrics like Cosine and Pearson are
> > calculated in Hadoop "mode" (i.e. non-Taste).
> >
> > As far as I understand, the vectors used for computing pairwise item
> > similarity in Taste are based on the co-rated items; however, in the
> > Hadoop implementation, I don't see this done.
> >
> > The implementation of the distributed item-item similarity comes from
> > this paper: http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf.
> > I didn't see anything in this paper about filtering out the elements of
> > the vectors that are not co-rated, and this can make a difference,
> > especially when you normalize the ratings by dividing by the average item
> > rating. In some cases, the number of users to divide by can be smaller,
> > depending on the sparseness of the vector.
> >
> > Any clarity on this would be helpful.
> >
> > Thanks!
> > Amit
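To make the difference Amit describes concrete, here is a minimal Python sketch comparing Pearson correlation computed only over co-rated users (the filtering he attributes to Taste) with an unfiltered variant that mean-centers each item over all of its own ratings and then takes the cosine of the full sparse vectors. The function names and toy ratings are hypothetical, and the unfiltered variant is only an assumed model of "not filtering non-co-rated items", not Mahout's actual distributed code.

```python
import math

# Each item is a sparse vector {user_id: rating}; users who did not rate it are absent.
Ratings = dict[int, float]


def pearson_corated(a: Ratings, b: Ratings) -> float:
    """Pearson correlation restricted to users who rated BOTH items."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return float("nan")
    # Means are taken over the co-rated users only.
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    num = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    den_a = math.sqrt(sum((a[u] - mean_a) ** 2 for u in common))
    den_b = math.sqrt(sum((b[u] - mean_b) ** 2 for u in common))
    if den_a == 0 or den_b == 0:
        return float("nan")
    return num / (den_a * den_b)


def pearson_full_vectors(a: Ratings, b: Ratings) -> float:
    """Hypothetical unfiltered variant: center each item on the mean of ALL its
    ratings, then take the cosine of the full sparse vectors. Users who rated
    only one item still add to that item's norm, but not to the dot product."""
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    ca = {u: r - mean_a for u, r in a.items()}
    cb = {u: r - mean_b for u, r in b.items()}
    dot = sum(ca[u] * cb[u] for u in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return float("nan")
    return dot / (norm_a * norm_b)


# Toy data: users 1-3 rated both items; user 4 rated only item_a, user 5 only item_b.
item_a = {1: 5.0, 2: 3.0, 3: 4.0, 4: 1.0}
item_b = {1: 4.0, 2: 2.0, 3: 5.0, 5: 2.0}

print(round(pearson_corated(item_a, item_b), 3))       # 0.655
print(round(pearson_full_vectors(item_a, item_b), 3))  # 0.382
```

With these toy ratings the two variants disagree noticeably: users 4 and 5 each rated only one of the items, so in the unfiltered variant they shift that item's mean and inflate its norm without contributing to the dot product.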