Yes, it is due to the parallel algorithm, which only looks at co-ratings from a given user.
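
To make the difference concrete, here is a minimal Python sketch (not Mahout code; the function names and ratings are made up for illustration). It assumes that the distributed variant accumulates the dot product from per-user co-ratings but takes each item's norm over all of its ratings, while the co-rated variant restricts the norms to users who rated both items.

```python
from math import sqrt


def cosine_corated(a, b):
    """Cosine similarity restricted to users who rated both items."""
    common = a.keys() & b.keys()
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    denom = norm_a * norm_b
    return dot / denom if denom else 0.0


def cosine_full(a, b):
    """Cosine similarity with norms taken over every rating of each item.

    Only co-rated users contribute to the dot product (a missing rating
    acts as zero), but the norms include the non-co-rated entries.
    """
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    denom = norm_a * norm_b
    return dot / denom if denom else 0.0


# Hypothetical item vectors: user id -> rating.
item_a = {"u1": 5.0, "u2": 3.0, "u3": 4.0}
item_b = {"u2": 3.0, "u3": 4.0, "u4": 1.0}

corated = cosine_corated(item_a, item_b)
full = cosine_full(item_a, item_b)
print(corated, full)
```

With these made-up ratings the two items agree exactly on their shared users, so the co-rated variant reports 1.0, while the full-vector variant comes out at about 0.69 because the ratings from u1 and u4 enlarge the norms.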

On 27.11.2013 15:02, Amit Nithian wrote:
> Thanks Sebastian! Is there a particular reason for that?
> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.o...@googlemail.com>
> wrote:
>
>> Hi Amit,
>>
>> You are right, the non-corated items are not filtered out in the
>> distributed implementation.
>>
>> --sebastian
>>
>>
>> On 26.11.2013 20:51, Amit Nithian wrote:
>>> Hi all,
>>>
>>> Apologies if this is a repeat question as I just joined the list but I have
>>> a question about the way that metrics like Cosine and Pearson are
>>> calculated in Hadoop "mode" (i.e. non Taste).
>>>
>>> As far as I understand, the vectors used for computing pairwise item
>>> similarity in Taste are based on the co-rated items; however, in the Hadoop
>>> implementation, I don't see this done.
>>>
>>> The implementation of the distributed item-item similarity comes from this
>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I didn't
>>> see anything in this paper about filtering out those elements from the
>>> vectors not co-rated and this can make a difference especially when you
>>> normalize the ratings by dividing by the average item rating. In some
>>> cases, the # users to divide by can be fewer depending on the sparseness of
>>> the vector.
>>>
>>> Any clarity on this would be helpful.
>>>
>>> Thanks!
>>> Amit