It is very common that preferences or ratings DECREASE recommendation performance.
The basic reason is that there is little or no real signal left in the ratings once you account for the fact that the rating exists at all. In practice there is an additional reason: if you don't need a rating, you can use implicit feedback, which is typically 20-100x more plentiful than rating data. Ratings start off not so great, and with a huge deficit in data volume on top of that, they have no chance.

On Thu, Mar 29, 2012 at 2:52 PM, ziad kamel <ziad.kame...@gmail.com> wrote:
> OK, things become more clear.
>
> Will the top items selected be the same when changing the similarity? Or
> does it matter?
>
> When using Pearson similarity, which uses the preference values, I got a
> precision of 10%; when using CityBlockSimilarity I got 50%. How come I
> get higher precision when we neglect the preferences?
>
> On Thu, Mar 29, 2012 at 4:37 PM, Sean Owen <sro...@gmail.com> wrote:
> > Ah OK. The key piece you are missing is that this similarity assumes
> > that all vector values are 0 (not present) or 1 (present). Every
> > dimension either contributes 1 to the distance (one value is 0 and the
> > other is 1) or 0 to the distance (both are 0, or both are 1). The
> > distance is therefore the "XOR" of the data: the number of dimensions
> > in which they differ. The "XOR" of sets A and B is the sum of their
> > sizes minus twice the size of their intersection, hence the formula.
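To make the quoted point concrete, here is a minimal sketch (not Mahout's actual CityBlockSimilarity implementation; the class name, item IDs, and helper methods are made up for illustration). It represents each user's implicit data as the set of item indices whose value is 1, computes the city-block distance by counting the dimensions where the two 0/1 vectors differ, and checks that this matches |A| + |B| - 2*|A ∩ B|:

```java
import java.util.HashSet;
import java.util.Set;

public class XorDistance {
    // City-block distance between two 0/1 vectors, each given as the set
    // of indices that are 1: count the dimensions where the values differ.
    static int cityBlock(Set<Integer> a, Set<Integer> b) {
        Set<Integer> all = new HashSet<>(a);
        all.addAll(b);
        int distance = 0;
        for (int item : all) {
            if (a.contains(item) != b.contains(item)) {
                distance++; // exactly one of the two vectors is 1 here
            }
        }
        return distance;
    }

    // Equivalent set formula: |A| + |B| - 2 * |A ∩ B|.
    static int byFormula(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return a.size() + b.size() - 2 * intersection.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3, 5); // items user A interacted with
        Set<Integer> b = Set.of(2, 3, 4);    // items user B interacted with
        System.out.println(cityBlock(a, b)); // 3 (they differ on items 1, 4, 5)
        System.out.println(byFormula(a, b)); // 3 (= 4 + 3 - 2*2)
    }
}
```

Both computations agree because every common item cancels once from each vector's count, which is why ignoring rating values entirely still yields a meaningful distance on implicit data.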