Based on my quick understanding, item-to-item collaborative filtering takes the items the user has already rated and uses their similarity to the required item to compute the required item's predicted rating.
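As a rough illustration of the item-based case just described, here is a minimal Python sketch. It is not Mahout's code: the toy ratings, the precomputed `item_sim` table, and the helper names are all hypothetical, and dividing by the sum of similarities is an assumed normalization.

```python
# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "alice": {"A": 5.0, "B": 3.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 4.0},
}

# Hypothetical precomputed item-item similarities (symmetric).
item_sim = {("C", "A"): 0.9, ("C", "B"): 0.1}


def sim(i, j):
    """Look up the similarity of two items in either order; 0.0 if unknown."""
    return item_sim.get((i, j), item_sim.get((j, i), 0.0))


def predict_item_based(user, target):
    """Estimate the user's rating for target from the items they already rated.

    The candidate set is exactly "all items rated by this user", so every
    term in the sum has a real rating behind it.
    """
    num = 0.0
    den = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        s = sim(target, item)
        num += s * rating
        den += abs(s)  # assumed normalization, so the result stays on the rating scale
    return num / den if den else None


print(predict_item_based("alice", "C"))
```

The point to notice is that the set being summed over depends on the user, so it can never contain an item the user has not rated.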
On the other hand, user-based collaborative filtering always uses the same neighborhood set, irrespective of whether those users have rated the particular item or not. You check for this in the code, but the neighborhood is always the same n members, so it may happen that none of them has rated the item.

To put it mathematically (let r denote the rating and s the similarity):

item-item CF: r(user, item_i) = SUM_(all j rated by user) s(item_i, item_j) * r(user, item_j)

user-user CF: r(user_i, item) = SUM_(all j in user_i's neighborhood) s(user_i, user_j) * r(user_j, item)

Why don't we use the same approach in user-to-user CF? Why not retrieve the users who have rated the particular item in question and then take either the k closest ones or those above some similarity threshold? Or, why not use the same fixed-neighborhood logic in item-item CF?

I know you can argue that users tend to be similar because of their general taste in items, and that this similarity is not determined much by individual items. So it makes more sense to use a constant neighborhood for a user, irrespective of which item we are trying to rate. But the same logic can be applied to item-item CF: an item's inherent features are determined by how many users rate it together with other items, and so on.

I may be missing a simple point here, but I am unable to figure out why the two have different implementations. Please correct me if my observation is wrong and the two codes are in fact the same!

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/GenericUserBasedRecommender-vs-GenericItemBasedRecommender-tp1565019p1565019.html
Sent from the Mahout User List mailing list archive at Nabble.com.
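To make the question concrete, here is a small Python sketch of the two user-based neighborhood choices being compared. It is only an illustration under assumed toy data, not what GenericUserBasedRecommender actually does: the `user_sim` values, the function names, and the similarity-weighted average are all hypothetical.

```python
# Hypothetical ratings by candidate neighbors: ratings[user][item] = rating.
ratings = {
    "v1": {"A": 5.0, "B": 1.0},            # has not rated C
    "v2": {"A": 4.0, "B": 2.0},            # has not rated C
    "v3": {"A": 2.0, "B": 4.0, "C": 2.0},
    "v4": {"A": 4.0, "B": 1.0, "C": 5.0},
}

# Hypothetical similarity of each candidate to the target user.
user_sim = {"v1": 1.0, "v2": 0.8, "v3": 0.2, "v4": 0.7}


def weighted_average(neighbors, item):
    """Similarity-weighted mean of the neighbors' ratings for item."""
    num = 0.0
    den = 0.0
    for v in neighbors:
        rating = ratings[v].get(item)
        if rating is None:
            continue  # this neighbor contributes nothing for this item
        num += user_sim[v] * rating
        den += abs(user_sim[v])
    return num / den if den else None


def predict_fixed_neighborhood(item, k):
    """Top-k most similar users overall, chosen without looking at the item."""
    neighbors = sorted(user_sim, key=user_sim.get, reverse=True)[:k]
    return weighted_average(neighbors, item)


def predict_item_specific_neighborhood(item, k):
    """Top-k most similar users among those who actually rated the item."""
    raters = [v for v in user_sim if item in ratings[v]]
    neighbors = sorted(raters, key=user_sim.get, reverse=True)[:k]
    return weighted_average(neighbors, item)


print(predict_fixed_neighborhood("C", k=2))          # None: v1 and v2 never rated C
print(predict_item_specific_neighborhood("C", k=2))  # uses v4 and v3 instead
```

With this data the fixed neighborhood yields no estimate for C, while the item-specific one does, at the cost of leaning on less similar users and of recomputing the neighborhood for every item.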
