Thanks Sean. That makes sense.

-- Young

At 2010-08-31 00:17:08,"Sean Owen" <[email protected]> wrote:
>You can do whatever you like if it works for you, but this sounds
>wrong to me. Yes, you got more recommendations, but are those last
>recommendations actually good ones? The algorithm may be "telling you"
>there's not enough information to be sure about recommending many
>items.
>
>A neighborhood of hundreds of users is very large. It's such a crowd
>that much of the neighborhood is undoubtedly "far" from the user. Yes,
>those are the nearest 1000 users, but perhaps 20 of them are really
>similar and the other 980 are introducing more and more noise into
>the computation.
>
>I would actually suggest you use a threshold-based neighborhood
>definition. The cutoff value depends on your similarity metric. If you
>use Pearson... maybe 0.5 or so?
>
>Yes, you may get fewer recommendations, but maybe that's good.
>
>(Another plug: if you are interested in this tradeoff, and evaluating
>metrics and such, this is all written up pretty thoroughly in Mahout
>in Action: http://manning.com/owen/)
>
>2010/8/30 Young <[email protected]>:
>> Hi Sean,
>> Thanks. When I expand the neighborhood size to 1000, there are 80 items in
>> common when giving 500 recommendations. That's quite reasonable and acceptable.
>>
>> -- Young
>>
>> At 2010-08-30 23:55:15,"Sean Owen" <[email protected]> wrote:
>>
>>>That result is quite possible. For example, with a user-based
>>>recommender, the only items that can possibly be recommended are those
>>>in the user's neighborhood. If the neighborhood is small, it's
>>>possible that only 23 unique items exist among users in that
>>>neighborhood. You can never get more recommendations than this.
>>>
>>>I don't think this result is "bad" per se, but if you want to try to
>>>get more recommendations, you really need more 'dense' data. Or,
>>>another algorithm may have different properties that are more
>>>desirable to you. Try SlopeOneRecommender.
>>>
>>>2010/8/30 Young <[email protected]>:
>>>> Hi all,
>>>> Based on the 1M GroupLens data, I tried to use a user-based recommender and
>>>> an item-based recommender to give the same user recommendations. But the
>>>> results vary so much. There are 4302 items in the dataModel. For user 3 or 8,
>>>> when returning 500 recommended items, only 23 items are in common.
>>>> In the item-based recommender, I use PearsonCorrelationSimilarity.
>>>> In the user-based recommender, I use NearestNUserNeighborhood (size 100) and
>>>> PearsonCorrelationSimilarity.
>>>> Should these results be accepted? Or what should I do to improve this
>>>> situation?
>>>>
>>>> Thank you very much.
>>>>
>>>> -- Young
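
To make the neighborhood tradeoff discussed in this thread concrete, here is a minimal sketch in plain Java. It deliberately does not use Mahout's API (Mahout's own classes for this are NearestNUserNeighborhood and ThresholdUserNeighborhood); the class NeighborhoodSketch, its methods, and the tiny sample data are all hypothetical and exist only for illustration. It computes Pearson correlation over co-rated items, builds a nearest-N and a threshold-based neighborhood, and lists the items that could possibly be recommended from each.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not Mahout code.
final class NeighborhoodSketch {

    // Pearson correlation over items both users rated.
    // Returns NaN when there are fewer than 2 shared items or no variance.
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> shared = new ArrayList<>();
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) shared.add(item);
        }
        int n = shared.size();
        if (n < 2) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Integer item : shared) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Integer item : shared) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        if (varA == 0 || varB == 0) return Double.NaN;
        return cov / Math.sqrt(varA * varB);
    }

    // Similarity of every other user to the given user (undefined values skipped).
    static Map<Integer, Double> similarities(int user, Map<Integer, Map<Integer, Double>> ratings) {
        Map<Integer, Double> sims = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> e : ratings.entrySet()) {
            if (e.getKey() == user) continue;
            double s = pearson(ratings.get(user), e.getValue());
            if (!Double.isNaN(s)) sims.put(e.getKey(), s);
        }
        return sims;
    }

    // Nearest-N neighborhood: the n most similar users, however dissimilar the last ones are.
    static List<Integer> nearestN(int user, int n, Map<Integer, Map<Integer, Double>> ratings) {
        Map<Integer, Double> sims = similarities(user, ratings);
        List<Integer> users = new ArrayList<>(sims.keySet());
        users.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    // Threshold neighborhood: only users whose similarity reaches the cutoff.
    static List<Integer> byThreshold(int user, double cutoff, Map<Integer, Map<Integer, Double>> ratings) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : similarities(user, ratings).entrySet()) {
            if (e.getValue() >= cutoff) result.add(e.getKey());
        }
        return result;
    }

    // Items rated by some neighbor but not by the user: the upper bound on what can be recommended.
    static Set<Integer> candidateItems(int user, List<Integer> neighbors, Map<Integer, Map<Integer, Double>> ratings) {
        Set<Integer> items = new TreeSet<>();
        for (Integer neighbor : neighbors) items.addAll(ratings.get(neighbor).keySet());
        items.removeAll(ratings.get(user).keySet());
        return items;
    }

    // Made-up ratings: userId -> (itemId -> rating). Requires Java 9+ for Map.of.
    static Map<Integer, Map<Integer, Double>> sampleRatings() {
        Map<Integer, Map<Integer, Double>> r = new HashMap<>();
        r.put(1, Map.of(10, 5.0, 11, 3.0, 12, 1.0));
        r.put(2, Map.of(10, 5.0, 11, 3.0, 12, 1.0, 13, 4.0)); // same taste as user 1
        r.put(3, Map.of(10, 1.0, 11, 3.0, 12, 5.0, 14, 5.0)); // opposite taste
        r.put(4, Map.of(10, 4.0, 11, 5.0, 12, 2.0, 15, 3.0)); // moderately similar
        return r;
    }
}
```

With this sample data, a nearest-3 neighborhood for user 1 still admits user 3, whose correlation with user 1 is -1, so item 14 becomes a candidate. A cutoff of 0.5 keeps only users 2 and 4, which gives fewer candidate items but only from users who actually resemble user 1.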
