User vs. Item performance

2011-10-26 Thread Grant Ingersoll
I seem to recall past discussions on where one hits the bottleneck with user-based recommendation approaches in Mahout, but I can't seem to locate them anymore. Anyone know offhand? Where do user-based approaches hit their limits, more or less? Thanks, Grant

Re: User vs. Item performance

2011-10-26 Thread Sean Owen
Limits in terms of scalability? If you mean, how much can you fit on one machine without Hadoop, I usually say 100M data points or so. Beyond that you can go as big as you like, but on Hadoop. On Wed, Oct 26, 2011 at 1:56 PM, Grant Ingersoll gsing...@apache.org wrote: I seem to recall past

Re: User vs. Item performance

2011-10-26 Thread Grant Ingersoll
Sorry, should have been clearer. I was asking about using a user-based recommender (e.g. GenericUserBasedRecommender) vs. an item-based recommender. Our general recommendation is that user-based approaches won't scale; I was wondering what the general cutoff is on a single machine,

Re: User vs. Item performance

2011-10-26 Thread Sean Owen
Yes, I would still say so. You could still easily find this too slow if you're using user-user similarities and there are a lot of users and few items behind these 100M data points. Or vice versa. Past this point it's almost certainly too slow; before this point it could also be slow. You would
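
To make the "lot of users and few items, or vice versa" point concrete, here is a rough back-of-the-envelope sketch. It assumes a brute-force, all-pairs similarity computation, which is only an upper bound and not necessarily what Mahout does internally. The function names and the two data set shapes are hypothetical, chosen only for illustration.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def similarity_workload(num_users: int, num_items: int) -> dict:
    """Worst-case pair counts for user-user vs. item-item similarity.

    Assumption: every pair is compared once (brute force). Real
    recommenders usually prune or sample, so treat this as an upper bound.
    """
    return {
        "user_user_pairs": pairwise_comparisons(num_users),
        "item_item_pairs": pairwise_comparisons(num_items),
    }


# Hypothetical shapes; either could sit behind a sparse data set
# of roughly 100M data points.
many_users = similarity_workload(num_users=10_000_000, num_items=10_000)
many_items = similarity_workload(num_users=10_000, num_items=10_000_000)

for label, workload in (("many users", many_users), ("many items", many_items)):
    print(label, workload)
```

Under these assumptions the same number of data points can be cheap for one approach and very expensive for the other, depending on which side (users or items) is larger.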

Re: User vs. Item performance

2011-10-26 Thread Ted Dunning
Item-based recommendations can also use more expensive offline computations, which can make recommendations more accurate. SVD-based methods in particular can be very useful, especially with smaller data sets. On Wed, Oct 26, 2011 at 6:52 AM, Sean Owen sro...@gmail.com wrote: Yes, I would