Daniel, can you plot two curves showing the distribution of interactions per user and the distribution of interactions per item? I think we need to get a better picture of your data first.
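In case it helps, here is a minimal sketch of how the two curves could be computed. It assumes the data is available as (user, item) pairs with each pair appearing once; the function name and the sample data are made up for illustration and are not part of Mahout.

```python
from collections import Counter


def interaction_distributions(interactions):
    """Return (user_curve, item_curve) for an iterable of (user_id, item_id) pairs.

    Each curve is the list of interaction counts sorted in descending order,
    so plotting position (x) against count (y) shows how skewed the data is.
    """
    per_user = Counter()
    per_item = Counter()
    for user_id, item_id in interactions:
        per_user[user_id] += 1
        per_item[item_id] += 1
    user_curve = sorted(per_user.values(), reverse=True)
    item_curve = sorted(per_item.values(), reverse=True)
    return user_curve, item_curve


# Hypothetical sample data, only to show the call.
sample = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"),
    ("u3", "a"), ("u3", "b"),
    ("u4", "a"),
]
user_curve, item_curve = interaction_distributions(sample)
print("interactions per user:", user_curve)
print("interactions per item:", item_curve)
```

The two lists can be handed to any plotting tool; a log-scaled y axis usually makes the 'heavy users' easier to see.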
I generally recommend using precomputed similarities. You can still serve new users with realtime recommendations; the only disadvantages are the higher complexity and a delayed inclusion of new items.

--sebastian

2011/11/30 Sean Owen <[email protected]>:
> The simple answer is that:
>
> Mahout absorbed a non-distributed recommender project called Taste, which
> scales up to a point which may be sufficient for a lot of users. It
> certainly is a lot simpler. Yes it is realistic to do near-real-time
> recommendations, though it gets harder and harder and requires more tuning,
> tradeoffs and optimization as this thread shows.
>
> The rest, written from scratch, is almost all distributed and Hadoop-based,
> including distributed re-implementations of the same algorithms.
>
> On Wed, Nov 30, 2011 at 8:23 PM, Dan Beaulieu <[email protected]> wrote:
>
>> Hi all, this is a tangent and can mostly be ignored by the people
>> interested in this problem.
>>
>> I'm new to Machine Learning and especially Mahout. Following this
>> discussion has made me a bit confused.
>> Isn't Mahout used for large datasets where it makes sense to distribute the
>> work? Why then isn't anyone pointing
>> out that the problem may be the use of one single Mahout node? Is it
>> because it's boolean based? Is it because the data set
>> isn't really that large?
>>
>> Even if for whatever reason a single node will do for this case, is it
>> really expected that the recommendation process would finish in less than
>> half a second?
>> This makes me think if that is the expectation then the data set is
>> actually small and Mahout might be overkill...
>>
>> What obvious piece of the Mahout puzzle am I missing?
>>
>> Thanks.
>>
>> Dan
>>
>> On Wed, Nov 30, 2011 at 11:56 AM, Sean Owen <[email protected]> wrote:
>>
>> > Have you used CachingItemSimilarity? That will hold common similarities in
>> > memory. It's a lot easier than pre-computing and might help.
>> >
>> > I think something like your change is a good one (Sebastian what do you
>> > think) in that it gives you the ultimate lever to control how many
>> > candidates are evaluated. That ought to make it go as fast as you like, but
>> > it trades off quality. Still I'd be really surprised if there's no viable
>> > middle ground -- this works fine at smaller scale, where 100s of candidates
>> > are evaluated, perhaps, and you can use your lever to get to 100s of
>> > candidates at your scale too. Is that still both slow and inaccurate?
>> >
>> > On Wed, Nov 30, 2011 at 3:18 PM, Daniel Zohar <[email protected]> wrote:
>> >
>> > > I just tested the app with Mahout 0.6.
>> > > There seems to be a small performance improvement, but still
>> > > recommendations for the 'heavy users' take between 1-5 seconds.
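
To make the precomputation idea from the top of this message concrete, together with the candidate cap discussed in the quoted thread, here is a rough sketch. It is plain Python and not Mahout's API: the function names, the `top_k` and `max_candidates` parameters, and the sample data are all assumptions for illustration. It uses the Tanimoto (Jaccard) coefficient because the data in this thread is boolean.

```python
from collections import defaultdict
from itertools import combinations


def precompute_item_similarities(interactions, top_k=2):
    """Offline step: build a table item -> [(similarity, other_item), ...].

    Only the top_k most similar items are kept per item.
    """
    users_by_item = defaultdict(set)
    items_by_user = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
        items_by_user[user].add(item)

    # Count co-occurrences only for item pairs that share at least one user.
    cooccurrence = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            cooccurrence[(a, b)] += 1

    neighbours = defaultdict(list)
    for (a, b), both in cooccurrence.items():
        union = len(users_by_item[a]) + len(users_by_item[b]) - both
        sim = both / union  # Tanimoto coefficient on the two user sets
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))

    return {
        item: sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]
        for item, pairs in neighbours.items()
    }


def recommend(user_items, similar_items, how_many=3, max_candidates=100):
    """Online step: score candidates using only table lookups.

    user_items can belong to a brand-new user; max_candidates is the lever
    that trades quality for speed by limiting how many candidates are scored.
    """
    scores = defaultdict(float)
    for item in sorted(user_items):
        for sim, other in similar_items.get(item, []):
            if other in user_items:
                continue  # never recommend something the user already has
            if other not in scores and len(scores) >= max_candidates:
                continue  # candidate cap reached, ignore new candidates
            scores[other] += sim
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical sample data, only to show the calls.
sample = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"),
]
table = precompute_item_similarities(sample)
print(recommend({"a"}, table))
print(recommend({"a"}, table, max_candidates=1))
```

The delayed inclusion of new items shows up directly here: an item only becomes recommendable after the table has been rebuilt with interactions that contain it.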
