Not really. See my previous posting. The best way to get fast recommendations is to use an item-based recommender. Pre-computing recommendations for all users is not usually a win: you wind up doing a lot of wasted work, and you still don't have anything for new users who appear between refreshes. If you build a service to handle the new users, you might as well serve all users from that service, so that everyone gets up-to-date recommendations.
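The item-based approach described above can be sketched in a few lines. This is an illustrative toy, not Mahout's API: the item-item similarities would come from the large offline computation, and at request time they are combined with the user's own ratings to score unseen items. All names (`recommend`, `item_sims`, the similarity values) are hypothetical.

```python
# Minimal sketch of request-time item-based recommendation.
# The item-item similarities are assumed to be precomputed offline.

def recommend(user_ratings, item_sims, top_n=3):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    scores = {}   # item -> sum of sim * rating
    weights = {}  # item -> sum of sim
    for rated_item, rating in user_ratings.items():
        for other, sim in item_sims.get(rated_item, {}).items():
            if other in user_ratings or sim <= 0:
                continue  # skip already-rated items and non-positive similarities
            scores[other] = scores.get(other, 0.0) + sim * rating
            weights[other] = weights.get(other, 0.0) + sim
    ranked = sorted(
        ((item, s / weights[item]) for item, s in scores.items()),
        key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

# Toy precomputed similarities (what the offline job would produce).
item_sims = {
    "A": {"B": 0.9, "C": 0.2},
    "B": {"A": 0.9, "C": 0.5},
    "C": {"A": 0.2, "B": 0.5},
}
print(recommend({"A": 5.0, "C": 1.0}, item_sims))
```

Because only the similarity table is precomputed, a brand-new user gets recommendations immediately from whatever ratings they have, with no batch refresh in the way.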
There IS a large off-line computation. But that doesn't produce recommendations for users; it typically produces recommendations for items. Those item-item recommendations are then combined to produce recommendations for users.

On Sun, Mar 25, 2012 at 12:28 PM, Razon, Oren <oren.ra...@intel.com> wrote:

> Correct me if I'm wrong, but a good way to boost speed could be to use a
> caching recommender: compute the recommendations in advance (refreshing
> every X minutes/hours) and always recommend from the most recently
> computed results, right?
>
> -----Original Message-----
> From: Sean Owen [mailto:sro...@gmail.com]
> Sent: Sunday, March 25, 2012 21:25
> To: user@mahout.apache.org
> Subject: Re: Mahout beginner questions...
>
> It is memory. You will need a pretty large heap to hold 100M data points
> in memory -- probably 4GB, if not a little more (so the machine would need
> 8GB+ RAM). You can go bigger if you have more memory, but that size seems
> about the biggest it is reasonable to assume people have.
>
> Of course, more data slows things down, and past about 10M data points you
> need to tune things to sample the data rather than try every possibility.
> This is most of what CandidateItemsStrategy has to do with. It is
> relatively easy to tune, though, so speed doesn't have to be an issue.
>
> Again, you can go bigger and tune it to down-sample more; still, I believe
> that 100M is a crude but useful rule of thumb for the point beyond which
> it's just hard to get good speed and quality.
>
> Sean
>
> On Sun, Mar 25, 2012 at 2:04 PM, Razon, Oren <oren.ra...@intel.com> wrote:
>
> > Thanks for the detailed answer, Sean.
> > I want to understand the non-distributed code's limitations more
> > clearly. I saw that you advise that for more than 100,000,000 ratings
> > the non-distributed engine won't do the job.
> > The question is why?
> > Is it a memory issue (in which case, with a bigger machine, I could
> > scale up), or is it because of the time recommendation takes?
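The ~4GB heap figure quoted above can be sanity-checked with rough arithmetic. The numbers below are assumptions for illustration, not measured Mahout internals: I assume each preference stores roughly an 8-byte item ID plus a 4-byte rating value, and that JVM object headers, maps, and indexes roughly triple the raw footprint.

```python
# Back-of-envelope heap estimate for holding all ratings in memory.
# Assumed, not measured: 12 raw bytes per preference, 3x JVM overhead.

RAW_BYTES_PER_PREF = 8 + 4   # long item ID + float rating
JVM_OVERHEAD_FACTOR = 3      # rough multiplier for headers/maps/indexes

def heap_estimate_gb(num_prefs):
    return num_prefs * RAW_BYTES_PER_PREF * JVM_OVERHEAD_FACTOR / 1e9

print(f"{heap_estimate_gb(100_000_000):.1f} GB")  # ballpark of the ~4GB figure above
```

The result lands in the same few-gigabytes range as the quoted estimate, which is why 100M data points marks roughly where a single-machine heap stops being comfortable.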