Not really. See my previous posting. The best way to get fast recommendations is to use an item-based recommender. Pre-computing recommendations for all users is not usually a win: you wind up doing a lot of wasted work, and you still don't have anything for new users who appear between refreshes. If you build a service to handle the new users, you might as well serve all users from that service, so that everyone gets up-to-date recommendations.
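The item-based approach described above can be sketched in a few lines. This is an illustrative toy, not Mahout's API: the item-item similarities would come from the large offline computation, and at request time they are combined with the user's own ratings to score unseen items. All names (`recommend`, `item_sims`, the similarity values) are hypothetical.

```python
# Minimal sketch of request-time item-based recommendation.
# The item-item similarities are assumed to be precomputed offline.

def recommend(user_ratings, item_sims, top_n=3):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    scores = {}   # item -> sum of sim * rating
    weights = {}  # item -> sum of sim
    for rated_item, rating in user_ratings.items():
        for other, sim in item_sims.get(rated_item, {}).items():
            if other in user_ratings or sim <= 0:
                continue  # skip already-rated items and non-positive similarities
            scores[other] = scores.get(other, 0.0) + sim * rating
            weights[other] = weights.get(other, 0.0) + sim
    ranked = sorted(
        ((item, s / weights[item]) for item, s in scores.items()),
        key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

# Toy precomputed similarities (what the offline job would produce).
item_sims = {
    "A": {"B": 0.9, "C": 0.2},
    "B": {"A": 0.9, "C": 0.5},
    "C": {"A": 0.2, "B": 0.5},
}
print(recommend({"A": 5.0, "C": 1.0}, item_sims))
```

Because only the similarity table is precomputed, a brand-new user gets recommendations immediately from whatever ratings they have, with no batch refresh in the way.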
There IS a large off-line computation. But that doesn't produce recommendations for users; it typically produces recommendations for items. Those item-item recommendations are then combined to produce recommendations for users.

On Sun, Mar 25, 2012 at 12:28 PM, Razon, Oren <oren.ra...@intel.com> wrote:

> Correct me if I'm wrong, but a good way to boost speed could be to use a
> caching recommender: compute the recommendations in advance (refreshing
> every X minutes/hours) and always recommend from the most recently
> computed results, right?
>
> -----Original Message-----
> From: Sean Owen [mailto:sro...@gmail.com]
> Sent: Sunday, March 25, 2012 21:25
> To: user@mahout.apache.org
> Subject: Re: Mahout beginner questions...
>
> It is memory. You will need a pretty large heap to hold 100M data points
> in memory -- probably 4GB, if not a little more (so the machine would need
> 8GB+ RAM). You can go bigger if you have more memory, but that size seems
> about the biggest it is reasonable to assume people have.
>
> Of course, more data slows things down, and past about 10M data points you
> need to tune things to sample the data rather than try every possibility.
> This is most of what CandidateItemsStrategy has to do with. It is
> relatively easy to tune, though, so speed doesn't have to be an issue.
>
> Again, you can go bigger and tune it to down-sample more; still, I believe
> that 100M is a crude but useful rule of thumb for the point beyond which
> it's just hard to get good speed and quality.
>
> Sean
>
> On Sun, Mar 25, 2012 at 2:04 PM, Razon, Oren <oren.ra...@intel.com> wrote:
>
> > Thanks for the detailed answer, Sean.
> > I want to understand the non-distributed code's limitations more
> > clearly. I saw that you advise that for more than 100,000,000 ratings
> > the non-distributed engine won't do the job.
> > The question is why?
> > Is it a memory issue (in which case, with a bigger machine, I could
> > scale up), or is it because of the time recommendation takes?
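The ~4GB heap figure quoted above can be sanity-checked with rough arithmetic. The numbers below are assumptions for illustration, not measured Mahout internals: I assume each preference stores roughly an 8-byte item ID plus a 4-byte rating value, and that JVM object headers, maps, and indexes roughly triple the raw footprint.

```python
# Back-of-envelope heap estimate for holding all ratings in memory.
# Assumed, not measured: 12 raw bytes per preference, 3x JVM overhead.

RAW_BYTES_PER_PREF = 8 + 4   # long item ID + float rating
JVM_OVERHEAD_FACTOR = 3      # rough multiplier for headers/maps/indexes

def heap_estimate_gb(num_prefs):
    return num_prefs * RAW_BYTES_PER_PREF * JVM_OVERHEAD_FACTOR / 1e9

print(f"{heap_estimate_gb(100_000_000):.1f} GB")  # ballpark of the ~4GB figure above
```

The result lands in the same few-gigabytes range as the quoted estimate, which is why 100M data points marks roughly where a single-machine heap stops being comfortable.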