(Agree, and the sampling happens at the user level now -- so if you sample one of these users, it slows down a lot. The spirit of the proposed change is to make sampling more fine-grained, at the individual item level. That certainly seems to fix this. A rough sketch of what I mean is below, before the quoted thread.)
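
For concreteness, a minimal sketch of item-level down-sampling over a plain user -> item-set map. This is not the Mahout DataModel API; the class name ItemLevelSampler, the maxItemsPerUser cap, and the map-of-sets representation are all illustrative assumptions. The point is just to bound n_max before the cooccurrence computation ever sees the data:

import java.util.*;

/**
 * Hypothetical standalone sketch (not Mahout code): cap the number of items
 * retained per user so that n_max, and therefore the O(n_max^2) cooccurrence
 * cost, stays bounded.
 */
public class ItemLevelSampler {

  private final int maxItemsPerUser;
  private final Random random;

  public ItemLevelSampler(int maxItemsPerUser, long seed) {
    this.maxItemsPerUser = maxItemsPerUser;
    this.random = new Random(seed);
  }

  /** Returns a copy of userData where no user keeps more than maxItemsPerUser items. */
  public Map<Long, Set<Long>> downSample(Map<Long, Set<Long>> userData) {
    Map<Long, Set<Long>> sampled = new HashMap<>();
    for (Map.Entry<Long, Set<Long>> entry : userData.entrySet()) {
      List<Long> items = new ArrayList<>(entry.getValue());
      if (items.size() > maxItemsPerUser) {
        Collections.shuffle(items, random);          // uniform random sample of this user's items
        items = items.subList(0, maxItemsPerUser);   // keep only a bounded number of them
      }
      sampled.put(entry.getKey(), new HashSet<>(items));
    }
    return sampled;
  }
}
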
On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]> wrote:
> This may or may not help much. My guess is that the improvement will be
> very modest.
>
> The most serious problem is going to be recommendations for anybody who has
> rated one of these excessively popular items. That item will bring in a
> huge number of other users and thus a huge number of items to consider. If
> you down-sample ratings of the prolific users and kill super-common items,
> I think you will see much more improvement than simply eliminating the
> singleton users.
>
> The basic issue is that cooccurrence based algorithms have run-time
> proportional to O(n_max^2) where n_max is the maximum number of items per
> user.
>
> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
>
> > This is why I'm looking now into improving GenericBooleanPrefDataModel to
> > not take into account users which made one interaction under the
> > 'preferenceForItems' Map. What do you think about this approach?
> >
> >
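
(To make the quoted O(n_max^2) point concrete: counting cooccurrences for a single user is essentially a loop over all pairs of that user's items, so one user with n items contributes roughly n^2/2 updates. This is a standalone illustration, not Mahout code; the names are made up:)

import java.util.*;

/** Illustrative only: a user with n items contributes O(n^2) pair updates. */
public class CooccurrenceCost {
  static void countPairs(List<Long> itemsOfOneUser,
                         Map<Long, Map<Long, Integer>> cooccurrence) {
    for (int i = 0; i < itemsOfOneUser.size(); i++) {
      for (int j = i + 1; j < itemsOfOneUser.size(); j++) {
        long a = itemsOfOneUser.get(i);
        long b = itemsOfOneUser.get(j);
        // increment the symmetric cooccurrence count for the pair (a, b)
        cooccurrence.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
        cooccurrence.computeIfAbsent(b, k -> new HashMap<>()).merge(a, 1, Integer::sum);
      }
    }
  }
}
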
