Re: Mahout performance issues

Dan Beaulieu Wed, 30 Nov 2011 12:24:21 -0800

Hi all, this is a tangent and can mostly be ignored by those focused on
the original problem.

I'm new to machine learning and especially to Mahout, and following this
discussion has left me a bit confused. Isn't Mahout meant for large data
sets, where it makes sense to distribute the work? Why, then, isn't anyone
pointing out that the problem may be the use of a single Mahout node? Is it
because it's boolean-based? Is it because the data set isn't really that
large?

Even if, for whatever reason, a single node will do in this case, is it
really expected that the recommendation process would finish in less than
half a second? If that is the expectation, it makes me think the data set
is actually small and Mahout might be overkill...

What obvious piece of the Mahout puzzle am I missing?

Thanks.

Dan

On Wed, Nov 30, 2011 at 11:56 AM, Sean Owen <sro...@gmail.com> wrote:

> Have you used CachingItemSimilarity? That will hold common similarities in
> memory. It's a lot easier than pre-computing and might help.
>
> I think something like your change is a good one (Sebastian what do you
> think) in that it gives you the ultimate lever to control how many
> candidates are evaluated. That ought to make it go as fast as you like, but
> it trades off quality. Still I'd be really surprised if there's no viable
> middle ground -- this works fine at smaller scale, where 100s of candidates
> are evaluated, perhaps, and you can use your lever to get to 100s of
> candidates at your scale too. Is that still both slow and inaccurate?
>
> On Wed, Nov 30, 2011 at 3:18 PM, Daniel Zohar <disso...@gmail.com> wrote:
>
> > I just tested the app with Mahout 0.6.
> > There seems to be a small performance improvement, but still
> > recommendations for the 'heavy users' take between 1-5 seconds.
> >
> >
>
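
For readers unfamiliar with the idea behind the CachingItemSimilarity
suggestion quoted above, here is a minimal sketch of similarity caching in
plain Java. It is not Mahout's implementation and does not use Mahout's
API: the Similarity interface, the CachedSimilarity class, and the
maxEntries parameter are hypothetical names chosen for illustration, and
the sketch assumes the similarity is symmetric.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an expensive item-item similarity.
// This is NOT Mahout's ItemSimilarity interface.
interface Similarity {
    double similarity(long itemA, long itemB);
}

// Memoizing wrapper with a bounded, least-recently-used cache (illustrative only).
final class CachedSimilarity implements Similarity {
    private final Similarity delegate;
    private final Map<List<Long>, Double> cache;

    CachedSimilarity(Similarity delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true makes the map evict in least-recently-used order.
        this.cache = new LinkedHashMap<List<Long>, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<List<Long>, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized double similarity(long itemA, long itemB) {
        // Assumes symmetry, so (a, b) and (b, a) share one cache key.
        List<Long> key = Arrays.asList(Math.min(itemA, itemB), Math.max(itemA, itemB));
        Double cached = cache.get(key);
        if (cached == null) {
            // Cache miss: compute once via the delegate and remember the result.
            cached = delegate.similarity(itemA, itemB);
            cache.put(key, cached);
        }
        return cached;
    }
}
```

Wrapped this way, a repeated item pair costs a map lookup instead of a
recomputation, at the price of memory bounded by maxEntries.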
