On Fri, Dec 2, 2011 at 6:07 PM, Daniel Zohar <[email protected]> wrote:
> I definitely agree that the correctness should not be broken. My solution
> is not meant to decrease the number of possible items like you stated in
> your example. It was meant to reduce the amount of item-user associations
> (while preserving user-item associations), which will result in much less
> effort in intersectionSize(). Even in the case that we have two popular

My point is that intersectionSize() is called as part of a similarity computation. Yes, that's the bottleneck. But that happens after the stage where candidate items are identified, and you are talking about changing the candidate identification stage, which is not the bottleneck. I think your change *happens* to also reduce the number of similarity computations, because it assumes some are 0 when they are not! Sure, that saves time, in the same way that you'll finish an exam faster if you don't answer half the questions.

I am instead suggesting that we optimize intersectionSize() itself, so that for all of these 1-item cases the answer is computed extremely fast -- something like the sketch at the end of this mail. That also addresses the bottleneck, of course.

I suppose this could be proven or disproven quickly -- do you get the same speedup with the change I committed, without your change? If you do, great, we have a solution. If not, then I am wrong and you have some example that pinpoints where the new bottleneck is.
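To make the fast-path idea concrete, here is a minimal sketch -- not the committed patch, and the plain java.util.Set here is just a stand-in for whatever set type actually backs the preference data:

  import java.util.Set;

  final class IntersectionSketch {
    // Count |a ∩ b| by iterating the smaller set and probing the larger one.
    // When a user has only one associated item, the loop runs once, so the
    // 1-item cases cost a single hash lookup instead of a full scan.
    static int intersectionSize(Set<Long> a, Set<Long> b) {
      Set<Long> smaller = a.size() <= b.size() ? a : b;
      Set<Long> larger = smaller == a ? b : a;
      int count = 0;
      for (Long id : smaller) {
        if (larger.contains(id)) {
          count++;
        }
      }
      return count;
    }
  }

The point is that nothing has to be thrown away to get the speedup: the degenerate cases just become cheap, rather than being skipped.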
