Re: Mahout performance issues

Sebastian Schelter Thu, 01 Dec 2011 00:37:28 -0800

Daniel, can you plot two curves showing the distribution of
interactions per user and the distribution of interactions per item? I
think we need to get a better picture of your data first.
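
As a rough sketch of how the data for those two curves could be produced, assuming the interactions are available as (user, item) pairs with each pair listed once. The function name and the toy data are hypothetical, not part of Mahout:

```python
from collections import Counter


def interaction_distributions(interactions):
    """Return two sorted lists of (interaction_count, how_many) pairs:
    one for users and one for items.

    `interactions` is an iterable of (user_id, item_id) pairs
    (boolean preferences, each pair assumed to appear once).
    """
    per_user = Counter()
    per_item = Counter()
    for user_id, item_id in interactions:
        per_user[user_id] += 1
        per_item[item_id] += 1

    # Histogram: number of interactions -> number of users (or items)
    # that have exactly that many interactions.
    user_hist = Counter(per_user.values())
    item_hist = Counter(per_item.values())
    return sorted(user_hist.items()), sorted(item_hist.items())


# Hypothetical toy data, for illustration only.
sample = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (3, "a"), (3, "b"), (4, "a")]
user_curve, item_curve = interaction_distributions(sample)
print(user_curve)  # [(1, 2), (2, 1), (3, 1)]
print(item_curve)  # [(1, 1), (2, 1), (4, 1)]
```

Plotting each list on log-log axes would show whether a few "heavy users" or very popular items dominate the data.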


Generally, I always recommend using precomputed similarities. You can
still serve new users with realtime recommendations; the only
disadvantages are the higher complexity and a delayed inclusion of new
items.
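
To illustrate the idea (this is not Mahout code, just a minimal sketch): an offline step builds a table of the most similar items per item, and the online step only does lookups in that table. The Jaccard similarity, the function names and the toy data are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations


def precompute_similarities(interactions, top_k=2):
    """Offline step: build item -> [(similar_item, similarity), ...].

    Uses Jaccard similarity on boolean data (an assumption for this
    sketch). Compares all item pairs, so it is quadratic in the number
    of items and only suitable as an illustration.
    """
    users_by_item = defaultdict(set)
    for user_id, item_id in interactions:
        users_by_item[item_id].add(user_id)

    neighbours = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        common = len(users_by_item[a] & users_by_item[b])
        if common == 0:
            continue
        sim = common / len(users_by_item[a] | users_by_item[b])
        neighbours[a].append((b, sim))
        neighbours[b].append((a, sim))

    # Keep only the top_k most similar items per item.
    return {
        item: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
        for item, pairs in neighbours.items()
    }


def recommend(user_items, table, how_many=3):
    """Online step: score candidates using only the precomputed table."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in table.get(item, []):
            if other not in user_items:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:how_many]


# Hypothetical toy data, for illustration only.
interactions = [(1, "a"), (1, "b"), (2, "a"), (2, "b"),
                (2, "c"), (3, "b"), (3, "c")]
table = precompute_similarities(interactions)

# A brand-new user who has interacted with "a" is served right away.
print(recommend({"a"}, table))  # ['b', 'c']
# A brand-new item "z" is not in the table until the next offline run.
print(recommend({"z"}, table))  # []
```

The second call shows the delayed inclusion of new items: an item the offline job has not seen yet contributes nothing until the table is rebuilt.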

--sebastian

2011/11/30 Sean Owen <[email protected]>:
> The simple answer is that:
>
> Mahout absorbed a non-distributed recommender project called Taste, which
> scales up to a point which may be sufficient for a lot of users. It
> certainly is a lot simpler. Yes it is realistic to do near-real-time
> recommendations, though it gets harder and harder and requires more tuning,
> tradeoffs and optimization as this thread shows.
>
> The rest, written from scratch, is almost all distributed and Hadoop-based,
> including distributed re-implementations of the same algorithms.
>
> On Wed, Nov 30, 2011 at 8:23 PM, Dan Beaulieu
> <[email protected]> wrote:
>
>> Hi all, this is a tangent and can mostly be ignored by the people
>> interested in this problem.
>>
>> I'm new to Machine Learning and especially Mahout. Following this
>> discussion has made me a bit confused.
>> Isn't Mahout used for large datasets where it makes sense to distribute the
>> work? Why then isn't anyone pointing
>> out that the problem may be the use of one single Mahout node? Is it
>> because it's boolean based? Is it because the data set
>> isn't really that large?
>>
>> Even if for whatever reason a single node will do for this case, is it
>> really expected that the recommendation process would finish in less than
>> half a second?
>> This makes me think that, if that is the expectation, then the data set is
>> actually small and Mahout might be overkill...
>>
>> What obvious piece of the Mahout puzzle am I missing?
>>
>> Thanks.
>>
>> Dan
>>
>> On Wed, Nov 30, 2011 at 11:56 AM, Sean Owen <[email protected]> wrote:
>>
>> > Have you used CachingItemSimilarity? That will hold common similarities
>> > in memory. It's a lot easier than pre-computing and might help.
>> >
>> > I think something like your change is a good one (Sebastian what do you
>> > think) in that it gives you the ultimate lever to control how many
>> > candidates are evaluated. That ought to make it go as fast as you like,
>> > but it trades off quality. Still I'd be really surprised if there's no
>> > viable middle ground -- this works fine at smaller scale, where 100s of
>> > candidates are evaluated, perhaps, and you can use your lever to get to
>> > 100s of candidates at your scale too. Is that still both slow and
>> > inaccurate?
>> >
>> > On Wed, Nov 30, 2011 at 3:18 PM, Daniel Zohar <[email protected]> wrote:
>> >
>> > > I just tested the app with Mahout 0.6.
>> > > There seems to be a small performance improvement, but
>> > > recommendations for the 'heavy users' still take between 1 and 5 seconds.
>> > >
>> > >
>> >
>>
