Re: Mahout performance issues

Sebastian Schelter Thu, 01 Dec 2011 07:17:02 -0800

If I remember correctly, you have 12M users and 18M interactions.

If I interpret the plots correctly there is one single item that
accounts for 8.5M interactions (nearly half of the overall interactions)
and more than two thirds of the users like it?
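
A quick check of those proportions, as a minimal sketch. The figures are the approximate ones quoted in this thread, and the user share assumes each of the 8.5M interactions comes from a distinct user, which the plots may or may not bear out. The class and method names are made up for illustration.

```java
// Approximate figures from this thread: 12M users, 18M interactions,
// 8.5M of those interactions on one single item.
final class InteractionShare {

    /** Returns part / whole as a percentage. */
    static double percent(long part, long whole) {
        if (whole <= 0) {
            throw new IllegalArgumentException("whole must be positive");
        }
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long users = 12_000_000L;
        long interactions = 18_000_000L;
        long hotItemInteractions = 8_500_000L;

        // Share of all interactions that land on the one hot item.
        System.out.printf("share of interactions: %.1f%%%n",
                percent(hotItemInteractions, interactions));

        // Share of users who like the hot item, ASSUMING one interaction
        // per user for that item.
        System.out.printf("share of users: %.1f%%%n",
                percent(hotItemInteractions, users));
    }
}
```

That works out to roughly 47% of the interactions and roughly 71% of the users, which matches "nearly half" and "more than two thirds".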


--sebastian

On 01.12.2011 16:12, Sean Owen wrote:
> You can 'tickle' the cache asynchronously if you like.
> 
> I am still not clear on why you are doing so many item-item similarity
> calculations. Your change ought to let you do 1, or 10, or 100 per
> calculation if you like. That, we know, is fast. And a few hundred
> similarities should start to give reasonable recommendations.
> 
> What is preventing you from making this tradeoff (with your change)?
> Yes, it is essential for reasonable performance.
> 
> On Thu, Dec 1, 2011 at 3:06 PM, Daniel Zohar <[email protected]> wrote:
> 
>> Hi Manuel,
>> I haven't got to the point where CacheItemSimilarity kicks in. That is, I
>> will have to run a lot of recommendations in order to get a real benefit
>> from it. I would first like to optimize the 'cold start' so it at least
>> serves in reasonable time. Usually a cache is used to prevent repeated
>> calculations, but personally I don't think it's a replacement for optimized
>> performance. Don't you agree?
>>
>> Also, I will try to profile the app now as you suggest and send the results
>> asap.
>>
>> Thanks!
>>
>> On Thu, Dec 1, 2011 at 4:56 PM, Manuel Blechschmidt <
>> [email protected]> wrote:
>>
>>> Hi Daniel,
>>> actually you are running the profile inside Tomcat. You should take a
>>> snapshot and then drill down to the functions where the actual
>>> recommendation takes place. The current screenshots also contain some
>>> profiles from Tomcat threads, which are sleeping a lot and therefore
>>> taking a lot of time.
>>>
>>> Furthermore, the screenshots do not show how often the different
>>> functions are called.
>>>
>>> You have to profile multiple requests on their own. The
>>> CacheItemSimilarity gets filled as requests come in, so it should get
>>> faster and faster.
>>>
>>> On 01.12.2011, at 15:11, Daniel Zohar wrote:
>>>
>>>> @Manuel thanks for the tips. I have installed VisualVM and the results
>>>> follow.
>>>> I did two samplings:
>>>> - With the optimized SamplingCandidateItemsStrategy (http://pastebin.com/6n9C8Pw1):
>>>> http://static.inky.ws/image/934/image.jpg
>>>> - Without the optimized SamplingCandidateItemsStrategy:
>>>> http://static.inky.ws/image/935/image.jpg
>>>>
>>>
>>> The big hot spot is the function FastIDSet.find():
>>>
>>> Optimized: 13,759 s
>>> Unoptimized: 246,487 s
>>>
>>> So you see that your optimization already got you a speedup of roughly
>>> 18x (246,487 / 13,759) in that function.
>>>
>>> Did you play around with the CacheItemSimilarity cache sizes?
>>>
>>> /Manuel
>>>
>>> --
>>> Manuel Blechschmidt
>>> Dortustr. 57
>>> 14467 Potsdam
>>> Mobil: 0173/6322621
>>> Twitter: http://twitter.com/Manuel_B
>>>
>>>
>>
>
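
The tradeoff Sean describes, capping how much work a single recommendation may do, can be sketched without Mahout at all. The class below is a hypothetical stand-in and not Mahout's SamplingCandidateItemsStrategy: the names CappedCandidates, candidates, sample and maxUsersPerItem are invented for illustration, and the in-memory maps are an assumption about how the data might be held. The idea is that for each item a user already has, only a bounded random sample of that item's other users is followed, so one very popular item cannot pull millions of users into the candidate search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, NOT the Mahout API: bounded candidate-item selection.
final class CappedCandidates {

    /** Collects candidate items by following at most maxUsersPerItem users per seed item. */
    static Set<Long> candidates(Set<Long> seedItems,
                                Map<Long, List<Long>> usersByItem,
                                Map<Long, List<Long>> itemsByUser,
                                int maxUsersPerItem,
                                Random random) {
        if (maxUsersPerItem < 0) {
            throw new IllegalArgumentException("maxUsersPerItem must be >= 0");
        }
        Set<Long> result = new HashSet<>();
        for (Long item : seedItems) {
            List<Long> users = usersByItem.getOrDefault(item, Collections.emptyList());
            for (Long user : sample(users, maxUsersPerItem, random)) {
                result.addAll(itemsByUser.getOrDefault(user, Collections.emptyList()));
            }
        }
        result.removeAll(seedItems); // never recommend what the user already has
        return result;
    }

    /** Picks up to max distinct elements uniformly at random (Floyd's sampling algorithm). */
    static List<Long> sample(List<Long> source, int max, Random random) {
        int n = source.size();
        if (n <= max) {
            return source; // small lists are used in full
        }
        Set<Integer> picked = new LinkedHashSet<>();
        for (int j = n - max; j < n; j++) {
            int t = random.nextInt(j + 1); // uniform in [0, j]
            picked.add(picked.contains(t) ? j : t);
        }
        List<Long> out = new ArrayList<>(max);
        for (int index : picked) {
            out.add(source.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy data: item 1 is liked by 100,000 users, each with one other item.
        Map<Long, List<Long>> usersByItem = new HashMap<>();
        Map<Long, List<Long>> itemsByUser = new HashMap<>();
        List<Long> fans = new ArrayList<>();
        for (long user = 0; user < 100_000; user++) {
            fans.add(user);
            itemsByUser.put(user, Arrays.asList(1L, 1_000_000L + user));
        }
        usersByItem.put(1L, fans);

        Set<Long> found = candidates(Collections.singleton(1L),
                usersByItem, itemsByUser, 100, new Random());
        System.out.println("candidates considered: " + found.size());
    }
}
```

With a cap of 100, the toy run follows only 100 of the 100,000 users of the popular item, so the number of item-item similarities computed afterwards stays in the hundreds regardless of how skewed the data is. The cost is that results vary from call to call, which is the tradeoff under discussion.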
