That still sounds far too high, and it would be interesting to profile
to see exactly what's slow. A recommendation entails making estimates
for most or all items, and so should be about as fast as making
estimates directly for a few thousand. Tanimoto similarity is trivial
to compute; in fact, caching it may be slowing things down.
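One rough way to narrow this down is to time the estimates directly,
once with the plain TanimotoCoefficientSimilarity and once with the
CachingItemSimilarity wrapper. This is just a sketch -- the file name,
user ID and candidate-item selection below are made-up placeholders,
not anything from your setup:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public final class EstimateTiming {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("prefs.csv")); // placeholder path

    // Take a few thousand item IDs from the model as a stand-in for the
    // 3-4K candidate items coming out of the business rule engine
    List<Long> candidates = new ArrayList<Long>();
    LongPrimitiveIterator it = model.getItemIDs();
    while (it.hasNext() && candidates.size() < 3000) {
      candidates.add(it.next());
    }

    long userID = 123L; // placeholder: pick a user with 200+ preferences

    ItemSimilarity plain = new TanimotoCoefficientSimilarity(model);
    ItemSimilarity cached =
        new CachingItemSimilarity(new TanimotoCoefficientSimilarity(model), model);

    time("plain ", new GenericItemBasedRecommender(model, plain), userID, candidates);
    time("cached", new GenericItemBasedRecommender(model, cached), userID, candidates);
  }

  private static void time(String label, Recommender rec, long userID, List<Long> items)
      throws TasteException {
    long start = System.currentTimeMillis();
    for (long itemID : items) {
      rec.estimatePreference(userID, itemID);
    }
    System.out.println(label + ": " + (System.currentTimeMillis() - start)
        + " ms for " + items.size() + " estimates");
  }
}

If both runs take seconds rather than milliseconds for that many
estimates, the time is going somewhere other than the similarity
computation itself.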

You can 'warm up' the cache by requesting similarities, which will
then be cached. There's no real point in a separate method to give it
a cached value -- it can figure those out. The problem is, what do you
cache? You can't cache everything, and you don't know what's needed
ahead of time.
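If you do want to pre-load it, one option is simply to ask the
CachingItemSimilarity for the pairs you expect to need before serving
requests, since it caches each pair as a side effect. A minimal sketch,
assuming you know the candidate items and the users you'll serve (both
are placeholders here, not part of your code):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public final class CacheWarmer {

  /**
   * Requests the similarity between each candidate item and every item the
   * given users already have preferences for. If 'similarity' is a
   * CachingItemSimilarity, each pair is cached as a side effect, so later
   * estimatePreference() calls find it already computed.
   */
  public static void warmUp(DataModel model,
                            ItemSimilarity similarity,
                            long[] candidateItems,
                            long[] userIDs) throws TasteException {
    for (long userID : userIDs) {
      FastIDSet preferred = model.getItemIDsFromUser(userID);
      long[] preferredArray = preferred.toArray();
      for (long candidate : candidateItems) {
        similarity.itemSimilarities(candidate, preferredArray);
      }
    }
  }
}

Whether that's worth doing depends on whether the same pairs actually
recur across requests.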

Something else is not right here, I think -- for instance, the
measurement may be including some other time.

On Tue, Apr 23, 2013 at 6:20 AM, Gabor Bernat <ber...@primeranks.net> wrote:
> Nope, and nope.
>
> Note that this is an outlier example; however, even in other cases it
> takes 500ms+, which is way too much for what I need.
>
> Thanks,
>
> Bernát GÁBOR
>
>
> On Tue, Apr 23, 2013 at 12:53 AM, Sean Owen <sro...@gmail.com> wrote:
>
>> 49 seconds is orders of magnitude too long -- something is very wrong
>> here, for so little data. Are you running this off a database? Or are
>> you somehow counting the overhead of 3-4K network calls?
>>
>> On Mon, Apr 22, 2013 at 11:22 PM, Gabor Bernat <ber...@primeranks.net>
>> wrote:
>> > Hello,
>> >
>> > I'm using Mahout in a system where the typical response time should be
>> > below 100ms. I'm using an item-based recommender with float preference
>> > values (with Tanimoto similarity for now, which is passed into a
>> > CachingItemSimilarity object for performance reasons). My model has
>> > around 7k items, 26k users, and around 100k preferences linking them.
>> >
>> > Instead of performing a recommendation, I only need to estimate
>> > preferences of the user for around 3-4k items (this is important, as
>> > this allows the integration of a business rule engine in the
>> > recommendation process inside the system where I'm working).
>> >
>> > Now my problem is that for users with lots of preferences (200+) this
>> > estimation process takes forever (49 seconds+). I'm assuming the issue
>> > lies in the calculation of the similarity measurements, so I thought
>> > I'd do this asynchronously in a train-like process, save it, and at
>> > startup just load this precomputed information into memory. However,
>> > I cannot see any way to load this information into the
>> > CachingItemSimilarity object; nor can I persist the
>> > CachingItemSimilarity object and load it.
>> >
>> > So any ideas, on how to cut down the estimation times?
>> >
>> > Thanks,
>> >
>> > Bernát GÁBOR
>>
