Re: Mahout Similarity Caching

Gabor Bernat Tue, 23 Apr 2013 05:15:20 -0700

Hello,

So I found out the reason for the extreme times. I also needed estimates
for some items not present in the data model. To test whether an item is
in the model I used getPreferences for the item (because the model has no
"contains item ID" member function), which throws an exception if the item
ID is not present in the model. If the exception was thrown I just caught
it and moved on. However, with 1-3k such items, the exception handling
added up to the 49s. Now I create a hash map with the item IDs and check
it before each estimation request, and it's a lot better: 500-900ms on the
first call, 100-200ms once the caching starts to kick in (I've also
extended the model to 13k items). The cold requests are still quite
cumbersome. Given that the model is static, I feel the cost of the cold
requests could be mitigated if CachingSimilarity also allowed entries to
be added manually, because then this task could be pushed off to an
offline system. And yes, you cannot add all the similarities to the
caching object; however, based on history you can select some top
(popular) item pairs and calculate only that subset. This could push down
the worst-case request times. Any other ideas?


Thanks,

Bernát GÁBOR


On Tue, Apr 23, 2013 at 9:17 AM, Sean Owen <sro...@gmail.com> wrote:

> That still sounds far too high, and it would be interesting to profile
> to see exactly what's slow. A recommendation entails making estimates
> for most or all items, and so should be about as fast as making
> estimates directly for a few thousand. Tanimoto similarity is trivial.
> In fact it may be slowing things down to cache it.
>
> You can 'warm up' the cache by requesting similarities, which will
> then be cached. There's no real point in a separate method to give it
> a cached value -- it can figure those out. The problem is, what do you
> cache? You can't cache everything and you don't know what's needed
> ahead of time.
>
> Something else is not right here, I think; perhaps the measurement is
> including some other time.
>
> On Tue, Apr 23, 2013 at 6:20 AM, Gabor Bernat <ber...@primeranks.net>
> wrote:
> > Nope, and nope.
> >
> > Note that this is an outlier example; however, even in other cases it
> > takes 500ms+, which is way too much for what I need.
> >
> > Thanks,
> >
> > Bernát GÁBOR
> >
> >
> > On Tue, Apr 23, 2013 at 12:53 AM, Sean Owen <sro...@gmail.com> wrote:
> >
> >> 49 seconds is orders of magnitude too long -- something is very wrong
> >> here, for so little data. Are you running this off a database? Or are
> >> you somehow counting the overhead of 3-4K network calls?
> >>
> >> On Mon, Apr 22, 2013 at 11:22 PM, Gabor Bernat <ber...@primeranks.net>
> >> wrote:
> >> > Hello,
> >> >
> >> > I'm using Mahout in a system where the typical response time should be
> >> > below 100ms. I'm using an item-based recommender with float preference
> >> > values (with Tanimoto similarity for now, which is passed into a
> >> > CachingItemSimilarity object for performance reasons). My model has
> >> > around 7k items and 26k users, with around 100k preferences linking
> >> > them.
> >> >
> >> > Instead of performing a recommendation, I only need to estimate the
> >> > user's preferences for around 3-4k items (this is important, as it
> >> > allows the integration of a business rule engine into the
> >> > recommendation process inside the system where I'm working).
> >> >
> >> > Now my problem is that for users with lots of preferences (200+) this
> >> > estimation process takes forever (49 seconds+). I'm assuming the issue
> >> > lies in the calculation of the similarity measurements, so I thought
> >> > I'd do this asynchronously in a training-like process, save the
> >> > results, and at startup just load this precomputed information into
> >> > memory. However, I cannot see any way to load this information into
> >> > the CachingSimilarity object, nor can I persist the CachingSimilarity
> >> > object and load it.
> >> >
> >> > So, any ideas on how to cut down the estimation times?
> >> >
> >> > Thanks,
> >> >
> >> > Bernát GÁBOR
> >>
>
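The thread raises two ways to avoid cold similarity computations: computing popular item pairs offline and loading them at startup (Gabor), and warming the cache by requesting similarities up front (Sean). Below is a minimal sketch of the first idea in plain Java, independent of Mahout's `CachingItemSimilarity`. `PrecomputedTanimoto`, `put` and `precompute` are hypothetical names, and the list of "popular" pairs is assumed to come from elsewhere. Tanimoto is computed as |A ∩ B| / |A ∪ B| over the sets of users who expressed a preference for each item, so preference values play no part in it. Pairs missing from the table are computed on demand and counted, which gives one simple way to measure whether the table helps, in the spirit of Sean's advice to profile first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Sketch: item-item Tanimoto similarity backed by a table that an offline job
 * fills for popular item pairs; any other pair is computed on demand.
 * Hypothetical class, not part of Mahout.
 */
final class PrecomputedTanimoto {
    private final Map<Long, Set<Long>> usersByItem;             // itemID -> users with a preference for it
    private final Map<String, Double> table = new HashMap<>();  // unordered item pair -> similarity
    private long onlineComputations = 0;                        // lookups that missed the table

    PrecomputedTanimoto(Map<Long, Set<Long>> usersByItem) {
        this.usersByItem = usersByItem;
    }

    /** Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B|; NaN when both sets are empty. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (Long user : smaller) {
            if (larger.contains(user)) {
                intersection++;
            }
        }
        return (double) intersection / (a.size() + b.size() - intersection);
    }

    // Order-independent key, so (x, y) and (y, x) share one entry.
    private static String key(long item1, long item2) {
        return Math.min(item1, item2) + ":" + Math.max(item1, item2);
    }

    private Set<Long> users(long itemID) {
        return usersByItem.getOrDefault(itemID, Collections.emptySet());
    }

    /** Adds one entry manually, e.g. a value produced by an offline job and loaded at startup. */
    void put(long item1, long item2, double similarity) {
        table.put(key(item1, item2), similarity);
    }

    /** Offline-style warm-up: computes and stores similarities for the given pairs. */
    void precompute(long[][] popularPairs) {
        for (long[] pair : popularPairs) {
            put(pair[0], pair[1], tanimoto(users(pair[0]), users(pair[1])));
        }
    }

    /** Table lookup first; otherwise computes the pair on demand without storing it. */
    double similarity(long item1, long item2) {
        Double stored = table.get(key(item1, item2));
        if (stored != null) {
            return stored;
        }
        onlineComputations++;
        return tanimoto(users(item1), users(item2));
    }

    long onlineComputations() {
        return onlineComputations;
    }
}
```

In practice the table would presumably be written by the offline job and loaded through `put` at startup; plugging it into a recommender would likely mean implementing Mahout's `ItemSimilarity` interface around it, which is not shown here.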
