Well,

assume that we have a live system serving requests 24/7. To update the model
periodically, we recreate it on another machine and save it (for example to a
database); the update then consists of loading the saved model and swapping
it in for the one that was running before. The similarities could also be
calculated on that other machine, saved, and simply loaded on the live one.
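A minimal Java sketch of the save / load-and-swap idea. Plain file serialization stands in for the database, the pair-key format and all class names are made up, and an AtomicReference lets the swap happen without pausing request serving:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SimilaritySwap {
    // The live system holds the current similarity table behind an
    // AtomicReference, so a reload replaces it atomically mid-serving.
    static final AtomicReference<Map<String, Double>> LIVE =
            new AtomicReference<>(new HashMap<>());

    // Offline machine: compute the similarities, then persist them
    // (here as a serialized map; in practice a database table).
    static void save(Map<String, Double> sims, File out) throws IOException {
        try (ObjectOutputStream oos =
                     new ObjectOutputStream(new FileOutputStream(out))) {
            oos.writeObject(new HashMap<>(sims));
        }
    }

    // Live machine: load the precomputed table and swap it in.
    @SuppressWarnings("unchecked")
    static void loadAndSwap(File in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                     new ObjectInputStream(new FileInputStream(in))) {
            LIVE.set((Map<String, Double>) ois.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> sims = new HashMap<>();
        sims.put("item1|item2", 0.83);       // precomputed offline
        File f = File.createTempFile("sims", ".bin");
        save(sims, f);
        loadAndSwap(f);                      // on the live system
        System.out.println(LIVE.get().get("item1|item2"));
    }
}
```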

The advantage of this is that you can offload the similarity calculation to an
offline system without paying the cold-start price on the live/online system.
Loading the precalculated, most-used similarities should certainly be less
costly than doing all of this on the live system, even concurrently. The
problem with calculating them on the live system is that for the first 30-40
minutes you'll have long response times compared to the state where your
cache is filled.
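To make the warm-up concrete, here is a hedged sketch of pre-warming a cache for only the most popular item pairs (the `similarity` function, the `CacheWarmer` class, and the pair-key format are all placeholders, not Mahout API):

```java
import java.util.*;

public class CacheWarmer {
    final Map<String, Double> cache = new HashMap<>();

    // Stand-in for a real similarity computation
    // (e.g. Pearson correlation or log-likelihood).
    static double similarity(String a, String b) {
        return 1.0 / (1 + Math.abs(a.hashCode() - b.hashCode()) % 10);
    }

    // Warm only the top pairs selected from request history; everything
    // else is still computed lazily on first request, as before.
    void warm(List<String[]> popularPairs) {
        for (String[] p : popularPairs) {
            cache.put(p[0] + "|" + p[1], similarity(p[0], p[1]));
        }
    }

    // A warmed pair is a cache hit; a cold pair falls back to computing.
    double get(String a, String b) {
        return cache.computeIfAbsent(a + "|" + b, k -> similarity(a, b));
    }

    public static void main(String[] args) {
        CacheWarmer w = new CacheWarmer();
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[]{"item1", "item2"});
        w.warm(pairs);                          // done before serving starts
        System.out.println(w.get("item1", "item2")); // served from the cache
    }
}
```

The warming step can run concurrently with early serving; it only shortens the window in which requests pay the full computation cost.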


Bernát GÁBOR


On Tue, Apr 23, 2013 at 2:54 PM, Sean Owen <sro...@gmail.com> wrote:

> I agree, but how is "pre-adding a cached value for X" different than
> "requesting X from the cache"? Either way you get X in the cache.
> Computing offline seems the same as computing on-line, but in some
> kind of warm-up state or phase. Which can be concurrent with serving
> early requests even. You can do everything else you say without a new
> operation, like selectively pre-caching certain entries.
>
> On Tue, Apr 23, 2013 at 1:14 PM, Gabor Bernat <ber...@primeranks.net>
> wrote:
> > It would help if CachingSimilarity also allowed entries to be added
> > manually, because then this task could be pushed off to an offline
> > system. And yes, you cannot add all the similarities to the caching
> > object, but based on history you can select some top (popular) item
> > pairs and calculate only for that subset. This could push down the
> > upper request times. Any other ideas?
> >
>