Thanks Sebastian. What I fail to see is how I can offer this information to the recommendation object once I have calculated it. How can I pass the output file to the GenericItemBasedRecommender that I use to estimate preferences?
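One way to feed a precomputed similarity file into GenericItemBasedRecommender is to parse it into GenericItemSimilarity.ItemItemSimilarity entries and wrap them in a GenericItemSimilarity. A minimal sketch, assuming a comma-separated `itemID1,itemID2,similarity` file format and hypothetical file names (`similarities.csv`, `ratings.csv`) -- the exact output format depends on the writer used in the offline step:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class PrecomputedSimilarityLoader {

  public static void main(String[] args) throws Exception {
    // Parse the offline-computed similarities (assumed CSV: itemID1,itemID2,value).
    List<GenericItemSimilarity.ItemItemSimilarity> sims = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new FileReader(new File("similarities.csv")))) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split(",");
        sims.add(new GenericItemSimilarity.ItemItemSimilarity(
            Long.parseLong(parts[0]),
            Long.parseLong(parts[1]),
            Double.parseDouble(parts[2])));
      }
    }

    // GenericItemSimilarity serves the precomputed values from memory,
    // so nothing is recomputed at request time.
    ItemSimilarity similarity = new GenericItemSimilarity(sims);

    DataModel model = new FileDataModel(new File("ratings.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    System.out.println(recommender.recommend(123L, 5));
  }
}
```

Item pairs absent from the file get no similarity, so only the precomputed (e.g. top-N per item) pairs contribute to estimates.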
Bernát GÁBOR

On Tue, Apr 23, 2013 at 3:15 PM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Bernat,
>
> you can do the offline similarity calculation on a single machine with
> o.a.m.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
> or on a Hadoop cluster (if necessary) with
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>
> I think such a setup is much easier than coming up with a complicated
> caching logic.
>
> Best,
> Sebastian
>
> On 23.04.2013 15:09, Gabor Bernat wrote:
> > Well,
> >
> > assume that we have a live system. The system is serving requests 24/7,
> > and in order to update the model periodically we recreate it on another
> > machine and just save it (e.g. to a database); to update, we just load
> > the saved data model and exchange it with the one running prior. On this
> > other machine the similarities could also be calculated and saved, and on
> > the new machine just loaded.
> >
> > The advantage of this is that you can offload similarity calculation to
> > an offline system without paying the cold-start price on the live/online
> > system. Loading the precalculated most-used similarities should certainly
> > be less costly than doing all this on the live system, even if done
> > concurrently. The problem with calculating them on the live system is
> > that for the first 30-40 minutes you'll have long response times compared
> > to the state when your cache is filled.
> >
> > Bernát GÁBOR
> >
> > On Tue, Apr 23, 2013 at 2:54 PM, Sean Owen <sro...@gmail.com> wrote:
> >
> >> I agree, but how is "pre-adding a cached value for X" different from
> >> "requesting X from the cache"? Either way you get X in the cache.
> >> Computing offline seems the same as computing online, but in some kind
> >> of warm-up state or phase, which can even be concurrent with serving
> >> early requests. You can do everything else you say without a new
> >> operation, like selectively pre-caching certain entries.
> >>
> >> On Tue, Apr 23, 2013 at 1:14 PM, Gabor Bernat <ber...@primeranks.net>
> >> wrote:
> >>> CachingSimilarity could also allow entries to be added manually,
> >>> because in that case this task could be pushed off to an offline
> >>> system. And yes, you cannot add all the similarities to the caching
> >>> object; however, based on history you can select some top (popular)
> >>> item pairs and calculate only for that subset. This could push down the
> >>> upper request times. Any other ideas?
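Sebastian's suggested single-machine offline step might look roughly like the following. This is a sketch, assuming the Mahout 0.8-era batch precompute API and hypothetical file names; the similarity metric, the 50-similar-items cutoff, and the one-hour time limit are illustrative choices, not fixed requirements:

```java
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class OfflineSimilarityPrecompute {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));
    ItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

    // Keep only the 50 most similar items per item -- the "top (popular)
    // pairs" subset discussed above.
    BatchItemSimilarities batch =
        new MultithreadedBatchItemSimilarities(recommender, 50);

    // Compute with one worker per core, capped at 1 hour, and write the
    // result to a file that the live system can later load.
    int numSimilarities = batch.computeItemSimilarities(
        Runtime.getRuntime().availableProcessors(),
        1,
        new FileSimilarItemsWriter(new File("similarities.csv")));

    System.out.println("Computed " + numSimilarities + " similarities");
  }
}
```

The live system never runs this job; it only reads the resulting file at startup or model swap, which is what avoids the 30-40 minute cold-start window described in the thread.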