Thanks Sebastian. What I fail to see is how I can offer this information to the recommendation object once I have calculated it. How can I pass the output file to the GenericItemBasedRecommender that I use to estimate preferences?
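One way to feed a precomputed similarity file into GenericItemBasedRecommender is to parse it into GenericItemSimilarity.ItemItemSimilarity entries and wrap them in a GenericItemSimilarity. A minimal sketch, assuming a comma-separated `itemID1,itemID2,similarity` file format and hypothetical file names (`similarities.csv`, `ratings.csv`) -- the exact output format depends on the writer used in the offline step:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class PrecomputedSimilarityLoader {

  public static void main(String[] args) throws Exception {
    // Parse the offline-computed similarities (assumed CSV: itemID1,itemID2,value).
    List<GenericItemSimilarity.ItemItemSimilarity> sims = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(new FileReader(new File("similarities.csv")))) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split(",");
        sims.add(new GenericItemSimilarity.ItemItemSimilarity(
            Long.parseLong(parts[0]),
            Long.parseLong(parts[1]),
            Double.parseDouble(parts[2])));
      }
    }

    // GenericItemSimilarity serves the precomputed values from memory,
    // so nothing is recomputed at request time.
    ItemSimilarity similarity = new GenericItemSimilarity(sims);

    DataModel model = new FileDataModel(new File("ratings.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    System.out.println(recommender.recommend(123L, 5));
  }
}
```

Item pairs absent from the file get no similarity, so only the precomputed (e.g. top-N per item) pairs contribute to estimates.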
Bernát GÁBOR

On Tue, Apr 23, 2013 at 3:15 PM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Bernat,
>
> you can do the offline similarity calculation on a single machine with
> o.a.m.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities
> or on a Hadoop cluster (if necessary) with
> o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>
> I think such a setup is much easier than coming up with a complicated
> caching logic.
>
> Best,
> Sebastian
>
> On 23.04.2013 15:09, Gabor Bernat wrote:
> > Well,
> >
> > assume that we have a live system. The system is serving requests 24/7,
> > and in order to update the model periodically we recreate it on another
> > machine and just save it (e.g. to a database); to update, we just load
> > the saved data model and exchange it with the one running prior. On this
> > other machine the similarities could also be calculated and saved, and on
> > the new machine just loaded.
> >
> > The advantage of this is that you can offload similarity calculation to
> > an offline system without paying the cold-start price on the live/online
> > system. Loading the precalculated most-used similarities should certainly
> > be less costly than doing all this on the live system, even if done
> > concurrently. The problem with calculating them on the live system is
> > that for the first 30-40 minutes you'll have long response times compared
> > to the state when your cache is filled.
> >
> > Bernát GÁBOR
> >
> > On Tue, Apr 23, 2013 at 2:54 PM, Sean Owen <sro...@gmail.com> wrote:
> >
> >> I agree, but how is "pre-adding a cached value for X" different from
> >> "requesting X from the cache"? Either way you get X in the cache.
> >> Computing offline seems the same as computing online, but in some kind
> >> of warm-up state or phase, which can even be concurrent with serving
> >> early requests. You can do everything else you say without a new
> >> operation, like selectively pre-caching certain entries.
> >>
> >> On Tue, Apr 23, 2013 at 1:14 PM, Gabor Bernat <ber...@primeranks.net>
> >> wrote:
> >>> CachingSimilarity could also allow entries to be added manually,
> >>> because in that case this task could be pushed off to an offline
> >>> system. And yes, you cannot add all the similarities to the caching
> >>> object; however, based on history you can select some top (popular)
> >>> item pairs and calculate only for that subset. This could push down the
> >>> upper request times. Any other ideas?
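Sebastian's suggested single-machine offline step might look roughly like the following. This is a sketch, assuming the Mahout 0.8-era batch precompute API and hypothetical file names; the similarity metric, the 50-similar-items cutoff, and the one-hour time limit are illustrative choices, not fixed requirements:

```java
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class OfflineSimilarityPrecompute {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));
    ItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

    // Keep only the 50 most similar items per item -- the "top (popular)
    // pairs" subset discussed above.
    BatchItemSimilarities batch =
        new MultithreadedBatchItemSimilarities(recommender, 50);

    // Compute with one worker per core, capped at 1 hour, and write the
    // result to a file that the live system can later load.
    int numSimilarities = batch.computeItemSimilarities(
        Runtime.getRuntime().availableProcessors(),
        1,
        new FileSimilarItemsWriter(new File("similarities.csv")));

    System.out.println("Computed " + numSimilarities + " similarities");
  }
}
```

The live system never runs this job; it only reads the resulting file at startup or model swap, which is what avoids the 30-40 minute cold-start window described in the thread.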