On Mon, Nov 13, 2017 at 3:32 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
Regarding overfitting, don't forget dithering. That can be the most important single step you take in building a good recommender.

Dithering can be inversely proportional to the number of exposures so far, if you like to give novel items more exposure.

This doesn't have to be very fancy. I have had very good results by generating a long list of recommendations, computing a pseudo-score based on rank, adding a bit of noise, and resorting. I also scanned down the list and penalized items that showed insufficient diversity. Then I resorted again. Typically, the pseudo-score was something like exp(-r) where r is rank.

The noise scale is adjusted to leave a good proportion of originally recommended items on the first page. It could easily have been scaled by 1/sqrt(exposures) to let the newbies move around more.

The parameters here should be adjusted a bit based on experiments, but a heuristic first hack works pretty well as a start.

On Sun, Nov 12, 2017 at 10:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

Part of what Ted is talking about can be seen in the carousels on Netflix or Amazon. Some are not recommendations, like "trending" videos, "new" videos, or "prime" videos (substitute your own promotions here). They have nothing to do with recommender-created items but are presented along with recommender-based carousels. They are based on analytics or business rules and ideally have some randomness built in. The reason for this is 1) it works by exposing users to items that they would not see in recommendations, and 2) it provides data to build the recommender model from.

A recommender cannot work in an app that displays no non-recommended items, or there will be no unbiased data to create recommendations from. This would lead to crippling overfitting. Most apps have placements like the ones mentioned above and also have search and browse.
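A minimal sketch of the dithering recipe Ted describes (pseudo-score exp(-rank), add noise, resort), in Python. The Gaussian noise, the default noise_scale, and the optional 1/sqrt(exposures) factor are illustrative assumptions, not a fixed prescription:

```python
import math
import random

def dither(ranked_items, noise_scale=0.3, exposures=None, rng=random):
    """Re-rank a recommendation list: compute a pseudo-score exp(-rank),
    add a bit of noise, and resort.

    noise_scale is a tunable assumption. If an exposures dict is given
    (item -> exposure count so far), the noise for each item is scaled
    by 1/sqrt(exposures) so lightly exposed items move around more.
    """
    rescored = []
    for rank, item in enumerate(ranked_items):
        score = math.exp(-rank)  # pseudo-score from rank alone
        scale = noise_scale
        if exposures is not None:
            # newer items (few exposures) get a larger noise scale
            scale /= math.sqrt(max(exposures.get(item, 1), 1))
        rescored.append((score + rng.gauss(0.0, scale), item))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in rescored]
```

The diversity penalty Ted mentions (scan the list, downgrade items too similar to their neighbors, resort again) would be a second pass over the rescored list and is omitted here for brevity.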
However you do it, it must be prominent and always available. The moral of this paragraph is: don't try to make everything a recommendation; it will be self-defeating. In fact, make sure not every video watch comes from a recommendation.

Likewise, think of placements (each reflecting a particular recommender use) as experimentation grounds. Try things like finding a recommended category and then recommending items in that category, all based on user behavior. Or try a placement based on a single thing a user watched, like "because you watched xyz you might like these". Don't just show the most popular categories for the user and recommend items in them. This would be a type of overfitting too.

I'm sure we have strayed far from your original question, but maybe it's covered somewhere in here.

On Nov 12, 2017, at 12:11 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

I did "second order" recommendations before, but more to fight sparsity and to find more significant associations in situations with less traffic, so recommending categories instead of products. There needs to be some third-order sorting/boosting like you mentioned with "new music", or maybe popularity or hotness, to avoid quasi-random order. For events with limited lifetime it's probably some mixture of spatial distance and freshness.

We will definitely keep an eye on the generation process of data for new items. It depends on the domain, but in the time of multi-channel promotion of videos, shows, and products, it also helps that there is traffic driven from external sources.

Thanks for the detailed hints; now it's time to see what comes out of this.
Johannes

On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Events have the natural good quality that having a cold start means you will naturally favor recent interactions, simply because there won't be any old interactions to deal with.

Unfortunately, that also means that you will likely be facing serious cold-start issues all the time. I have used two strategies to deal with cold starts, both fairly successfully.

*Method 1: Second-order recommendation*

For novel items with no history, you typically do have some kind of information about the content. For an event, you may know the performer, the organizer, the venue, and possibly something about the content of the event as well (especially for a tour event). As such, you can build a recommender that recommends this secondary information, and then do a search with the recommended secondary information to find events. This actually works pretty well, at least for the domains where I have used it (music and videos). For instance, in music, you can easily recommend a new album based on the artist(s) and track list.

The trick here is to determine when and how to blend in normal recommendations. One way is query blending, where you combine the second-order query with a normal recommendation query, but I think a fair bit of experimentation is warranted here.

*Method 2: What's new and what's trending*

It is always important to provide alternative avenues of information gathering for recommendation. Especially for the user-generated video case, there was pretty high interest in the "What's new" and "What's hot" pages.
If you do a decent job of dithering here, you keep reasonably good content on the what's-new page longer than content that doesn't pull. That maintains interest in the page. Similarly, you can have a somewhat lower bar for new content to be classified as hot than for established content. That way you keep the page fresh (because new stuff appears transiently), but you also have a fair bit of really good stuff as well. If done well, these pages will provide enough interactions with new items that they don't start entirely cold. You may need genre-specific or location-specific versions of these pages to avoid interesting content being overwhelmed. You might also be able to spot content that has intense interest from a sub-population, as opposed to diffuse interest from a mass population.

You can also use novelty and trending boosts for content in the normal recommendation engine. I have avoided this in the past because I felt it was better to have specialized pages for what's new and hot, rather than because I had data saying it was bad to do. I have put a very weak recommendation effect on the what's-hot pages so that people tend to see trending material that they like. That doesn't help on what's-new pages, for obvious reasons, unless you use a touch of second-order recommendation.

On Sat, Nov 11, 2017 at 11:00 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Well, the Greece thing was just an example of something you don't know upfront. It could be any of the modeled features on the cross-recommender input side (user segment, country, city, previous buys), or some subpopulation getting active. So the current approach, probably with sampling that favours newer events, will be the best here.
Luckily, a sampling strategy is a big topic anyway, since we're trying to go the near-real-time way. Pat, you talked about it a while ago on this list, and I still have to look at the Flink talk from Trevor Grant, but I'm really eager to attack this after years of batch :)

Thanks for your thoughts; I am happy I can rule something out given the domain (Poisson LLR). Luckily, the domain I'm working on is event recommendations, so there is a natural deterministic item expiry (as compared to Christmas-like stuff).

Again, thanks!

On Sat, Nov 11, 2017 at 7:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Inline.

On Sat, Nov 11, 2017 at 6:31 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> If Mahout were to use http://bit.ly/poisson-llr it would tend to favor new events in calculating the LLR score for later use in the threshold for whether a co-occurrence or cross-occurrence is incorporated in the model.

I don't think that this would actually help for most recommendation purposes.

It might help to determine that some item or other has broken out of its historical rates. Thus, we might have "hotness" as a detected feature that could be used as a boost at recommendation time. We might also have "not hotness" as a negative-boost feature.

Since we have a pretty good handle on the "other" counts, I don't think that the Poisson test would help much with the cooccurrence stuff itself.

Changing the sampling rule could make a difference to temporality and would be more like what Johannes is asking about.

> But it doesn't relate to popularity, as I think Ted is saying.
> Are you looking for 1) personal recommendations biased by hotness in Greece, or 2) things hot in Greece?
>
> 1) Create a secondary indicator for "watched in some locale". The locale-id uses a country code + postal code maybe, but not lat-lon; something that includes a good number of people/events. The query would then be user-id and user-locale. This would yield personal recs preferred in the user's locale, Athens-west-side in this case.

And this works in the current regime. Simply add location tags to the user histories and do cooccurrence against content. Locations will pop out as indicators for some content and not for others. Then when somebody appears in some location, their tags will retrieve localized content.

For localization based on strict geography, say for restaurant search, we can just add business rules based on geo-search. A very large bank customer of ours does that, for instance.

> 2) Split the data into locales and do the hot calc I mention. The query would have no user-id, since it is not personalized, but would yield "hot in Greece".

I think that this is a good approach.

> Ted's "Christmas video" tag is what I was calling a business rule, and can be added to either of the above techniques.

But the (not-)hotness feature might help with automating this.

On Nov 11, 2017, at 4:01 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

So ... there are a few different threads here.

1) LLR, but with time. Quite possible, but not really what Johannes is talking about, I think.
See http://bit.ly/poisson-llr for a quick discussion.

2) Time-varying recommendation. As Johannes notes, this can make use of windowed counts. The problem is that rarely accessed items should probably have longer windows, so that we use longer-term trends when we have less data.

The good news here is that some part of this is nearly already in the code. The trick is that the down-sampling used in the system can be adapted to favor recent events over older ones. That means that if the meaning of something changes over time, the system will catch on. Likewise, if something appears out of nowhere, it will quickly train up. This handles the popular-in-Greece-right-now problem.

But this isn't the whole story of changing recommendations. Another problem that we commonly face is what I call the Christmas music issue. The idea is that there are lots of recommendations for music that are highly seasonal. Thus, Bing Crosby fans want to hear White Christmas <https://www.youtube.com/watch?v=P8Ozdqzjigg> until the day after Christmas, at which point this becomes a really bad recommendation. To some degree this can be partially dealt with by using temporal tags as indicators, but that doesn't really allow a recommendation to be completely shut down.

The only way that I have seen to deal with this in the past is with a manually designed kill switch. As much as possible, we would tag the obviously seasonal content and then add a filter to kill or downgrade that content the moment it went out of fashion.
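The recency-favoring down-sampling Ted mentions could look something like the sketch below, assuming a simple exponential half-life rule. The function name, the (user, item, timestamp) event shape, and the half-life value are hypothetical; this is not Mahout's actual sampling code:

```python
import random

def downsample_recent(events, now, half_life=30.0, rng=random):
    """Down-sample interactions with a recency bias: an event of age a
    (same units as half_life, e.g. days) is kept with probability
    0.5 ** (a / half_life), so recent events are almost always kept
    and old ones fade out of the training data.

    events: iterable of (user, item, timestamp) tuples.
    """
    kept = []
    for user, item, t in events:
        age = max(now - t, 0.0)
        if rng.random() < 0.5 ** (age / half_life):
            kept.append((user, item, t))
    return kept
```

Because the keep-probability decays smoothly, an item whose meaning changes over time is dominated by its recent interactions, while an item with little traffic still retains most of its (older) history.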
On Sat, Nov 11, 2017 at 9:43 AM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Pat, thanks for your help, especially the insights on how you handle the system in production and the tips for multiple acyclic buckets. Combining the signals when querying sounds okay, but as you say, it's always hard to find the right boosts without setting up some LTR system. If there were a way to use the hotness when calculating the indicators for subpopulations, it would be great, especially for a cross-recommender.

e.g. people in Greece _now_ are viewing this show/product/whatever

And here the popularity of the recommended item in this subpopulation could be overlooked when just looking at the overall derivatives of activity.

Maybe one could do multiple G-tests using sliding windows:
* itemA&itemB vs. population (classic)
* itemA&itemB(t) vs. itemA&itemB(t-1)
...

and derive multiple indicators per item to be indexed.

But this all relies on discretizing time into buckets, and not looking at the distribution of time between events like in the presentation above. Maybe there is something way smarter.

Johannes

On Sat, Nov 11, 2017 at 2:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

BTW you should take time buckets that are relatively free of daily cycles, like 3-day, week, or month buckets, for "hot".
This is to remove cyclical effects from the frequencies as much as possible, since you need 3 buckets to see the change in the change, 2 for the change, and 1 for the event volume.

On Nov 10, 2017, at 4:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

So your idea is to find anomalies in event frequencies to detect "hot" items?

Interesting; maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures of popularity, increasing popularity, and increasingly increasing popularity. Put another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number of events, the derivative (difference), and the second derivative. Ranking all items by these values gives various measures of popularity or its increase.

If your use is in a recommender, you can add a ranking field to all items and query for "hot" using the ranking you calculated.

If you want to bias recommendations by hotness, query with user history and boost by your hot field. I suspect the hot field will tend to overwhelm your user history in this case, as it would if you used anomalies, so you'd also have to normalize the hotness to some range closer to the one created by the user-history matching score. I haven't found a very good way to mix these in a model, so use hot as a method of backfill if you cannot return enough recommendations, or in places where you may want to show just hot items.
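The bucket arithmetic above (volume, difference, second difference) can be sketched in a few lines; the three-bucket input shape, ordered oldest to newest, is an assumption:

```python
def popularity_scores(bucket_counts):
    """Popular / trending / hot measures for one item from three time
    buckets of event counts, ordered oldest -> newest:
      popular  = current volume (newest bucket)
      trending = first difference (change)
      hot      = second difference (change in the change)
    """
    oldest, middle, newest = bucket_counts
    popular = newest
    trending = newest - middle
    hot = (newest - middle) - (middle - oldest)
    return popular, trending, hot
```

Ranking all items by one of these three values, and indexing the result as a field on each item, gives the "hot" ranking field described above.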
There are several benefits to this method of using hot to rank all items, including the fact that you can apply business rules to them just as with normal recommendations, so you can ask for hot in "electronics" if you know categories, or hot "in-stock" items, or ...

Still, anomaly detection does sound like an interesting approach.

On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Hi "all",

I am wondering what would be the best way to incorporate event-time information into the calculation of the G-test.

There is a claim here: https://de.slideshare.net/tdunning/finding-changes-in-real-data saying "Time aware variant of G-Test is possible".

I remember I experimented with exponentially decayed counts some years ago, and this involved changing the counts to doubles, but I suspect there is some smarter way. What I don't get is the relation to a data structure like t-digest when working with a lot of counts/cells for every combination of items. Keeping a t-digest for every combination seems unfeasible.

How would one incorporate event time into recommendations to detect the "hotness" of certain relations? Glad if someone has an idea...

Cheers,

Johannes
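For reference, the plain (time-unaware) G-test discussed throughout the thread can be computed from a 2x2 contingency table of cooccurrence counts as sketched below. A time-aware variant along the lines of the sliding-window idea earlier in the thread would feed bucketed counts (e.g. itemA&itemB in window t vs. window t-1) into the same statistic; the helper names here are illustrative:

```python
import math

def _entropy_term(*counts):
    """Unnormalized entropy helper: sum c*log(c/total) over counts."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def g_test(k11, k12, k21, k22):
    """G statistic (2 * log-likelihood ratio) for a 2x2 table, e.g.
      k11 = events with both itemA and itemB
      k12 = events with itemA but not itemB
      k21 = events with itemB but not itemA
      k22 = events with neither.
    Near 0 means independence; large positive means association.
    """
    row = _entropy_term(k11 + k12, k21 + k22)
    col = _entropy_term(k11 + k21, k12 + k22)
    mat = _entropy_term(k11, k12, k21, k22)
    return 2.0 * (mat - row - col)
```

A perfectly balanced table gives a statistic of 0, while a strong diagonal (itemA and itemB occurring together far more than chance) gives a large positive value, which is what the cooccurrence thresholding in the thread keys off.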