On Mon, Nov 13, 2017 at 3:32 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
Regarding overfitting, don't forget dithering. That can be the most important single step you take in building a good recommender.

Dithering can be inversely proportional to the number of exposures so far, if you like to give novel items more exposure.

This doesn't have to be very fancy. I have had very good results by generating a long list of recommendations, computing a pseudo-score based on rank, adding a bit of noise, and resorting. I also scanned down the list and penalized items that showed insufficient diversity. Then I resorted again. Typically, the pseudo-score was something like exp(-r) where r is rank.

The noise scale is adjusted to leave a good proportion of originally recommended items on the first page. It could easily have been scaled by 1/sqrt(exposures) to let the newbies move around more.

The parameters here should be adjusted a bit based on experiments, but a heuristic first hack works pretty well as a start.

On Sun, Nov 12, 2017 at 10:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

Part of what Ted is talking about can be seen in the carousels on Netflix or Amazon. Some are not recommendations, like "trending" videos, "new" videos, or "prime" videos (substitute your own promotions here). They have nothing to do with recommender-created items but are presented along with recommender-based carousels. They are based on analytics or business rules and ideally have some randomness built in. The reason for this is 1) it works by exposing users to items that they would not see in recommendations, and 2) it provides data to build the recommender model from.

A recommender cannot work in an app that displays no non-recommended items, or there will be no unbiased data to create recommendations from. This would lead to crippling overfitting. Most apps have placements like the ones mentioned above and also have search and browse.
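A minimal sketch of the dithering recipe Ted describes (pseudo-score exp(-rank), add noise, resort), in Python. The Gaussian noise, the default noise_scale, and the optional 1/sqrt(exposures) factor are illustrative assumptions, not a fixed prescription:

```python
import math
import random

def dither(ranked_items, noise_scale=0.3, exposures=None, rng=random):
    """Re-rank a recommendation list: compute a pseudo-score exp(-rank),
    add a bit of noise, and resort.

    noise_scale is a tunable assumption. If an exposures dict is given
    (item -> exposure count so far), the noise for each item is scaled
    by 1/sqrt(exposures) so lightly exposed items move around more.
    """
    rescored = []
    for rank, item in enumerate(ranked_items):
        score = math.exp(-rank)  # pseudo-score from rank alone
        scale = noise_scale
        if exposures is not None:
            # newer items (few exposures) get a larger noise scale
            scale /= math.sqrt(max(exposures.get(item, 1), 1))
        rescored.append((score + rng.gauss(0.0, scale), item))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in rescored]
```

The diversity penalty Ted mentions (scan the list, downgrade items too similar to their neighbors, resort again) would be a second pass over the rescored list and is omitted here for brevity.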
However you do it, it must be prominent and always available. The moral of this paragraph is: don't try to make everything a recommendation; it will be self-defeating. In fact, make sure not every video watch comes from a recommendation.

Likewise, think of placements (each reflecting a particular recommender use) as experimentation grounds. Try things like finding a recommended category and then recommending items in that category, all based on user behavior. Or try a placement based on a single thing a user watched, like "because you watched xyz you might like these". Don't just show the most popular categories for the user and recommend items in them. This would be a type of overfitting too.

I'm sure we have strayed far from your original question, but maybe it's covered somewhere in here.

On Nov 12, 2017, at 12:11 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

I did "second order" recommendations before, but more to fight sparsity and to find more significant associations in situations with less traffic, so recommending categories instead of products. There needs to be some third-order sorting/boosting like you mentioned with "new music", or maybe popularity or hotness, to avoid quasi-random order. For events with limited lifetime it's probably some mixture of spatial distance and freshness.

We will definitely keep an eye on the generation process of data for new items. It depends on the domain, but in the time of multi-channel promotion of videos, shows, and products, it also helps that there is traffic driven from external sources.

Thanks for the detailed hints; now it's time to see what comes out of this.
Johannes

On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Events have the natural good quality that having a cold start means you will naturally favor recent interactions, simply because there won't be any old interactions to deal with.

Unfortunately, that also means that you will likely be facing serious cold-start issues all the time. I have used two strategies to deal with cold starts, both fairly successfully.

*Method 1: Second-order recommendation*

For novel items with no history, you typically do have some kind of information about the content. For an event, you may know the performer, the organizer, the venue, and possibly something about the content of the event as well (especially for a tour event). As such, you can build a recommender that recommends this secondary information, and then do a search with the recommended secondary information to find events. This actually works pretty well, at least for the domains where I have used it (music and videos). For instance, in music, you can easily recommend a new album based on the artist(s) and track list.

The trick here is to determine when and how to blend in normal recommendations. One way is query blending, where you combine the second-order query with a normal recommendation query, but I think a fair bit of experimentation is warranted here.

*Method 2: What's new and what's trending*

It is always important to provide alternative avenues of information gathering for recommendation. Especially for the user-generated video case, there was pretty high interest in the "What's new" and "What's hot" pages.
If you do a decent job of dithering here, you keep reasonably good content on the what's-new page longer than content that doesn't pull. That maintains interest in the page. Similarly, you can have a somewhat lower bar for new content to be classified as hot than for established content. That way you keep the page fresh (because new stuff appears transiently), but you also have a fair bit of really good stuff as well. If done well, these pages will provide enough interactions with new items that they don't start entirely cold. You may need genre-specific or location-specific versions of these pages to avoid interesting content being overwhelmed. You might also be able to spot content that has intense interest from a sub-population, as opposed to diffuse interest from a mass population.

You can also use novelty and trending boosts for content in the normal recommendation engine. I have avoided this in the past because I felt it was better to have specialized pages for what's new and hot, rather than because I had data saying it was bad to do. I have put a very weak recommendation effect on the what's-hot pages so that people tend to see trending material that they like. That doesn't help on what's-new pages, for obvious reasons, unless you use a touch of second-order recommendation.

On Sat, Nov 11, 2017 at 11:00 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Well, the Greece thing was just an example of something you don't know upfront. It could be any of the modeled features on the cross-recommender input side (user segment, country, city, previous buys), or some subpopulation getting active. So the current approach, probably with sampling that favours newer events, will be the best here.
Luckily, a sampling strategy is a big topic anyway, since we're trying to go the near-real-time way. Pat, you talked about it a while ago on this list, and I still have to look at the Flink talk from Trevor Grant, but I'm really eager to attack this after years of batch :)

Thanks for your thoughts; I am happy I can rule something out given the domain (Poisson LLR). Luckily, the domain I'm working on is event recommendations, so there is a natural deterministic item expiry (as compared to Christmas-like stuff).

Again, thanks!

On Sat, Nov 11, 2017 at 7:00 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Inline.

On Sat, Nov 11, 2017 at 6:31 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> If Mahout were to use http://bit.ly/poisson-llr it would tend to favor new events in calculating the LLR score for later use in the threshold for whether a co-occurrence or cross-occurrence is incorporated in the model.

I don't think that this would actually help for most recommendation purposes.

It might help to determine that some item or other has broken out of its historical rates. Thus, we might have "hotness" as a detected feature that could be used as a boost at recommendation time. We might also have "not hotness" as a negative-boost feature.

Since we have a pretty good handle on the "other" counts, I don't think that the Poisson test would help much with the cooccurrence stuff itself.

Changing the sampling rule could make a difference to temporality and would be more like what Johannes is asking about.

> But it doesn't relate to popularity, as I think Ted is saying.
> Are you looking for 1) personal recommendations biased by hotness in Greece, or 2) things hot in Greece?
>
> 1) Create a secondary indicator for "watched in some locale". The locale-id uses a country code + postal code maybe, but not lat-lon; something that includes a good number of people/events. The query would then be user-id and user-locale. This would yield personal recs preferred in the user's locale, Athens-west-side in this case.

And this works in the current regime. Simply add location tags to the user histories and do cooccurrence against content. Locations will pop out as indicators for some content and not for others. Then when somebody appears in some location, their tags will retrieve localized content.

For localization based on strict geography, say for restaurant search, we can just add business rules based on geo-search. A very large bank customer of ours does that, for instance.

> 2) Split the data into locales and do the hot calc I mention. The query would have no user-id, since it is not personalized, but would yield "hot in Greece".

I think that this is a good approach.

> Ted's "Christmas video" tag is what I was calling a business rule, and can be added to either of the above techniques.

But the (not-)hotness feature might help with automating this.

On Nov 11, 2017, at 4:01 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

So ... there are a few different threads here.

1) LLR, but with time. Quite possible, but not really what Johannes is talking about, I think.
See http://bit.ly/poisson-llr for a quick discussion.

2) Time-varying recommendation. As Johannes notes, this can make use of windowed counts. The problem is that rarely accessed items should probably have longer windows, so that we use longer-term trends when we have less data.

The good news here is that some part of this is nearly already in the code. The trick is that the down-sampling used in the system can be adapted to favor recent events over older ones. That means that if the meaning of something changes over time, the system will catch on. Likewise, if something appears out of nowhere, it will quickly train up. This handles the popular-in-Greece-right-now problem.

But this isn't the whole story of changing recommendations. Another problem that we commonly face is what I call the Christmas music issue. The idea is that there are lots of recommendations for music that are highly seasonal. Thus, Bing Crosby fans want to hear White Christmas <https://www.youtube.com/watch?v=P8Ozdqzjigg> until the day after Christmas, at which point this becomes a really bad recommendation. To some degree this can be partially dealt with by using temporal tags as indicators, but that doesn't really allow a recommendation to be completely shut down.

The only way that I have seen to deal with this in the past is with a manually designed kill switch. As much as possible, we would tag the obviously seasonal content and then add a filter to kill or downgrade that content the moment it went out of fashion.
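The recency-favoring down-sampling Ted mentions could look something like the sketch below, assuming a simple exponential half-life rule. The function name, the (user, item, timestamp) event shape, and the half-life value are hypothetical; this is not Mahout's actual sampling code:

```python
import random

def downsample_recent(events, now, half_life=30.0, rng=random):
    """Down-sample interactions with a recency bias: an event of age a
    (same units as half_life, e.g. days) is kept with probability
    0.5 ** (a / half_life), so recent events are almost always kept
    and old ones fade out of the training data.

    events: iterable of (user, item, timestamp) tuples.
    """
    kept = []
    for user, item, t in events:
        age = max(now - t, 0.0)
        if rng.random() < 0.5 ** (age / half_life):
            kept.append((user, item, t))
    return kept
```

Because the keep-probability decays smoothly, an item whose meaning changes over time is dominated by its recent interactions, while an item with little traffic still retains most of its (older) history.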
On Sat, Nov 11, 2017 at 9:43 AM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Pat, thanks for your help, especially the insights on how you handle the system in production and the tips for multiple acyclic buckets. Combining the signals when querying sounds okay, but as you say, it's always hard to find the right boosts without setting up some LTR system. If there were a way to use the hotness when calculating the indicators for subpopulations, it would be great, especially for a cross-recommender.

e.g. people in Greece _now_ are viewing this show/product/whatever

And here the popularity of the recommended item in this subpopulation could be overlooked when just looking at the overall derivatives of activity.

Maybe one could do multiple G-tests using sliding windows:
* itemA&itemB vs. population (classic)
* itemA&itemB(t) vs. itemA&itemB(t-1)
...

and derive multiple indicators per item to be indexed.

But this all relies on discretizing time into buckets, and not looking at the distribution of time between events like in the presentation above. Maybe there is something way smarter.

Johannes

On Sat, Nov 11, 2017 at 2:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

BTW you should take time buckets that are relatively free of daily cycles, like 3-day, week, or month buckets, for "hot".
This is to remove cyclical effects from the frequencies as much as possible, since you need 3 buckets to see the change in the change, 2 for the change, and 1 for the event volume.

On Nov 10, 2017, at 4:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

So your idea is to find anomalies in event frequencies to detect "hot" items?

Interesting; maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures of popularity, increasing popularity, and increasingly increasing popularity. Put another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number of events, the derivative (difference), and the second derivative. Ranking all items by these values gives various measures of popularity or its increase.

If your use is in a recommender, you can add a ranking field to all items and query for "hot" using the ranking you calculated.

If you want to bias recommendations by hotness, query with user history and boost by your hot field. I suspect the hot field will tend to overwhelm your user history in this case, as it would if you used anomalies, so you'd also have to normalize the hotness to some range closer to the one created by the user-history matching score. I haven't found a very good way to mix these in a model, so use hot as a method of backfill if you cannot return enough recommendations, or in places where you may want to show just hot items.
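The bucket arithmetic above (volume, difference, second difference) can be sketched in a few lines; the three-bucket input shape, ordered oldest to newest, is an assumption:

```python
def popularity_scores(bucket_counts):
    """Popular / trending / hot measures for one item from three time
    buckets of event counts, ordered oldest -> newest:
      popular  = current volume (newest bucket)
      trending = first difference (change)
      hot      = second difference (change in the change)
    """
    oldest, middle, newest = bucket_counts
    popular = newest
    trending = newest - middle
    hot = (newest - middle) - (middle - oldest)
    return popular, trending, hot
```

Ranking all items by one of these three values, and indexing the result as a field on each item, gives the "hot" ranking field described above.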
There are several benefits to this method of using hot to rank all items, including the fact that you can apply business rules to them just as with normal recommendations, so you can ask for hot in "electronics" if you know categories, or hot "in-stock" items, or ...

Still, anomaly detection does sound like an interesting approach.

On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Hi "all",

I am wondering what would be the best way to incorporate event-time information into the calculation of the G-test.

There is a claim here: https://de.slideshare.net/tdunning/finding-changes-in-real-data saying "Time aware variant of G-Test is possible".

I remember I experimented with exponentially decayed counts some years ago, and this involved changing the counts to doubles, but I suspect there is some smarter way. What I don't get is the relation to a data structure like t-digest when working with a lot of counts/cells for every combination of items. Keeping a t-digest for every combination seems unfeasible.

How would one incorporate event time into recommendations to detect the "hotness" of certain relations? Glad if someone has an idea...

Cheers,

Johannes
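For reference, the plain (time-unaware) G-test discussed throughout the thread can be computed from a 2x2 contingency table of cooccurrence counts as sketched below. A time-aware variant along the lines of the sliding-window idea earlier in the thread would feed bucketed counts (e.g. itemA&itemB in window t vs. window t-1) into the same statistic; the helper names here are illustrative:

```python
import math

def _entropy_term(*counts):
    """Unnormalized entropy helper: sum c*log(c/total) over counts."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def g_test(k11, k12, k21, k22):
    """G statistic (2 * log-likelihood ratio) for a 2x2 table, e.g.
      k11 = events with both itemA and itemB
      k12 = events with itemA but not itemB
      k21 = events with itemB but not itemA
      k22 = events with neither.
    Near 0 means independence; large positive means association.
    """
    row = _entropy_term(k11 + k12, k21 + k22)
    col = _entropy_term(k11 + k21, k12 + k22)
    mat = _entropy_term(k11, k12, k21, k22)
    return 2.0 * (mat - row - col)
```

A perfectly balanced table gives a statistic of 0, while a strong diagonal (itemA and itemB occurring together far more than chance) gives a large positive value, which is what the cooccurrence thresholding in the thread keys off.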