BTW you should take time buckets that are relatively free of daily cycles, like 
3-day, week, or month buckets, for “hot”. This is to remove cyclical effects 
from the frequencies as much as possible, since you need 3 buckets to see the 
change in the change, 2 for the change, and 1 for the event volume.
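
For a concrete picture of the arithmetic with three weekly buckets (illustrative numbers only):

# hypothetical weekly event counts for one item, oldest to newest (made-up numbers)
c1, c2, c3 = 100, 150, 225

volume = c3                               # 225: events in the latest bucket
change = c3 - c2                          # 75: first difference, "trending"
change_in_change = (c3 - c2) - (c2 - c1)  # 25: second difference, "hot"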


On Nov 10, 2017, at 4:12 PM, Pat Ferrel <[email protected]> wrote:

So your idea is to find anomalies in event frequencies to detect “hot” items?

Interesting, maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures 
of popularity, increasing popularity, and increasingly increasing popularity. Put 
another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 
3 time buckets and looking at the number of events, the derivative (difference), 
and the second derivative. Ranking all items by these values gives various 
measures of popularity or its increase. 
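
To make that concrete, here is a rough sketch in Python (just an illustration, not code from any particular project), assuming you have timestamped (item, time) events:

from collections import defaultdict

BUCKET_SECONDS = 7 * 24 * 3600  # weekly buckets to stay clear of daily cycles

def bucket_counts(events, now, n_buckets=3):
    # events: iterable of (item_id, epoch_seconds)
    # returns {item: [oldest_count, ..., newest_count]}
    counts = defaultdict(lambda: [0] * n_buckets)
    for item, ts in events:
        age = int((now - ts) // BUCKET_SECONDS)
        if 0 <= age < n_buckets:
            counts[item][n_buckets - 1 - age] += 1
    return counts

def popularity_scores(events, now):
    # popular = event volume, trending = first difference, hot = second difference
    scores = {}
    for item, (c1, c2, c3) in bucket_counts(events, now).items():
        scores[item] = {
            "popular": c3,
            "trending": c3 - c2,
            "hot": (c3 - c2) - (c2 - c1),
        }
    return scores

# rank all items by any of the three measures, e.g. hottest first:
# sorted(scores, key=lambda i: scores[i]["hot"], reverse=True)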

If your use is in a recommender, you can add a ranking field to all items and 
query for “hot” using the ranking you calculated. 
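
For example (my assumption about the setup, not a requirement), if the items live in a search index like Elasticsearch with a numeric "hot" field written by the ranking job, a plain "show me hot items" query is just a sort on that field:

# sketch only: the index name "items" and field name "hot" are made up for illustration
hot_query = {
    "query": {"match_all": {}},
    "sort": [{"hot": {"order": "desc"}}],
    "size": 10,
}
# e.g. with the official Python client: es.search(index="items", body=hot_query)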

If you want to bias recommendations by hotness, query with user history and 
boost by your hot field. I suspect the hot field will tend to overwhelm your 
user history in this case, as it would if you used anomalies, so you’d also have 
to normalize the hotness to some range closer to the one produced by the user-
history matching score. I haven’t found a very good way to mix these in a model, 
so use hot as a method of backfill when you cannot return enough recommendations, 
or in places where you want to show just hot items. There are several benefits 
to this method of using hot to rank all items, including the fact that you can 
apply business rules to them just as with normal recommendations, so you can 
ask for hot in “electronics” if you know categories, or hot "in-stock" items, 
or ...
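
A minimal sketch of that mix-and-backfill idea, assuming you already have per-item hotness scores and per-item user-history match scores (the 0-1 rescaling and the fixed hot_weight are my assumptions about one reasonable way to keep hotness from swamping the match score):

def normalize(scores):
    # rescale {item: score} into the 0..1 range
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {item: (s - lo) / span for item, s in scores.items()}

def recommend(history_scores, hot_scores, k=10, hot_weight=0.1):
    hot = normalize(hot_scores)
    # boost the personalized score by a small amount of normalized hotness
    blended = {item: s + hot_weight * hot.get(item, 0.0)
               for item, s in history_scores.items()}
    recs = sorted(blended, key=blended.get, reverse=True)[:k]
    # backfill with the hottest items if personalization cannot fill the slate
    if len(recs) < k:
        fill = [i for i in sorted(hot, key=hot.get, reverse=True) if i not in blended]
        recs.extend(fill[:k - len(recs)])
    return recs

The hot_weight would need tuning against real traffic; the point is only that the hotness range is brought down to the scale of the match scores before mixing.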

Still, anomaly detection does sound like an interesting approach.


On Nov 10, 2017, at 3:13 PM, Johannes Schulte <[email protected]> 
wrote:

Hi "all",

I am wondering what would be the best way to incorporate event time
information into the calculation of the G-Test.
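
(For reference, the plain, time-free version I mean is the usual 2x2 log-likelihood ratio; the Python below is just my shorthand for the entropy formulation, not Mahout code:)

import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def g_2x2(k11, k12, k21, k22):
    # G = 2 * (row entropy + column entropy - cell entropy), counts are plain integers
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    cells = entropy(k11, k12, k21, k22)
    return 2.0 * max(0.0, row + col - cells)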

There is a claim here:
https://de.slideshare.net/tdunning/finding-changes-in-real-data
saying that a "Time aware variant of G-Test is possible".

I remember I experimented with exponentially decayed counts some years ago, and
this involved changing the counts to doubles, but I suspect there is some
smarter way. What I don't get is the relation to a data structure like T-Digest
when working with a lot of counts / cells for every combination of items.
Keeping a T-Digest for every combination seems infeasible.
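
(What I mean by decayed counts is roughly this, sketched from memory of my old experiment, not Mahout code:)

import math

class DecayedCount:
    # exponentially decayed event counter; this is why the counts become doubles
    def __init__(self, half_life_seconds):
        self.rate = math.log(2) / half_life_seconds
        self.value = 0.0
        self.last_ts = None

    def add(self, ts, weight=1.0):
        # decay the running value for the time elapsed since the last event, then add
        if self.last_ts is not None:
            self.value *= math.exp(-self.rate * (ts - self.last_ts))
        self.value += weight
        self.last_ts = ts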

How would one incorporate event time into recommendations to detect
"hotness" of certain relations? I'd be glad if someone has an idea...

Cheers,

Johannes

