I think I see what you're saying about the temp table with start/end dates (30x expansion makes sense) and it sounds like it should work. I just need to figure out a good way to generate the table. Thanks!
Tom On Wed, Oct 10, 2012 at 11:05 PM, Igor Tatarinov <i...@decide.com> wrote: > If you have a lot of data, you might have to write a custom reducer (in > python) to keep track of the moving date window. > > If you don't have that much data, you might want to use a temp table > <start_date, end_date> such that datediff(end_date, start_date) < 30. To > create such a table, you can self-join a table of unique dates using the > above condition. Then, you would join your data with that table on > start_date and group by end_date counting distinct user_ids. Hope I got > that right :) > > The latter approach will essentially multiply the number of rows by 30. > > igor > decide.com > > > On Wed, Oct 10, 2012 at 3:05 PM, Tom Hubina <t...@z2live.com> wrote: > >> I'm trying to compute the number of active users in the previous 30 days >> for each day over a date range. I can't think of any way to do it directly >> within Hive so I'm wondering if you guys have any ideas. >> >> Basically the algorithm is something like: >> >> For each day in date range: >> SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - logins.day >> < 30; >> >> Thanks for your help! >> >> Tom >> >> >