Thanks Alex! That is exactly what I thought was the limitation but wanted to make sure I'm not missing anything.
On Fri, Oct 15, 2010 at 10:51 AM, Alex Boisvert <a...@bizo.com> wrote: > As far as I know, Hive has no built-in support for sliding-window > analytics. There is an enhancement request here: > https://issues.apache.org/jira/browse/HIVE-896 > > <https://issues.apache.org/jira/browse/HIVE-896>Without such support, the > brute force way of doing things is, > > SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date > start_date > and event_date <= end_date; > > (repeated N times to cover each day of your time window). > > alex > > On Thu, Oct 14, 2010 at 11:36 PM, Vijay <tec...@gmail.com> wrote: > >> Hi, I need help with this scenario. We have a table of events which has >> columns date, event (not important for this discussion), and user_id. It is >> obviously easy to find number of unique users for each day. I also need to >> find number of unique users in the last 30 days for each day. This is also >> quite simple to do for one day. However, I cannot figure out how to do this >> for a range of days. Something like this is pretty straightforward in most >> RDBMS but with HiveQL has I'm finding this hard. I might be missing >> something simple though. Any help is appreciated. Ideally the query should >> also be as optimized as possible as this table could be huge. >> >> Thanks, >> Vijay >> >> >