The problem is that "day" is the value in the for loop. I've tried doing a join with a table that contains the set of days, but the problem is that you can't do a join on a range ... Hive only support equality in the join. For example:
INSERT OVERWRITE TABLE mausummary SELECT day, COUNT(DISTINCT(userid)) FROM days JOIN logins ON date_add(logins.t, 30) >= days.day AND logins.t <= days.day GROUP BY day; fails because of the range in the join. Tom On Wed, Oct 10, 2012 at 8:50 PM, MiaoMiao <liy...@gmail.com> wrote: > How about > SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - logins.day > < 30 GROUP BY day; > > On Thu, Oct 11, 2012 at 6:05 AM, Tom Hubina <t...@z2live.com> wrote: > > I'm trying to compute the number of active users in the previous 30 days > for > > each day over a date range. I can't think of any way to do it directly > > within Hive so I'm wondering if you guys have any ideas. > > > > Basically the algorithm is something like: > > > > For each day in date range: > > SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - > logins.day < > > 30; > > > > Thanks for your help! > > > > Tom > > >