Re: Rolling MAU computation

Vijay Fri, 12 Oct 2012 13:43:05 -0700

For both simplicity and efficiency, I'd recommend making the
mausummary table partitioned on date and generate the MAU data each
day. There is no reason to generate MAU data for a given day more than
once (unless you find some problems with the source data or
something).


On Fri, Oct 12, 2012 at 1:08 PM, Igor Tatarinov <i...@decide.com> wrote:
> You just need to put the join condition in the WHERE clause. That way Hive
> will do a cartesian product followed by a filter.
>
> On Fri, Oct 12, 2012 at 1:02 PM, Tom Hubina <t...@z2live.com> wrote:
>>
>> I think I see what you're saying about the temp table with start/end dates
>> (30x expansion makes sense) and it sounds like it should work. I just need
>> to figure out a good way to generate the table. Thanks!
>>
>> Tom
>>
>> On Wed, Oct 10, 2012 at 11:05 PM, Igor Tatarinov <i...@decide.com> wrote:
>>>
>>> If you have a lot of data, you might have to write a custom reducer (in
>>> python) to keep track of the moving date window.
>>>
>>> If you don't have that much data, you might want to use a temp table
>>> <start_date, end_date> such that datediff(end_date, start_date) < 30. To
>>> create such a table, you can self-join a table of unique dates using the
>>> above condition. Then, you would join your data with that table on
>>> start_date and group by end_date counting distinct user_ids. Hope I got that
>>> right :)
>>>
>>> The latter approach will essentially multiply the number of rows by 30.
>>>
>>> igor
>>> decide.com
>>>
>>>
>>> On Wed, Oct 10, 2012 at 3:05 PM, Tom Hubina <t...@z2live.com> wrote:
>>>>
>>>> I'm trying to compute the number of active users in the previous 30 days
>>>> for each day over a date range. I can't think of any way to do it directly
>>>> within Hive so I'm wondering if you guys have any ideas.
>>>>
>>>> Basically the algorithm is something like:
>>>>
>>>> For each day in date range:
>>>>    SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day -
>>>> logins.day < 30;
>>>>
>>>> Thanks for your help!
>>>>
>>>> Tom
>>>>
>>>
>>
>

Re: Rolling MAU computation

Reply via email to