Thank you both for your input!

To calculate a moving average of active users, could you comment on whether
to go with an RDD-based implementation or DataFrames? If DataFrames, would a
window function work here?
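
For what it's worth, here is a minimal, untested sketch of the DataFrame
route I have in mind (the events frame and column names are made up):
compute daily active users, then a 7-day trailing average with a window
function:

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.createDataFrame(
    [("u1", "2019-06-01"), ("u2", "2019-06-01"), ("u1", "2019-06-02")],
    ["user_id", "event_date"],
).withColumn("event_date", F.to_date("event_date"))

# Daily active users.
dau = events.groupBy("event_date").agg(F.countDistinct("user_id").alias("dau"))

# 7-day trailing average: the current row plus the 6 preceding rows (days,
# if the series has no gaps). With no partitionBy, Spark moves the whole
# series to a single partition, which seems acceptable for one row per day.
w = Window.orderBy("event_date").rowsBetween(-6, 0)
dau.withColumn("dau_7d_avg", F.avg("dau").over(w)).show()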

In general, how does Spark behave when working with a DataFrame that has
date, week, month, quarter, and year columns, running a groupBy against each
one, one at a time?
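
Concretely, I mean something like the following (untested; reusing the
made-up events frame from the sketch above). My understanding is that each
groupBy is a separate shuffle over the same input, so caching the base frame
avoids recomputing it for every period:

from pyspark.sql import functions as F

periods = (events
    .select("user_id", F.col("event_date").alias("date"))
    .withColumn("week",    F.date_trunc("week", "date"))
    .withColumn("month",   F.date_trunc("month", "date"))
    .withColumn("quarter", F.date_trunc("quarter", "date"))
    .withColumn("year",    F.date_trunc("year", "date"))
    .cache())

for period in ["date", "week", "month", "quarter", "year"]:
    (periods.groupBy(period)
        .agg(F.countDistinct("user_id").alias("active_users"))
        .show())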

On Sun, Jun 9, 2019 at 1:17 PM Jörn Franke <jornfra...@gmail.com> wrote:

> Depending on what accuracy is needed, HyperLogLog can be an interesting
> alternative:
> https://en.m.wikipedia.org/wiki/HyperLogLog
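>
> For what it's worth, Spark's approx_count_distinct is implemented with
> HyperLogLog++, so this trade-off is available out of the box (rsd is the
> target relative error). A minimal, untested sketch with made-up names:
>
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.getOrCreate()
> events = spark.createDataFrame(
>     [("u1", "2019-06-01"), ("u2", "2019-06-01"), ("u1", "2019-06-02")],
>     ["user_id", "event_date"],
> )
>
> # HyperLogLog++-backed approximate distinct count, ~1% relative error.
> (events.groupBy("event_date")
>     .agg(F.approx_count_distinct("user_id", rsd=0.01).alias("dau_approx"))
>     .show())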
>
> On 09.06.2019 at 15:59, big data <bigdatab...@outlook.com> wrote:
>
> In my opinion, a bitmap is the best solution for calculating active users.
> Most other solutions are based on a count(distinct) computation, which is
> slower.
>
> If you've implemented a bitmap solution, including how to build and load
> the bitmaps, then it is the best choice.
> On 2019/6/5 at 6:49 PM, Rishi Shah wrote:
>
> Hi All,
>
> Is there a best practice around calculating daily, weekly, monthly,
> quarterly, yearly active users?
>
> One approach is to keep a daily bitmap per user and aggregate it by period
> later, as in the rough sketch below. However, I was wondering if anyone has
> a better approach to tackling this problem.
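>
> A minimal, untested sketch of that idea (all names are made up): keep a
> 0/1 flag per user per day, OR the flags within a period, then sum:
>
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.getOrCreate()
> flags = spark.createDataFrame(
>     [("u1", "2019-06-01", 1), ("u1", "2019-06-15", 1), ("u2", "2019-06-02", 1)],
>     ["user_id", "day", "active"],
> ).withColumn("day", F.to_date("day"))
>
> monthly = (flags
>     .withColumn("month", F.date_trunc("month", "day"))
>     .groupBy("month", "user_id")
>     .agg(F.max("active").alias("active_in_month"))  # OR across the month's bits
>     .groupBy("month")
>     .agg(F.sum("active_in_month").alias("mau")))
> monthly.show()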
>
> --
> Regards,
>
> Rishi Shah
>
>

-- 
Regards,

Rishi Shah
