Thank you both for your input! To calculate a moving average of active users, could you comment on whether to go for an RDD-based implementation or a DataFrame one? If DataFrame, will a window function work here?
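For context, here is roughly what I had in mind for the window-function route, as a sketch rather than a working implementation. I'm assuming a hypothetical events table with user_id and event_date columns, one row per activity event, and an arbitrary 7-day trailing frame; please correct me if this is off:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("dau-moving-avg").getOrCreate()

// Assumed layout: one row per (user_id, event_date) activity event.
val events = spark.table("events")

// Daily active users via an exact distinct count per day.
val dau = events
  .groupBy("event_date")
  .agg(countDistinct("user_id").as("dau"))

// 7-day trailing moving average: the frame covers the current row plus the
// six preceding ones, assuming one row per calendar day with no gaps.
// Note: a window with orderBy but no partitionBy pulls all rows into a
// single partition, which is acceptable here since there is one row per day.
val w = Window.orderBy("event_date").rowsBetween(-6, 0)

val movingAvg = dau.withColumn("dau_7d_avg", avg(col("dau")).over(w))
movingAvg.show()

My understanding is that the DataFrame route should be preferred over RDDs anyway, since it goes through the Catalyst optimizer and Tungsten execution, but I'd like to confirm.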
In general, how would Spark behave when working with a DataFrame that has date, week, month, quarter, and year columns, grouping against each one, one by one? (I've put a sketch of what I mean in a P.S. below my signature.)

On Sun, Jun 9, 2019 at 1:17 PM Jörn Franke <[email protected]> wrote:

> Depending on what accuracy is needed, HyperLogLogs can be an interesting
> alternative: https://en.m.wikipedia.org/wiki/HyperLogLog
>
> On 09.06.2019 at 15:59, big data <[email protected]> wrote:
>
> In my opinion, Bitmap is the best solution for active-user calculation.
> Most other solutions are based on a count(distinct) calculation process,
> which is much slower.
>
> If you've implemented a Bitmap solution, including how to build the Bitmap
> and how to load the Bitmap, then Bitmap is the best choice.
>
> On 2019/6/5 at 6:49 PM, Rishi Shah wrote:
>
> Hi All,
>
> Is there a best practice around calculating daily, weekly, monthly,
> quarterly, and yearly active users?
>
> One approach is to create a window of daily bitmaps and aggregate them
> based on the period later. However, I was wondering if anyone has a better
> approach to tackling this problem.
>
> --
> Regards,
>
> Rishi Shah

--
Regards,
Rishi Shah
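P.S. To make my second question concrete, this is the group-by-each-period pattern I was referring to (same hypothetical events table as above; the column names are just placeholders):

import org.apache.spark.sql.functions._

// Derive the period columns once; week/month/quarter are paired with year
// in the grouping below so that periods from different years don't merge.
val withPeriods = events
  .withColumn("year", year(col("event_date")))
  .withColumn("quarter", quarter(col("event_date")))
  .withColumn("month", month(col("event_date")))
  .withColumn("week", weekofyear(col("event_date")))

// Each groupBy below yields an independent DataFrame, so materializing each
// one runs a separate job with its own shuffle; caching the derived frame
// avoids re-reading the source five times.
withPeriods.cache()

val perPeriod = Seq(
  Seq("event_date"),
  Seq("year", "week"),
  Seq("year", "month"),
  Seq("year", "quarter"),
  Seq("year")
).map { cols =>
  cols.mkString("_") -> withPeriods
    .groupBy(cols.map(col): _*)
    .agg(countDistinct("user_id").as("active_users"))
}

My assumption is that each of these aggregations shuffles the data independently when acted on, so caching the base frame once seems sensible, but I'd appreciate confirmation on whether Spark can do anything smarter here.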

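P.P.S. On Jörn's HyperLogLog suggestion: if I understand correctly, Spark already exposes an HLL-based aggregate through approx_count_distinct, so an approximate variant might be as simple as the following (the 1% maximum relative error is an arbitrary choice on my part):

import org.apache.spark.sql.functions._

// approx_count_distinct is backed by HyperLogLog++, trading a bounded
// relative error (here 1%) for far less shuffle and memory than an exact
// count(distinct).
val approxDau = events
  .groupBy("event_date")
  .agg(approx_count_distinct(col("user_id"), 0.01).as("approx_dau"))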