For grouping by each of those period columns: look into grouping sets, https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-multi-dimensional-aggregation.html
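A minimal sketch of what that could look like (the "activity" table and the user_id / period column names below are assumptions, not from your pipeline):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("active-users").getOrCreate()

    // Assumes an "activity" table registered in the catalog, one row per user
    // event, already enriched with date/week/month/quarter/year columns.
    val activeUsers = spark.sql("""
      SELECT year, quarter, month, week, date,
             COUNT(DISTINCT user_id) AS active_users
      FROM activity
      GROUP BY year, quarter, month, week, date
      GROUPING SETS ((year), (year, quarter), (year, month), (year, week), (date))
    """)

    activeUsers.show()

Each grouping set produces rows at that grain in a single pass (the unused period columns come back as NULL), so you avoid running five separate group-bys over the same data.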
On Tue, Jun 11, 2019 at 06:09, Rishi Shah <rishishah.s...@gmail.com> wrote:

> Thank you both for your input!
>
> To calculate a moving average of active users, could you comment on whether
> to go for an RDD based implementation or a dataframe? If dataframe, will a
> window function work here?
>
> In general, how would Spark behave when working with a dataframe with date,
> week, month, quarter, and year columns, grouping by each one, one by one?
>
> On Sun, Jun 9, 2019 at 1:17 PM Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Depending on what accuracy is needed, HyperLogLogs can be an interesting
>> alternative: https://en.m.wikipedia.org/wiki/HyperLogLog
>>
>> On Jun 9, 2019 at 15:59, big data <bigdatab...@outlook.com> wrote:
>>
>> In my opinion, Bitmap is the best solution for active user calculation.
>> Other solutions are mostly based on a count(distinct) calculation, which
>> is much slower.
>>
>> If you've implemented a Bitmap solution, including how to build and load
>> the Bitmap, then Bitmap is the best choice.
>>
>> On 2019/6/5 at 6:49 PM, Rishi Shah wrote:
>>
>> Hi All,
>>
>> Is there a best practice around calculating daily, weekly, monthly,
>> quarterly, and yearly active users?
>>
>> One approach is to create a window of daily bitmaps and aggregate them by
>> period later. However, I was wondering if anyone has a better approach to
>> tackling this problem.
>>
>> --
>> Regards,
>>
>> Rishi Shah
>
> --
> Regards,
>
> Rishi Shah
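On the moving-average question quoted above: a dataframe window function should work once you have one row per day. A rough sketch (again, the "activity" table and the date / user_id columns are assumptions); approx_count_distinct is Spark's HyperLogLog-based estimator, along the lines Jörn suggested:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("dau-moving-average").getOrCreate()

    // Daily active users; the second argument to approx_count_distinct is the
    // maximum relative estimation error (about 1% here).
    val daily = spark.table("activity")
      .groupBy(col("date"))
      .agg(approx_count_distinct(col("user_id"), 0.01).as("dau"))

    // 7-day trailing average: the current day plus the 6 preceding rows.
    // No partitionBy, so the window runs in a single partition; that is fine
    // for a small per-day aggregate, not for the raw event data.
    val w = Window.orderBy(col("date")).rowsBetween(-6, 0)

    val movingAvg = daily.withColumn("dau_7d_avg", avg(col("dau")).over(w))
    movingAvg.show()

If exact counts are required, swap approx_count_distinct for countDistinct, at the cost of shuffling the distinct user ids.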