> I'm trying to explore the HLL UDF option to compute # of uniq users for
> each time range (week, month, yr, etc.) and wanted to know if it's
> possible to just maintain an HLL struct for each day and then use those
> per-day structs to compute the uniqs for various time ranges, instead
> of running the queries across all the data?

Yes, raw HLLs can be unioned (though not intersected), so you can keep one
HLL per day and merge them to get the uniques for any wider time range.

https://github.com/t3rmin4t0r/hive-hll-udf
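
To show why the per-day approach works, here is a minimal pure-Python sketch
of an HLL union. It is not the code from the UDF above; the register count,
the SHA-1 hash, and all function names are assumptions made up for
illustration. The point is that a union is just a register-wise max, so the
merged per-day structs are identical to one HLL built over all the data.

```python
import hashlib
import math

P = 12          # assumed precision: 2**12 = 4096 registers
M = 1 << P


def _hash64(value):
    """Hash a value to a 64-bit integer (SHA-1 chosen only for illustration)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_sketch():
    return [0] * M


def add(sketch, value):
    h = _hash64(value)
    idx = h >> (64 - P)                        # top P bits pick the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
    sketch[idx] = max(sketch[idx], rank)


def union(*sketches):
    """Merge sketches by taking the max of each register."""
    return [max(regs) for regs in zip(*sketches)]


def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)         # small-range (linear counting) correction
    return raw


# Three hypothetical days with overlapping user ids (20000 distinct overall).
days = [range(0, 10000), range(5000, 15000), range(10000, 20000)]

daily = []
for users in days:
    s = new_sketch()
    for u in users:
        add(s, u)
    daily.append(s)

# Sketch built directly from all the data, for comparison.
full = new_sketch()
for users in days:
    for u in users:
        add(full, u)

merged = union(*daily)
print(round(estimate(merged)), merged == full)
```

With 4096 registers the estimate should typically land within a few percent
of the true count; the per-day structs only need to be stored once and can
be re-merged for any week, month, or year.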


Or better yet, use the Yahoo sketches, which work better than raw HLL.

http://yahooeng.tumblr.com/post/135390948446/data-sketches

+
http://datasketches.github.io/

+
https://github.com/DataSketches/sketches-hive


Cheers,
Gopal
