> I'm trying to explore the HLL UDF option to compute # of uniq users for
> each time range (week, month, yr, etc.) and wanted to know if it's
> possible to just maintain HLL struct for each day and then use those
> to compute the uniqs for various time ranges using these per day
> structs instead of running the queries across all the data?

Yes, unions of raw HLL can be done (though not intersects).

https://github.com/t3rmin4t0r/hive-hll-udf

Or better yet, use the Yahoo sketches, which work better than raw HLL.

http://yahooeng.tumblr.com/post/135390948446/data-sketches
http://datasketches.github.io/
https://github.com/DataSketches/sketches-hive

Cheers,
Gopal
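
To illustrate why per-day structs are enough: an HLL union is just a register-wise max, so merging the daily sketches yields exactly the sketch you would have built from all the rows at once. Below is a minimal toy sketch of that idea in plain Python. It is not the code from either of the linked UDF projects; the function names (`new_sketch`, `add`, `union`, `estimate`), the register count, and the sample user ids are all assumptions made up for illustration.

```python
import hashlib
import math

P = 12          # assumed precision: 2**12 = 4096 registers (~1.6% std error)
M = 1 << P

def new_sketch():
    """An empty HLL: one small counter per register."""
    return [0] * M

def add(sketch, item):
    """Hash the item, pick a register, keep the largest rank seen there."""
    h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
    idx = h >> (64 - P)                       # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    sketch[idx] = max(sketch[idx], rank)

def union(*sketches):
    """Merge sketches by taking the register-wise maximum."""
    return [max(regs) for regs in zip(*sketches)]

def estimate(sketch):
    """Standard HLL estimate with the small-range (linear counting) correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw

# Hypothetical data: three days of overlapping user ids (20000 distinct overall).
days = {
    "mon": range(0, 10000),
    "tue": range(5000, 15000),
    "wed": range(10000, 20000),
}

# Build and keep one sketch per day.
daily = {}
for day, ids in days.items():
    s = new_sketch()
    for uid in ids:
        add(s, f"user-{uid}")
    daily[day] = s

# Any longer range is a union of the stored daily sketches; no rescan of raw data.
combined = union(*daily.values())
print(round(estimate(combined)))
```

The same merge works for a week, month, or year of stored daily sketches. Intersections do not have an equally simple register-level operation, which is why only unions are offered for raw HLL.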