anyone use Storm kafkaSpout implement a HyperLoglog

2014-08-13 Thread Sa Li
Hi all, I am thinking of implementing HyperLogLog in Storm with KafkaSpout, outputting not only the distinct counts but also some kind of bitmap string. Has anyone done something similar? A guide to get started would be highly appreciated. Thanks, Alec

Re: anyone use Storm kafkaSpout implement a HyperLoglog

2014-08-15 Thread Sa Li
Hi all, continuing this topic: I am a bit confused about whether I should implement HyperLogLog in Storm or use the postgresql-hll extension in PostgreSQL. If I can count the uniques effectively with postgresql-hll and write them into a separate distinct-count table, why would I implement that in Storm

Re: anyone use Storm kafkaSpout implement a HyperLoglog

2014-08-15 Thread Sa Li
postgresql-hll, the PostgreSQL extension that adds HyperLogLog data structures, seems pretty good if we do the counting directly in PostgreSQL. On Fri, Aug 15, 2014 at 1:38 PM, Sa Li wrote: > Hi, all > > Continue this topic, I am bit of confused whether I should implement the > hyperloglog in storm or

Re: anyone use Storm kafkaSpout implement a HyperLoglog

2014-08-15 Thread Sam Goodwin
I'm not too sure how the Postgres HLL works, but I'm assuming you would have to send every tuple to the remote Postgres DB, which is very expensive. Whereas if you build your HLL data structure in Storm, you only have to persist the fixed-size serialized version of the HLL to the database each tr
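To illustrate the "fixed size serialized version" point, here is a minimal HyperLogLog sketch in Python. It is not Storm or postgresql-hll code: the class name `HyperLogLog`, the precision `p=12`, and the use of SHA-1 as the hash are all assumptions made for this example. Whatever the number of items added, the serialized state stays at 2**p bytes.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p one-byte registers (hypothetical class)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                     # number of registers
        self.registers = bytearray(self.m)  # fixed-size state, all zeros

    def add(self, item):
        # Assumption: first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        est = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            est = m * math.log(m / zeros)
        return int(round(est))

    def to_bytes(self):
        # The only thing that needs persisting: always exactly m bytes.
        return bytes(self.registers)
```

With `p=12` the sketch is 4096 bytes and the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. A bolt could keep one such object per key and write `to_bytes()` to the database per batch, instead of shipping every tuple.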

Re: anyone use Storm kafkaSpout implement a HyperLoglog

2014-08-15 Thread Sa Li
You are right, my plan was to store the cardinalities, and maybe a bitmap string as well, in the database, which surely saves a huge amount of space. However, we already have a channel that populates the web events into Postgres for some other analytics use, which is kind of parallel with the process kafka listening t
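One possible reading of the "bitmap string" idea is to store the HyperLogLog register array as text next to the cardinality. The sketch below assumes the registers are available as a `bytes` value with one byte per register; the helper names `merge_registers`, `encode_registers`, and `decode_registers` are hypothetical and not part of Storm or postgresql-hll. Storing the registers rather than only the count means sketches from different batches can later be merged by taking the element-wise maximum.

```python
import base64


def merge_registers(a: bytes, b: bytes) -> bytes:
    """Union of two HLL sketches: element-wise max of their registers."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of registers")
    return bytes(max(x, y) for x, y in zip(a, b))


def encode_registers(registers: bytes) -> str:
    """Encode the register array as an ASCII string for a text column."""
    return base64.b64encode(registers).decode("ascii")


def decode_registers(text: str) -> bytes:
    """Inverse of encode_registers."""
    return base64.b64decode(text)
```

Base64 is only one choice of encoding here; a binary column would avoid the roughly one-third size overhead.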