Hi Miloš,

Thanks for the reply. Redis HLL is definitely worth trying; has anyone done 
that before? My understanding is that Redis is a key-value store, where the 
offset represents the user ID, for example. My concern is whether Redis would 
add another bottleneck to the whole system.
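
For reference, here is a rough sketch of how the Redis side might look. The 
offset-per-user model matches Redis bitmaps (SETBIT), whereas a Redis 
HyperLogLog is addressed by key and element through PFADD and PFCOUNT. The 
key layout ("uniq:<url>:<bucket>"), the class name and the helper names below 
are made up for illustration, and the code only builds the command text 
rather than talking to a server.

```java
/** Builds Redis HyperLogLog commands for per-bucket distinct counting (illustrative only). */
class HllWindowKeys {
    /** One key per URL and time bucket; the "uniq:" naming scheme is hypothetical. */
    static String bucketKey(String url, long eventTimeMillis, long bucketMillis) {
        return "uniq:" + url + ":" + Math.floorDiv(eventTimeMillis, bucketMillis);
    }

    /** PFADD adds the user id to the HyperLogLog stored at the bucket key. */
    static String pfadd(String url, long eventTimeMillis, long bucketMillis, String userId) {
        return "PFADD " + bucketKey(url, eventTimeMillis, bucketMillis) + " " + userId;
    }

    /** PFCOUNT over several keys estimates the distinct count of their union. */
    static String pfcountWindow(String url, long nowMillis, long bucketMillis, int buckets) {
        StringBuilder cmd = new StringBuilder("PFCOUNT");
        long newest = Math.floorDiv(nowMillis, bucketMillis);
        // Oldest bucket first, newest bucket last.
        for (long b = newest - buckets + 1; b <= newest; b++) {
            cmd.append(" uniq:").append(url).append(':').append(b);
        }
        return cmd.toString();
    }
}
```

With one-minute buckets, a three-minute sliding window would then be a single 
PFCOUNT over the three newest bucket keys, and old keys could be given an 
expiry so they clean themselves up.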

In addition, I believe TridentReach does unique counting as well. My question 
is how to use an external timestamp to define time windows, since I have not 
seen any sample code that uses timestamps.
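
To make the question concrete, this is roughly what I have in mind: a minimal 
sketch that assigns each event to a fixed window based on the timestamp 
carried in the event itself, not the arrival time. It is plain Java with an 
exact HashSet per window, not Storm or Trident API; the class name and methods 
are made up, and in practice the set could be swapped for an HLL.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Counts distinct users per fixed window, keyed by the event's own timestamp. */
class EventTimeDistinctCounter {
    private final long windowMillis;
    // window start (epoch millis) -> distinct user ids seen in that window
    private final Map<Long, Set<String>> windows = new HashMap<>();

    EventTimeDistinctCounter(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Start of the window that contains the given event timestamp. */
    long windowStart(long eventTimeMillis) {
        return Math.floorDiv(eventTimeMillis, windowMillis) * windowMillis;
    }

    /** Records one event; late or out-of-order events still land in their own window. */
    void add(long eventTimeMillis, String userId) {
        windows.computeIfAbsent(windowStart(eventTimeMillis), k -> new HashSet<>()).add(userId);
    }

    /** Exact distinct count for the window starting at windowStartMillis. */
    int distinctCount(long windowStartMillis) {
        Set<String> users = windows.get(windowStartMillis);
        return users == null ? 0 : users.size();
    }

    /** Drops windows that ended at or before the watermark, to bound memory. */
    void expireBefore(long watermarkMillis) {
        windows.keySet().removeIf(start -> start + windowMillis <= watermarkMillis);
    }
}
```

A bolt or aggregator could call add() for every tuple with the timestamp field 
from the message and call expireBefore() once it is reasonably sure no older 
events will arrive.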


thanks 

Alec

On Aug 21, 2014, at 4:36 PM, Miloš Solujić <milos.solu...@gmail.com> wrote:

> Alec,
> 
> For this one, I'd recommend Redis HLL, as Gna explained earlier.
> 
> On 21 Aug 2014 23:31, "Sa Li" <sa.in.v...@gmail.com> wrote:
> Thanks for all the replies.
> 
> I have considered integrating the java-hll package 
> (https://github.com/aggregateknowledge/java-hll), which uses the Murmur3 
> hash function from Google. I am getting a lot of exceptions when I include 
> it, and I wonder whether this hash is compatible with the distributed 
> mechanism of Storm (I might be naive). 
> 
> Another thing I am considering is TridentReach, which counts the unique 
> people exposed to a URL. I would like to combine TridentReach with 
> KafkaSpout. My question: should I create a fixed-size HashMap that holds 
> each URL and an array of its visitors, so that the fixed size of the hash 
> map represents the window size of the sliding window? I wonder if this is 
> correct.
> 
> 
> thanks
> 
> Alec
> 
> On Aug 21, 2014, at 11:18 AM, Nima Movafaghrad <nima.movafagh...@oracle.com> 
> wrote:
> 
>> Alec,
>>  
>> You can use something like HyperLogLog or Bloom filters to do unique and/or 
>> distinct counting. Just create a bolt that does that.
>>  
>> Nima
>>  
>> From: Sa Li [mailto:sa.in.v...@gmail.com] 
>> Sent: Wednesday, August 20, 2014 2:45 PM
>> To: user@storm.incubator.apache.org
>> Subject: distinct counting
>>  
>> Hi, all
>>  
>> I know Storm does a good job at counting and other aggregation jobs. I 
>> wonder if anyone has done distinct counting in Storm, and how you would set 
>> the sliding time window?
>>  
>> thanks
>>  
>> 
>> Alec
> 
