Hi Ted,

Yes, that's what we did recently, too: 

... but it's still a little too fat, which is what made me think of your 
OnlineSummarizer as a possible, slimmer alternative.

Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 

> From: Ted Dunning <ted.dunn...@gmail.com>
>To: "user@mahout.apache.org" <user@mahout.apache.org>; Otis Gospodnetic 
>Sent: Thursday, August 8, 2013 8:27 AM
>Subject: Re: Is OnlineSummarizer mergeable?
>I just looked at the source for QDigest from streamlib.
>I think that the memory usage could be trimmed substantially, possibly by as 
>much as 5:1, by using more primitive-friendly structures.
>On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>>Hi Ted,
>>I need percentiles.  Ideally not pre-defined ones, because one person may 
>>want e.g. 70th pctile, while somebody else might want 75th pctile for the 
>>same metric.
>>Deal breakers:
>>* High memory footprint ("high" means "higher than QDigest from stream-lib" 
>>  for us, and we could test and compare with QDigest relatively easily with 
>>  live data)
>>* Algos that create data structures that cannot be merged
>>* Loss of accuracy that is not predictably small or configurable
>>Thank you,
>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>>To: "user@mahout.apache.org" <user@mahout.apache.org>; Otis Gospodnetic 
>>>Sent: Wednesday, August 7, 2013 11:48 PM
>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>What statistics do you need?
>>>What guarantees?
>>>On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic 
>>><otis_gospodne...@yahoo.com> wrote:
>>>>Hi Ted,
>>>>I'm actually trying to find an alternative to QDigest (the stream-lib impl 
>>>>specifically) because even though it seems good, we have to deal with crazy 
>>>>volumes of data in SPM (performance monitoring service, see signature)... 
>>>>I'm hoping we can find something that has both a lower memory footprint 
>>>>than QDigest AND that is mergeable a la QDigest.  Utopia?
>>>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
>>>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>>>>To: "user@mahout.apache.org" <user@mahout.apache.org>
>>>>>Sent: Wednesday, August 7, 2013 4:51 PM
>>>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>>>It isn't as mergeable as I would like.  If you have randomized record
>>>>>selection, it should be possible, but perverse ordering can cause serious
>>>>>errors.  It would be better to use something like a Q-digest.
>>>>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic wrote:
>>>>>> Hi,
>>>>>> Is OnlineSummarizer algo "mergeable"?
>>>>>> Say that we compute a percentile for some metric for time 12:00-12:01
>>>>>> and store that somewhere, then we compute it for 12:01-12:02 and store
>>>>>> that separately, and so on.
>>>>>> Can we then later merge these computed and previously stored
>>>>>> percentile "instances" and get an accurate value?
>>>>>> Thanks,
>>>>>> Otis
>>>>>> --
>>>>>> Performance Monitoring -- http://sematext.com/spm
>>>>>> Solr & ElasticSearch Support -- http://sematext.com/
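
To make the merge question at the bottom of the thread concrete, here is a minimal Python sketch. It is not OnlineSummarizer and not stream-lib's QDigest: `BucketHistogram`, `exact_quantile`, and the sample data are hypothetical names and values made up for illustration. It shows that averaging two stored per-minute 75th percentiles can land far from the true value, whereas merging the per-minute summaries themselves keeps the estimate within one bucket width of the exact answer. Unlike Q-digest, this fixed-width histogram does nothing to bound memory, so it says nothing about the footprint concern.

```python
import math
from collections import Counter


class BucketHistogram:
    """Hypothetical mergeable summary: one counter per fixed-width bucket."""

    def __init__(self, width=10.0):
        self.width = width
        self.counts = Counter()
        self.n = 0

    def add(self, x):
        self.counts[math.floor(x / self.width)] += 1
        self.n += 1

    def merge(self, other):
        # Merging is just adding the per-bucket counts, so order does not matter.
        if self.width != other.width:
            raise ValueError("bucket widths must match")
        merged = BucketHistogram(self.width)
        merged.counts = self.counts + other.counts
        merged.n = self.n + other.n
        return merged

    def quantile(self, q):
        # Upper edge of the bucket that holds the value at rank ceil(q * n).
        if self.n == 0:
            raise ValueError("empty histogram")
        target = max(1, math.ceil(q * self.n))
        seen = 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen >= target:
                return (bucket + 1) * self.width


def exact_quantile(values, q):
    # Nearest-rank quantile over the raw values, used as the reference answer.
    ordered = sorted(values)
    return ordered[max(1, math.ceil(q * len(ordered))) - 1]


# Made-up data: a quiet minute (100 samples) and a busy minute (1000 samples).
minute1 = [float(v) for v in range(1, 101)]      # 12:00-12:01
minute2 = [float(v) for v in range(301, 1301)]   # 12:01-12:02

h1, h2 = BucketHistogram(), BucketHistogram()
for v in minute1:
    h1.add(v)
for v in minute2:
    h2.add(v)

merged = h1.merge(h2)
exact = exact_quantile(minute1 + minute2, 0.75)
naive = (exact_quantile(minute1, 0.75) + exact_quantile(minute2, 0.75)) / 2

print("exact p75:", exact)
print("average of stored per-minute p75s:", naive)
print("p75 from merged histograms:", merged.quantile(0.75))
```

With this data the exact 75th percentile is 1025.0, the average of the two stored percentiles is 562.5, and the merged histogram reports 1030.0, which is off by less than the bucket width of 10.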
