Hi Ted,

I need percentiles, ideally not pre-defined ones, because one person may want
e.g. the 70th percentile while somebody else wants the 75th percentile of the
same metric.

Deal breakers:
- High memory footprint ("high" means "higher than QDigest from stream-lib" for
  us, and we could test and compare against QDigest fairly easily with live
  data).
- Algorithms that create data structures that cannot be merged.
- Loss of accuracy that is not predictably small or configurable.
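To make the three deal breakers concrete, here is a deliberately simple baseline: a hypothetical fixed-bucket histogram in Python (not stream-lib's or Mahout's API; `BucketHistogram`, `lo`, `hi` and `buckets` are illustrative names). Any percentile can be requested at read time, two histograms with the same bucket layout merge exactly by adding counts, memory is one counter per bucket, and for values inside the assumed range [lo, hi) the answer is at most one bucket width above the true value, so accuracy is set by configuration. It only works when the value range is known up front, which is an assumption, not something the requirements above state.

```python
import math


class BucketHistogram:
    """Hypothetical fixed-bucket histogram. Assumes values lie in [lo, hi);
    values outside that range are clamped into the first or last bucket."""

    def __init__(self, lo, hi, buckets):
        self.lo = lo
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets
        self.total = 0

    def add(self, value):
        i = int((value - self.lo) // self.width)
        i = max(0, min(len(self.counts) - 1, i))  # clamp out-of-range values
        self.counts[i] += 1
        self.total += 1

    def merge(self, other):
        """Exact merge: counts add bucket by bucket (layouts must match)."""
        if (self.lo, self.width, len(self.counts)) != (other.lo, other.width, len(other.counts)):
            raise ValueError("bucket layouts differ")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]
        self.total += other.total

    def quantile(self, q):
        """Upper edge of the bucket holding the value of rank ceil(q * n);
        the true value lies at most one bucket width below it."""
        if self.total == 0:
            raise ValueError("empty histogram")
        rank = max(1, math.ceil(q * self.total))
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= rank:
                return self.lo + (i + 1) * self.width
        return self.lo + len(self.counts) * self.width


# Hypothetical per-minute latencies in ms, one histogram per minute.
m1 = BucketHistogram(0, 1000, 100)
m2 = BucketHistogram(0, 1000, 100)
for v in range(500):
    m1.add(v)
for v in range(500, 1000):
    m2.add(v)
m1.merge(m2)
print(m1.quantile(0.70), m1.quantile(0.75))
```

The memory cost is fixed by `buckets` regardless of traffic, which is the trade-off: a wide value range needs many buckets (or non-uniform, e.g. logarithmic, buckets) to keep the error small.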

Thank you,
Otis
----

Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
http://sematext.com/spm 




>________________________________
> From: Ted Dunning <ted.dunn...@gmail.com>
>To: "user@mahout.apache.org" <user@mahout.apache.org>; Otis Gospodnetic 
><otis_gospodne...@yahoo.com> 
>Sent: Wednesday, August 7, 2013 11:48 PM
>Subject: Re: Is OnlineSummarizer mergeable?
> 
>
>
>Otis,
>
>
>What statistics do you need?
>
>
>What guarantees?
>
>On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> 
>wrote:
>
>>Hi Ted,
>>
>>I'm actually trying to find an alternative to QDigest (the stream-lib
>>implementation specifically) because, even though it seems good, we have to
>>deal with crazy volumes of data in SPM (performance monitoring service, see
>>signature). I'm hoping we can find something that has both a lower memory
>>footprint than QDigest AND is mergeable à la QDigest.  Utopia?
>>
>>Thanks,
>>Otis
>>----
>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
>>http://sematext.com/spm 
>>
>>
>>
>>
>>>________________________________
>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>>To: "user@mahout.apache.org" <user@mahout.apache.org>
>>>Sent: Wednesday, August 7, 2013 4:51 PM
>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>
>>>
>>>It isn't as mergeable as I would like.  If you have randomized record
>>>selection, it should be possible, but perverse ordering can cause serious
>>>errors.
>>>
>>>It would be better to use something like a Q-digest.
>>>
>>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
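Ted's suggestion can be illustrated with a minimal Python sketch of the q-digest described in the linked paper. It is a simplification for illustration, not stream-lib's QDigest class or its API; `QDigest`, `sigma` (size of the integer value universe, assumed to be a power of two) and `k` (compression parameter) are illustrative names. Each node of an implicit binary tree over [0, sigma) counts values in its range; compression folds a sparse node and its sibling into their parent whenever the three counts together are at most floor(n/k); merging adds counts node by node and compresses again, which is what makes per-interval digests combinable. The linked paper bounds a compressed digest at 3k nodes and the rank error of a quantile query at about n*log2(sigma)/k.

```python
import math


class QDigest:
    """Minimal q-digest sketch (hypothetical, not stream-lib's QDigest API).

    Values are integers in [0, sigma), sigma a power of two. Nodes use heap
    numbering: root = 1, children of i are 2i and 2i + 1, and the leaf for
    value v is sigma + v. Only nodes with a nonzero count are stored.
    """

    def __init__(self, sigma, k):
        if sigma < 2 or sigma & (sigma - 1):
            raise ValueError("sigma must be a power of two >= 2")
        self.sigma, self.k = sigma, k
        self.depth = sigma.bit_length() - 1  # log2(sigma) = depth of the leaves
        self.counts = {}
        self.n = 0

    def add(self, value):
        if not 0 <= value < self.sigma:
            raise ValueError("value out of range")
        leaf = self.sigma + value
        self.counts[leaf] = self.counts.get(leaf, 0) + 1
        self.n += 1
        if len(self.counts) > 6 * self.k:  # compress lazily, not on every add
            self.compress()

    def merge(self, other):
        """Add counts node by node, then compress using the combined n."""
        if (self.sigma, self.k) != (other.sigma, other.k):
            raise ValueError("digests must share sigma and k")
        for node, c in other.counts.items():
            self.counts[node] = self.counts.get(node, 0) + c
        self.n += other.n
        self.compress()

    def compress(self):
        """Bottom-up, fold a node and its sibling into their parent whenever
        the three counts together are at most floor(n / k)."""
        limit = self.n // self.k
        for d in range(self.depth, 0, -1):
            level = sorted(i for i in self.counts if (1 << d) <= i < (1 << (d + 1)))
            for node in level:
                if node not in self.counts:  # already folded in as a sibling
                    continue
                sib, par = node ^ 1, node >> 1
                total = (self.counts[node] + self.counts.get(sib, 0)
                         + self.counts.get(par, 0))
                if total <= limit:
                    self.counts[par] = total
                    del self.counts[node]
                    self.counts.pop(sib, None)

    def _range(self, node):
        """Smallest and largest value covered by a node."""
        d = node.bit_length() - 1  # depth of the node (root = 0)
        h = self.depth - d         # height above the leaf level
        first = (node - (1 << d)) << h
        return first, first + (1 << h) - 1

    def quantile(self, q):
        """Visit nodes by increasing upper bound (narrower ranges first on
        ties); return the upper bound where the running count reaches ceil(q*n)."""
        if self.n == 0:
            raise ValueError("empty digest")
        rank = max(1, math.ceil(q * self.n))
        seen = 0
        for node in sorted(self.counts, key=lambda i: (self._range(i)[1], -self._range(i)[0])):
            seen += self.counts[node]
            if seen >= rank:
                return self._range(node)[1]
        return self.sigma - 1


# Hypothetical per-minute latencies in ms, merged later.
minute1, minute2 = QDigest(1024, 100), QDigest(1024, 100)
for v in range(500):
    minute1.add(v)
for v in range(500, 1000):
    minute2.add(v)
minute1.merge(minute2)
print(minute1.quantile(0.70), minute1.quantile(0.75))
```

Digests built with the same `sigma` and `k` can be stored per interval and merged later; `k` is the knob that trades memory for accuracy.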
>>>
>>>
>>>
>>>
>>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is the OnlineSummarizer algorithm "mergeable"?
>>>>
>>>> Say that we compute a percentile for some metric for the time 12:00-12:01
>>>> and store that somewhere, then compute it for 12:01-12:02 and store
>>>> that separately, and so on.
>>>>
>>>> Can we then later merge these computed and previously stored
>>>> percentile "instances" and get an accurate value?
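The question hinges on what gets stored per minute. A small worked example with made-up numbers shows why stored percentile values on their own cannot be combined: a busy, fast minute and a quiet, slow minute have very different medians, and averaging them ignores how many requests each minute saw. `percentile` is a hypothetical nearest-rank helper, not an OnlineSummarizer call.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for two one-minute windows.
minute1 = [10] * 90 + [20] * 10   # 100 fast requests
minute2 = [500] * 10              # 10 slow requests

p50_1 = percentile(minute1, 0.5)  # 10
p50_2 = percentile(minute2, 0.5)  # 500
averaged = (p50_1 + p50_2) / 2    # 255.0, far from the real combined median
actual = percentile(minute1 + minute2, 0.5)  # 10
print(averaged, actual)           # prints "255.0 10"
```

What has to be stored per interval is therefore a mergeable summary of the distribution (a histogram or digest), from which any percentile of the combined period can be computed afterwards.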
>>>>
>>>> Thanks,
>>>> Otis
>>>> --
>>>> Performance Monitoring -- http://sematext.com/spm
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>>
>>>
>>>
>>>
>
>
>
