Hi Ted,

Yes, that's what we did recently, too: 
https://github.com/clearspring/stream-lib/pull/47

... but it's still a little too fat, which is what made me think of your 
OnlineSummarizer as a possible, slimmer alternative.

Otis 
----
Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
http://sematext.com/spm 




>________________________________
> From: Ted Dunning <ted.dunn...@gmail.com>
>To: "user@mahout.apache.org" <user@mahout.apache.org>; Otis Gospodnetic 
><otis_gospodne...@yahoo.com> 
>Sent: Thursday, August 8, 2013 8:27 AM
>Subject: Re: Is OnlineSummarizer mergeable?
> 
>
>
>I just looked at the source for QDigest from streamlib.
>
>
>I think that the memory usage could be trimmed substantially, possibly by as 
>much as 5:1 by using more primitive friendly structures.
>
>
>
>
>
>On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> 
>wrote:
>
>>Hi Ted,
>>
>>I need percentiles.  Ideally not pre-defined ones, because one person may 
>>want e.g. 70th pctile, while somebody else might want 75th pctile for the 
>>same metric.
>>
>>Deal breakers:
>>- High memory footprint. ("high" means "higher than QDigest from stream-lib" 
>>  for us.... and we could test and compare with QDigest relatively easily 
>>  with live data)
>>- Algos that create data structures that cannot be merged
>>- Loss of accuracy that is not predictably small or configurable
>>
>>Thank you,
>>Otis
>>----
>>
>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
>>http://sematext.com/spm 
>>
>>
>>
>>
>>>________________________________
>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>>To: "user@mahout.apache.org" <user@mahout.apache.org>; Otis Gospodnetic 
>>><otis_gospodne...@yahoo.com>
>>>Sent: Wednesday, August 7, 2013 11:48 PM
>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>
>>>
>>>
>>>Otis,
>>>
>>>
>>>What statistics do you need?
>>>
>>>
>>>What guarantees?
>>>
>>>
>>>
>>>
>>>
>>>On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic 
>>><otis_gospodne...@yahoo.com> wrote:
>>>
>>>>Hi Ted,
>>>>
>>>>I'm actually trying to find an alternative to QDigest (the stream-lib impl 
>>>>specifically) because even though it seems good, we have to deal with crazy 
>>>>volumes of data in SPM (performance monitoring service, see signature)... 
>>>>I'm hoping we can find something that has both a lower memory footprint 
>>>>than QDigest AND that is mergeable a la QDigest.  Utopia?
>>>>
>>>>Thanks,
>>>>Otis
>>>>----
>>>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - 
>>>>http://sematext.com/spm 
>>>>
>>>>
>>>>
>>>>
>>>>>________________________________
>>>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>>>>To: "user@mahout.apache.org" <user@mahout.apache.org>
>>>>>Sent: Wednesday, August 7, 2013 4:51 PM
>>>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>>>
>>>>>
>>>>>It isn't as mergeable as I would like.  If you have randomized record
>>>>>selection, it should be possible, but perverse ordering can cause serious
>>>>>errors.
>>>>>
>>>>>It would be better to use something like a Q-digest.
>>>>>
>>>>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic 
>>>>><otis.gospodne...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is OnlineSummarizer algo "mergeable"?
>>>>>>
>>>>>> Say that we compute a percentile for some metric for time 12:00-12:01
>>>>>> and store that somewhere, then we compute it for 12:01-12:02 and store
>>>>>> that separately, and so on.
>>>>>>
>>>>>> Can we then later merge these computed and previously stored
>>>>>> percentile "instances" and get an accurate value?
>>>>>>
>>>>>> Thanks,
>>>>>> Otis
>>>>>> --
>>>>>> Performance Monitoring -- http://sematext.com/spm
>>>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>
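To make the "mergeable" question at the bottom of this thread concrete, here is a minimal sketch in Python. It is not Q-digest and not Mahout's OnlineSummarizer; it swaps in a plain fixed-bin histogram as the simplest summary I know of that merges exactly. The class name HistogramSummary, the bin range, and the two sample "minutes" are all made up for illustration. It shows that averaging per-minute percentiles can be badly wrong, while merging the summaries and then querying stays within one bin width of the exact answer, assuming all values fall inside the configured range.

```python
def percentile(values, p):
    """Exact nearest-rank percentile; p is an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


class HistogramSummary:
    """Hypothetical fixed-bin histogram over [lo, hi); error is at most one bin width."""

    def __init__(self, lo, hi, bins):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.total = 0

    def add(self, x):
        # Values outside [lo, hi) are clamped into the first or last bin.
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.bins - 1)
        self.counts[i] += 1
        self.total += 1

    def merge(self, other):
        # Merging is just adding bin counts, so it is exact and order-independent.
        if (self.lo, self.hi, self.bins) != (other.lo, other.hi, other.bins):
            raise ValueError("summaries must share the same bin layout")
        merged = HistogramSummary(self.lo, self.hi, self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        merged.total = self.total + other.total
        return merged

    def quantile(self, p):
        """Upper edge of the bin that contains the nearest-rank p-th percentile."""
        if self.total == 0:
            raise ValueError("empty summary")
        rank = max(1, (p * self.total + 99) // 100)
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return self.lo + (i + 1) * self.width


# Two made-up "minutes" of a metric: a busy, fast minute and a quiet, slow one.
minute_a = [float(v) for v in range(100)]        # 100 values: 0..99
minute_b = [900.0 + v for v in range(10)]        # 10 values: 900..909

exact = percentile(minute_a + minute_b, 90)
naive = (percentile(minute_a, 90) + percentile(minute_b, 90)) / 2

summary_a = HistogramSummary(0.0, 1000.0, 1000)
summary_b = HistogramSummary(0.0, 1000.0, 1000)
for v in minute_a:
    summary_a.add(v)
for v in minute_b:
    summary_b.add(v)
merged = summary_a.merge(summary_b)

print("exact p90 over both minutes:", exact)
print("average of per-minute p90s: ", naive)
print("p90 from merged summaries:  ", merged.quantile(90))
```

The trade-off is the one discussed above: the histogram needs memory proportional to the number of bins and a value range fixed in advance, which is what Q-digest-style structures try to avoid by adapting to the data.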
