This approach is fine for relatively well-behaved distributions. Anything more skewed than, say, an exponential, or as long-tailed as a t(3) distribution, is likely to give it trouble.
See http://search-lucene.com/jd/mahout/math/org/apache/mahout/math/stats/OnlineSummarizer.html for the alternative I have been suggesting. It can keep accurate estimates of any quantile that you like.

On Mon, Mar 14, 2011 at 5:17 PM, sebb <seb...@gmail.com> wrote:

> In JMeter we needed to display long running percentiles without using
> excess memory, and someone came up with the idea of using buckets for
> ranges of values. So instead of keeping details on each sample elapsed
> time, we increment the count for the appropriate bucket.
>
> If the range of values is too large to use a single bucket for each
> value, each bucket can represent a range of values.
> These ranges can potentially be non-uniform though that does
> complicate the calculations.
>
> JMeter actually uses a TreeMap for the values and counts - the values
> need to be sorted in order to calculate percentiles.
>
> Depending on the data-set, it might be possible to use fixed arrays
> instead of the TreeMap.
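
For anyone following the thread, here is a minimal sketch of the bucket idea described in the quoted message, assuming uniform bucket widths and a nearest-rank percentile. It is not JMeter's actual code: the class name BucketPercentiles, the width parameter, and the choice to report a bucket's lower bound are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative bucketed histogram; a sketch, not JMeter's implementation. */
class BucketPercentiles {
    private final long width;                                   // hypothetical uniform bucket width
    private final TreeMap<Long, Long> counts = new TreeMap<>(); // bucket lower bound -> sample count
    private long total = 0;

    BucketPercentiles(long width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
    }

    /** Increment the count of the bucket that contains the value. */
    void add(long value) {
        long key = Math.floorDiv(value, width) * width;
        counts.merge(key, 1L, Long::sum);
        total++;
    }

    /** Nearest-rank percentile, reported as the lower bound of the matching bucket. */
    long percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p < 0 || p > 100) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        long rank = Math.max(1, (long) Math.ceil(p * total / 100.0));
        long seen = 0;
        // TreeMap iterates keys in ascending order, which is what the running count needs.
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= rank) {
                return e.getKey();
            }
        }
        return counts.lastKey();
    }
}
```

With a width of 1 this gives the exact nearest-rank percentile. With wider buckets the answer is only good to within one bucket width, and memory grows with the number of distinct buckets touched, which is where skewed or long-tailed data starts to hurt.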