This approach is fine for relatively well-behaved distributions.  Anything
more skewed than, say, an exponential, or as long-tailed as a t(3)
distribution, is likely to cause trouble for it.

See
http://search-lucene.com/jd/mahout/math/org/apache/mahout/math/stats/OnlineSummarizer.html
for the alternative I have been suggesting.  It can keep accurate estimates
of any quantile that you like.

On Mon, Mar 14, 2011 at 5:17 PM, sebb <seb...@gmail.com> wrote:

>
> In JMeter we needed to display long running percentiles without using
> excess memory, and someone came up with the idea of using buckets for
> ranges of values. So instead of keeping details on each sample elapsed
> time, we increment the count for the appropriate bucket.
>
> If the range of values is too large to use a single bucket for each
> value, each bucket can represent a range of values.
> These ranges can potentially be non-uniform though that does
> complicate the calculations.
>
> JMeter actually uses a TreeMap for the values and counts - the values
> need to be sorted in order to calculate percentiles.
>
> Depending on the data-set, it might be possible to use fixed arrays
> instead of the TreeMap.
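
For concreteness, the bucketing idea quoted above might look roughly like
the sketch below.  This is not JMeter's actual code: the class name
BucketedPercentiles, the fixed bucketWidth, and the nearest-rank rule are
all assumptions made for illustration.  Note that it reports the lower
bound of the bucket holding the requested rank, so the answer is only as
precise as the bucket width.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts samples per fixed-width bucket instead of
// storing every sample, then walks the sorted buckets to find a percentile.
final class BucketedPercentiles {
    private final long bucketWidth;
    // Key: lower bound of the bucket. Value: number of samples in it.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private long total = 0;

    BucketedPercentiles(long bucketWidth) {
        if (bucketWidth <= 0) {
            throw new IllegalArgumentException("bucketWidth must be positive");
        }
        this.bucketWidth = bucketWidth;
    }

    // Increment the count of the bucket that contains the value.
    void add(long value) {
        long bucket = Math.floorDiv(value, bucketWidth) * bucketWidth;
        counts.merge(bucket, 1L, Long::sum);
        total++;
    }

    // Nearest-rank percentile, p in [0, 100]. Returns the lower bound of
    // the bucket in which the cumulative count first reaches the rank.
    long percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p < 0 || p > 100) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        long rank = Math.max(1, (long) Math.ceil(p / 100.0 * total));
        long seen = 0;
        // TreeMap iterates in ascending key order, which is why sorted
        // storage is needed for this calculation.
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= rank) {
                return e.getKey();
            }
        }
        return counts.lastKey();
    }
}
```

Memory use grows with the number of distinct buckets rather than the
number of samples.  If the range of values is known in advance, the
TreeMap could be swapped for a plain long[] indexed by bucket number.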
