Public bug reported:

The output of the ping command gives the user several statistical parameters
of the measured values (seen, in the statistical sense, as a sample of a
statistical population), e.g.
> rtt min/avg/max/mdev = 423.152/728.492/1306.341/220.001 ms, pipe 2

One of them is surely not appropriate for describing the data in that
case, and one is probably inappropriate.

First, the mean of a sample is a good estimator (in the statistical sense) for
the mean of the underlying statistical distribution.
But this is only true for statistical distributions that possess a
so-called "first moment" (or "expectation value"), i.e. a mean.

Some do not. In those cases, giving the mean of the sample is misleading,
because it is only an unreliable, fluctuating property of the random sample
and not of the statistical population!
The mean of the random sample does not converge (e.g. with increasing sample
size) to a location parameter of the underlying population or distribution.
A user will interpret the given value as information about some kind of
"middle" of the latencies that will occur on the data connection. This
interpretation is wrong. Therefore, the statistical parameter "avg", the mean
of the sample, is misleading and therefore inappropriate.
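
To illustrate the non-convergence argument above, here is a minimal Python
sketch. It assumes a Pareto distribution with shape parameter 1 as a stand-in
for a heavy-tailed latency distribution; that choice is purely illustrative
(it is a textbook example of a distribution without a finite mean) and is not
a claim about real round-trip times. The function name mean_and_median and the
seeds are hypothetical.

```python
import random
import statistics


def mean_and_median(n, seed=0):
    """Draw n values from Pareto(alpha=1) and return (sample mean, sample median).

    Pareto with alpha=1 has no finite first moment, but its median is 2.
    """
    rng = random.Random(seed)
    data = [rng.paretovariate(1.0) for _ in range(n)]
    return statistics.fmean(data), statistics.median(data)


if __name__ == "__main__":
    # The median settles near 2 as n grows; the mean does not settle at all
    # and depends strongly on the few largest values drawn.
    for n in (100, 10_000, 100_000):
        mean, median = mean_and_median(n, seed=1)
        print(f"n={n:>7}  mean={mean:10.3f}  median={median:6.3f}")
```

Running this with different seeds should show the median staying close to 2
while the mean jumps around from run to run.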

Latency measurements are a standard case in which distributions occur that do
not possess first moments or expectation values (or, at least, contain a
large number of outliers).
In such cases, the more robust (and simpler) measure of location, called the
"median", should be used; see
http://en.wikipedia.org/wiki/Median
http://en.wikipedia.org/wiki/File:Comparison_mean_median_mode.svg

(As a second reason, the skew of the latency measurements also
indicates that a sample mean is not a good choice of estimator for
the location of the distribution.)

Second, a better measure of dispersion should be used. Wikipedia:
"When the median is used as a location parameter in descriptive statistics, 
there are several choices for a measure of variability: the range, the 
interquartile range, the mean absolute deviation, and the median absolute 
deviation."

I would argue for the median absolute deviation.
(I wrote "probably inappropriate" because "mdev" does not correspond to a
specific statistical term, so I do not know what is calculated. If it is the
"(square root of) sample variance" or an "estimator for standard deviation",
then it is surely inappropriate.)
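
As a rough sketch of the proposal, the following Python snippet computes the
median and the median absolute deviation next to the mean and a standard
deviation for a list of round-trip times. The function names and the sample
values are hypothetical, and the standard deviation is only included for
comparison; this is not a statement about how ping computes "mdev".

```python
import statistics


def robust_summary(rtts_ms):
    """Return (median, median absolute deviation) of the round-trip times."""
    med = statistics.median(rtts_ms)
    mad = statistics.median([abs(x - med) for x in rtts_ms])
    return med, mad


def classical_summary(rtts_ms):
    """Return (mean, population standard deviation) for comparison."""
    return statistics.fmean(rtts_ms), statistics.pstdev(rtts_ms)


if __name__ == "__main__":
    # Hypothetical sample: five ordinary replies and one extreme outlier.
    rtts_ms = [20.0, 21.0, 19.0, 22.0, 20.0, 500.0]
    print("median/MAD :", robust_summary(rtts_ms))
    print("mean/stddev:", classical_summary(rtts_ms))
```

For this hypothetical sample the median is 20.5 ms and the median absolute
deviation is 1.0 ms, while the single outlier pulls the mean to about 100 ms.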

Ubuntu release: 14.04.1 LTS
iputils-ping:  3:20121221-4ubuntu1.1

** Affects: iputils (Ubuntu)
     Importance: Undecided
         Status: New

** Summary changed:

- inappropriate statistical paramters in ping output
+ inappropriate statistical parameters in ping output

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to iputils in Ubuntu.
https://bugs.launchpad.net/bugs/1376606

Title:
  inappropriate statistical parameters in ping output

Status in “iputils” package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/iputils/+bug/1376606/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
