[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-30 Thread Raymond Hettinger
Change by Raymond Hettinger: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-30 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset 793f55bde9b0299100c12ddb0e6949c6eb4d85e5 by Raymond Hettinger in branch 'main': bpo-39218: Improve accuracy of variance calculation (GH-27960) https://github.com/python/cpython/commit/793f55bde9b0299100c12ddb0e6949c6eb4d85e5

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-26 Thread Raymond Hettinger
Raymond Hettinger added the comment: > what it's correcting for is an inaccurate value of "c" [...] I'll leave the logic as-is and just add a note about what is being corrected. > Numerically, it's probably not helpful. To make a difference, the mean would have to have huge magnitude relativ

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-26 Thread Mark Dickinson
Mark Dickinson added the comment: > what it's correcting for is an inaccurate value of "c" [...] In more detail: Suppose "m" is the true mean of the x in data, but all we have is an approximate mean "c" to work with. Write "e" for the error in that approximation, so that c = m + e. Then (us
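The excerpt above is cut off, so the following is only a sketch of the standard identity that this setup (c = m + e) leads to, not a quotation of the rest of the comment. Summing squared deviations about the approximate mean c overshoots the true sum of squares by n*e**2, and sum(x - c) equals -n*e, so subtracting sum(x - c)**2 / n recovers the exact value. The function names `sum_sq_about` and `corrected_sum_sq` are hypothetical, and Fractions are used so the check is exact.

```python
from fractions import Fraction

def sum_sq_about(data, c):
    """Sum of squared deviations of data about the point c."""
    return sum((x - c) * (x - c) for x in data)

def corrected_sum_sq(data, c):
    """Sum of squares about an approximate mean c, corrected to the true mean.

    If c = m + e, then sum((x - c)**2) = sum((x - m)**2) + n*e**2 and
    sum(x - c) = -n*e, so subtracting sum(x - c)**2 / n removes the excess.
    """
    n = len(data)
    total = sum_sq_about(data, c)
    drift = sum(x - c for x in data)  # equals -n*e
    return total - drift * drift / n

# Illustrative data (an assumption, not from the thread).
data = [Fraction(1), Fraction(2), Fraction(4), Fraction(7)]
m = sum(data) / len(data)   # true mean
e = Fraction(1, 1000)       # deliberate error in the mean
c = m + e                   # approximate mean
exact = sum_sq_about(data, m)
```

With exact arithmetic the corrected value matches the sum of squares about the true mean for any choice of e; with floats the two would agree only up to rounding.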

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-26 Thread Mark Dickinson
Mark Dickinson added the comment: > The rounding correction in _ss() looks mathematically incorrect to me [...] I don't think it was intended as a rounding correction - I think it's just computing the variance (prior to the division by n or n-1) of the `(x - c)` terms using the standard "exp

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-25 Thread Raymond Hettinger
Change by Raymond Hettinger: keywords: +patch; pull_requests: +26406; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/27960

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: The rounding correction in _ss() looks mathematically incorrect to me: ∑ (xᵢ - x̅ + εᵢ)² = ∑ (xᵢ - x̅)² - (∑ εᵢ)² ÷ n If we drop this logic (which seems completely bogus), all the tests still pass and the code becomes cleaner: def _ss(data, c=None)

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-25 Thread Tal Einat
Change by Tal Einat: nosy: -taleinat

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-20 Thread Raymond Hettinger
Raymond Hettinger added the comment: Removing the assertion and implementing Steven's idea seems like the best way to go: sum((y:=(x-c)) * y for x in data)
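As a rough illustration of the expression quoted above, here is a minimal standalone sketch. The wrapper name `sum_sq_dev` is hypothetical and this is not the actual `_ss()` code from the statistics module. The assignment expression computes each deviation once and multiplies it by itself, avoiding `**`. It needs Python 3.8 or later.

```python
from fractions import Fraction

def sum_sq_dev(data, c):
    """Sum of squared deviations about c, using y*y rather than y**2.

    The walrus operator binds each deviation to y so that (x - c) is
    evaluated only once per element.
    """
    return sum((y := (x - c)) * y for x in data)

# Deviations of [1, 2, 3] about 2 are -1, 0, 1, so the squares sum to 2.
small = sum_sq_dev([1, 2, 3], 2)

# The element type is carried through the multiplication.
frac = sum_sq_dev([Fraction(1, 2), Fraction(3, 2)], Fraction(1))
```

Because the product is formed from two values of the same type, the result type follows whatever that type's own multiplication returns.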

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2021-08-20 Thread Irit Katriel
Irit Katriel added the comment: I've reproduced this on 3.9 and 3.10. This part of the code in main is still the same, so the issue is probably there even though we don't have numpy with which to test. -- nosy: +iritkatriel versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.8

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2020-01-05 Thread Reed
Reed added the comment: Thank you all for the comments! Either using (x-c)*(x-c), or removing the assertion and changing the final line to `return (U, total)`, seem reasonable. I slightly prefer the latter case, due to Mark's comments about x*x being faster and simpler than x**2. But I am no

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2020-01-05 Thread Mark Dickinson
Mark Dickinson added the comment: [Karthikeyan] > can possibly break again if (x-c) * (x-c) was also changed to return float64 > in future I think it's safe to assume that multiplying two NumPy float32's will continue to give a float32 back in the future; NumPy has no reason to give back a

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2020-01-05 Thread Steven D'Aprano
Steven D'Aprano added the comment: Nice analysis and bug report, thank you! That's pretty strange behaviour for float32, but I guess we're stuck with it. I wonder if the type assertion has outlived its usefulness? I.e. drop the `T == U` part and change the assertion to `assert count == count

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2020-01-04 Thread Karthikeyan Singaravelan
Karthikeyan Singaravelan added the comment: I think it's more of an implementation artifact of numpy eq definition for float32 and float64 and can possibly break again if (x-c) * (x-c) was also changed to return float64 in future. -- nosy: +rhettinger, steven.daprano, taleinat, xtrea

[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

2020-01-04 Thread Reed
New submission from Reed: If a float32 Numpy array is passed to statistics.variance(), an assertion failure occurs. For example:

    import statistics
    import numpy as np
    x = np.array([1, 2], dtype=np.float32)
    statistics.variance(x)

The assertion error is: assert T == U and c
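For comparison, the same two values passed as built-in floats or as Fractions go through `statistics.variance()` without error. This sketch deliberately avoids NumPy, so it shows only the expected result for the data [1, 2] and does not reproduce the failure described in the report.

```python
import statistics
from fractions import Fraction

# Sample variance of [1, 2]: the mean is 1.5, the squared deviations sum
# to 0.5, and dividing by n - 1 = 1 gives 0.5.
float_result = statistics.variance([1.0, 2.0])

# The same data as exact rationals.
fraction_result = statistics.variance([Fraction(1), Fraction(2)])
```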