[issue33084] Computing median, median_high an median_low in statistics library

2019-01-07 Thread Jonathan Fine
Jonathan Fine added the comment: Based on a quick review of the python docs, the bug report, PEP 450 and this thread, I suggest 1. More carefully draw attention to the NaN feature, in the documentation for existing Python versions. 2. Consider revising statistics.py so that it raises an

[issue33084] Computing median, median_high an median_low in statistics library

2019-01-06 Thread David Mertz
David Mertz added the comment: I believe that the current behavior of `statistics.median[|_low|_high]` is simply broken. It relies on the particular behavior of Python sorting, which only utilizes `.__lt__()` between objects, and hence does not require a total order. I can think of
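
The order dependence described in this comment can be reproduced with a small sketch. It assumes CPython's `sorted()`, where every `<` comparison involving NaN is False, so the exact values returned are implementation-dependent; the two sample lists are illustrative and not taken from the bug report.

```python
import statistics

nan = float("nan")

# The same three values, with NaN in a different position each time.
nan_first = [nan, 1.0, 3.0]
nan_last = [1.0, 3.0, nan]

# Each `<` comparison against NaN is False, so on CPython sorted() treats
# both lists as already ordered and leaves them unchanged. median() then
# returns whatever element happens to sit in the middle.
median_nan_first = statistics.median(nan_first)
median_nan_last = statistics.median(nan_last)

print(median_nan_first, median_nan_last)
```

The two calls see the same multiset of values yet return different results, which is the sense in which the result depends on sorting without a total order.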

[issue33084] Computing median, median_high an median_low in statistics library

2018-10-07 Thread Steven D'Aprano
Steven D'Aprano added the comment: I want to revisit this for 3.8. I agree that the current implementation-dependent behaviour when there are NANs in the data is troublesome. But I don't think that there is a single right answer. I also agree with Mark that if we change median, we ought to

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-20 Thread Eric V. Smith
Change by Eric V. Smith: nosy: +eric.smith

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Luc
Luc added the comment: If we are trying to fix this, the behavior should be like computing the mean or harmonic mean with the statistics library when there are missing values in the data. At least that way, it is consistent with how the statistics library works when

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Maheshwar Kumar
Maheshwar Kumar added the comment: So from the above, am I to conclude that removing np.nan is the best path to take? Also, the same step is to be included in median_grouped as well, right?

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Luc
Luc added the comment: Just to make sure we are focused on the issue, the reported bug is with the statistics library (not with numpy). It happens when there is at least one missing value in the data and involves the computation of the median, median_low and median_high

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Steven D'Aprano
Steven D'Aprano added the comment: On Fri, Mar 16, 2018 at 02:32:36PM +, Mark Dickinson wrote: > For what it's worth, NumPy gives a result of NaN for the median of an array > that contains NaNs: By default, R gives the median of a list containing either NaN or

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Mark Dickinson
Mark Dickinson added the comment: > then the answer being 90 is correct,right? How do you deduce that? Why 90 rather than 85 (or 87.5, or some other value)? For what it's worth, NumPy gives a result of NaN for the median of an array that contains NaNs: >>>
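
One way to get the NaN-propagating behaviour mentioned here with only the standard library is sketched below. `median_propagating_nan` is a hypothetical helper written for illustration; it is not part of the `statistics` module and not a proposed patch, and the sample data is made up.

```python
import math
import statistics


def median_propagating_nan(data):
    """Return NaN if any value is a float NaN, else the ordinary median.

    Hypothetical helper for illustration only.
    """
    values = list(data)
    if any(isinstance(x, float) and math.isnan(x) for x in values):
        return math.nan
    return statistics.median(values)


result_with_nan = median_propagating_nan([2.0, float("nan"), 4.0, 1.0, 3.0])
result_without_nan = median_propagating_nan([2.0, 4.0, 1.0, 3.0])
print(result_with_nan, result_without_nan)
```

The check only recognises float NaNs (which includes `np.nan`, since that is a plain Python float); Decimal NaNs would need separate handling.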

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Maheshwar Kumar
Maheshwar Kumar added the comment: Well, if I don't consider np.nan as missing data and consider all other values, then the answer being 90 is correct, right?

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Mark Dickinson
Mark Dickinson added the comment: > Will just removing all np.nan values do the job? Unfortunately, I don't think it's that simple. You want consistency across the various library calls, so if the various `median` functions are changed to treat NaNs as missing data, then

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-16 Thread Maheshwar Kumar
Maheshwar Kumar added the comment: Will just removing all np.nan values do the job? Btw the values will be: median = 88.5, median_low = 85, median_high = 90. I can correct it and send a pull request. nosy: +maheshwark97
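
For readers who want to try the "remove the NaNs first" idea on the caller's side, a minimal sketch follows. `drop_nans` is a hypothetical helper and the sample data is illustrative (it is not the data from the bug report); whether the library itself should do this is exactly what the thread debates.

```python
import math
import statistics


def drop_nans(data):
    """Return a list without float NaNs (hypothetical helper)."""
    return [x for x in data if not (isinstance(x, float) and math.isnan(x))]


sample = [2.0, float("nan"), 4.0, 1.0, 3.0]
cleaned = drop_nans(sample)  # [2.0, 4.0, 1.0, 3.0]

summary = {
    "median": statistics.median(cleaned),
    "median_low": statistics.median_low(cleaned),
    "median_high": statistics.median_high(cleaned),
}
print(summary)
```

With the NaN removed, the three functions operate on four ordinary values, so the result no longer depends on where the NaN sat in the input.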

[issue33084] Computing median, median_high an median_low in statistics library

2018-03-15 Thread Luc
New submission from Luc: When a list or dataframe series contains NaN(s), the median, median_low and median_high are computed by the Python 3.6.4 statistics library; however, the results are wrong. Either it should return a NaN, just like when we try to compute a mean, or point