[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Andrew Barnert via Python-ideas
On Dec 30, 2019, at 14:35, David Mertz wrote: > > On Mon, Dec 30, 2019, 5:17 PM Andrew Barnert >> The fact that all three of the alternate orders anyone’s asked for or >> suggested turned out to be spurious, and nobody can think of a good use for >> a different one, that’s a pretty good

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Andrew Barnert via Python-ideas
> On Dec 30, 2019, at 14:05, Richard Damon wrote: > > On 12/30/19 4:22 PM, Andrew Barnert via Python-ideas wrote: >>> On Dec 30, 2019, at 06:50, Richard Damon wrote: >>> On 12/30/19 12:06 AM, Andrew Barnert wrote: > On Dec 29, 2019, at 20:04, Richard Damon wrote: > Thus your

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Steven D'Aprano
On Mon, Dec 30, 2019 at 06:10:37PM +1100, Chris Angelico wrote: > But the question is: What should the statistics module do with a nan? > This can only be answered by understanding what "nan" means in a > statistical context, which is only tangentially related to the > question of whether nan is

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread David Mertz
On Mon, Dec 30, 2019, 5:17 PM Andrew Barnert > The fact that all three of the alternate orders anyone’s asked for or > suggested turned out to be spurious, and nobody can think of a good use for > a different one, that’s a pretty good argument that YAGNI. > I think everyone agrees that the only

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Andrew Barnert via Python-ideas
> On Dec 30, 2019, at 08:55, David Mertz wrote: >> Presumably the end user (unlike the statistics module) knows what data they >> have. > > No, Steven is right here. In Python we might very sensibly mix numeric > datatypes. The statistics module explicitly doesn’t support doing so. Which

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 4:22 PM, Andrew Barnert via Python-ideas wrote: On Dec 30, 2019, at 06:50, Richard Damon wrote: On 12/30/19 12:06 AM, Andrew Barnert wrote: On Dec 29, 2019, at 20:04, Richard Damon wrote: Thus your total_order, while not REALLY a total order, is likely good enough for most

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Andrew Barnert via Python-ideas
On Dec 30, 2019, at 06:50, Richard Damon wrote: > > On 12/30/19 12:06 AM, Andrew Barnert wrote: >>> On Dec 29, 2019, at 20:04, Richard Damon wrote: >>> Thus your total_order, while not REALLY a total order, is likely good >>> enough for most purposes. >> Well, it is a total order of

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 12:45 PM, David Mertz wrote: On Mon, Dec 30, 2019 at 12:37 PM Richard Damon wrote: My preference is that the interpretation that NaN means Missing Data isn't appropriate for the statistics module. You need to tell the entire PyData

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread David Mertz
On Mon, Dec 30, 2019 at 12:37 PM Richard Damon wrote: > My preference is that the interpretation that NaN means Missing Data > isn't appropriate for the statistics module. You need to tell the entire PyData ecosystem, the entire R ecosystem, and pretty much all of Data Science that they

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 12:18 PM, Christopher Barker wrote: Steven D'Aprano wrote: Can you explain the scenario where somebody using median will want negative NANs to sort to the beginning, below -INF, and positive NANs to sort to the

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 11:54 AM, David Mertz wrote: On Mon, Dec 30, 2019 at 3:32 AM Andrew Barnert via Python-ideas wrote: On Dec 29, 2019, at 23:50, Steven D'Aprano wrote: > > On Sun, Dec 29, 2019 at 06:23:03PM -0800, Andrew

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Christopher Barker
Steven D'Aprano wrote: > Can you explain the scenario where somebody using median will want >> negative NANs to sort to the beginning, below -INF, and positive NANs to >> sort to the end, above +INF? > > No, and I don’t think anyone really wants that. I think it was proposed as a possible

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 2:10 AM, Chris Angelico wrote: On Mon, Dec 30, 2019 at 3:52 PM Andrew Barnert wrote: On Dec 29, 2019, at 18:50, Chris Angelico wrote: On Mon, Dec 30, 2019 at 1:40 PM Andrew Barnert wrote: On Dec 29, 2019, at 18:20, Chris Angelico wrote: Counting numbers are intuitively

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread David Mertz
On Mon, Dec 30, 2019 at 3:32 AM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote: > On Dec 29, 2019, at 23:50, Steven D'Aprano wrote: > > > > On Sun, Dec 29, 2019 at 06:23:03PM -0800, Andrew Barnert via > Python-ideas wrote: > > > >> Likewise, it’s even easier to write

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread David Mertz
On Mon, Dec 30, 2019 at 2:58 AM Steven D'Aprano wrote: > Can you explain the scenario where somebody using median will want > negative NANs to sort to the beginning, below -INF, and positive NANs to > sort to the end, above +INF? > I can kinda-sorta provide a case. But overall, despite my

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Richard Damon
On 12/30/19 12:06 AM, Andrew Barnert wrote: On Dec 29, 2019, at 20:04, Richard Damon wrote: Thus your total_order, while not REALLY a total order, is likely good enough for most purposes. Well, it is a total order of equivalence classes (with all IEEE-equal values being equivalent, all

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Greg Ewing
On 30/12/19 12:19 pm, David Mertz wrote: I'm sort of convinced that Posits better approximate the behavior of rationals than do IEEE-754 floats: https://en.m.wikipedia.org/wiki/Unum_(number_format) Is it just me, or is that Wikipedia article abysmally written? I feel like I gained a

[Python-ideas] Re: Fix statistics.median()?

2019-12-30 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 23:50, Steven D'Aprano wrote: > > On Sun, Dec 29, 2019 at 06:23:03PM -0800, Andrew Barnert via Python-ideas > wrote: > >> Likewise, it’s even easier to write ignore-nan yourself than to write the >> DSU yourself: >> >>median = statistics.median(x for x in xs if not

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 08:32:52PM -0800, Andrew Barnert via Python-ideas wrote: > The 95% case is handled by just ignore and raise. Novices should > probably never be using anything else. > > Experts will definitely often want poison. And probably sometimes fast > for backward compatibility

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 06:23:03PM -0800, Andrew Barnert via Python-ideas wrote: > Likewise, it’s even easier to write ignore-nan yourself than to write the DSU > yourself: > > median = statistics.median(x for x in xs if not x.isnan()) Try that with xs = [1, 10**400, 2] and come back to
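A minimal sketch of the ignore-NaN filter under discussion, with some assumptions stated up front. Python's float has no isnan() method, and math.isnan() raises OverflowError for an int such as 10**400 that is too large to convert to a float, so this sketch tests x != x instead. The names is_nan and median_ignoring_nan are hypothetical and do not come from the thread or from the statistics module.

```python
import statistics


def is_nan(x):
    # Hypothetical helper: a NaN is the only common numeric value that
    # compares unequal to itself. This avoids converting huge ints to float.
    return x != x


def median_ignoring_nan(xs):
    # Drop NaNs first, then defer to the standard library median.
    return statistics.median(x for x in xs if not is_nan(x))


print(median_ignoring_nan([1, 10**400, 2]))
print(median_ignoring_nan([1.0, float("nan"), 3.0, 2.0]))
```

The self-inequality test is a design choice for this sketch only; it would not cope with types whose NaN raises on comparison, such as a signaling Decimal NaN.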

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 07:59:26PM -0500, Richard Damon wrote: > Which is EXACTLY the reason I say that if this is important enough to > fix in median, it is important enough to fix in sorted. sorted gives > exactly the same nonsense result, it is only a bit more obvious because > it gives all

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Chris Angelico
On Mon, Dec 30, 2019 at 3:52 PM Andrew Barnert wrote: > > On Dec 29, 2019, at 18:50, Chris Angelico wrote: > > > > On Mon, Dec 30, 2019 at 1:40 PM Andrew Barnert wrote: > >> > >>> On Dec 29, 2019, at 18:20, Chris Angelico wrote: > >> > >> Counting numbers are intuitively numbers. So are

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 06:22:49PM -0500, Richard Damon wrote: > The way I see it, is that median doesn't handle NaNs in a reasonable > way, because sorted doesn't handle them, because it is easy and quick to > not handle NaN, and to handle them you need to define an Official > meaning for

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Tim Peters
[David Mertz] >> As me and Uncle Timmy have pointed out, it IS FIXED in sorted(). You just >> need to call: >> >>sorted_stuff = sorted(stuff, key=nan_aware_transform) [Christopher Barker] > But what would that be? floats have inf and -inf -- so how could you force > the NaNs to be at the

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
On Sun, Dec 29, 2019 at 5:14 PM David Mertz wrote: > As me and Uncle Timmy have pointed out, it IS FIXED in sorted(). You just > need to call: > >sorted_stuff = sorted(stuff, key=nan_aware_transform) > But what would that be? floats have inf and -inf -- so how could you force the NaNs to

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 21:00, David Mertz wrote: > > >> On Sun, Dec 29, 2019 at 11:33 PM Andrew Barnert wrote: > >> IEEE total order specifies a distinct order for every distinct bit pattern, >> and tries to do so in a way that makes sense. > > Ok, ok... I've got "learned up" about this three

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 20:04, Richard Damon wrote: > > Thus your total_order, while not REALLY a total order, is likely good enough > for most purposes. Well, it is a total order of equivalence classes (with all IEEE-equal values being equivalent, all negative NaNs being equivalent, and all

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019 at 11:33 PM Andrew Barnert wrote: > IEEE total order specifies a distinct order for every distinct bit > pattern, and tries to do so in a way that makes sense. > Ok, ok... I've got "learned up" about this three times now :-). Given we cannot control those bit patterns from

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 18:50, Chris Angelico wrote: > > On Mon, Dec 30, 2019 at 1:40 PM Andrew Barnert wrote: >> >>> On Dec 29, 2019, at 18:20, Chris Angelico wrote: >> >> Counting numbers are intuitively numbers. So are measures. And yet, they’re >> different. Which one is the “one true

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Tim Peters
[David] > How is that fancy bitmask version different from my 3-line version? Where he's referring to my: https://bugs.python.org/msg336487 and, I presume, to his:

    def total_order(x):
        if math.isnan(x):
            return (math.copysign(1, x), x)
        return (0, x)

Richard

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 11:00 PM, Tim Peters wrote: [Richard Damon] IEEE total_order puts NaN as bigger than infinity, and -NaN as less than -inf. One simple way to implement it is to convert the representation to a 64 bit signed integer (not its value, but its representation) and if the sign bit is set,

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
How is that fancy bitmask version different from my 3-line version? On Sun, Dec 29, 2019, 11:01 PM Tim Peters wrote: > [Richard Damon] > > IEEE total_order puts NaN as bigger than infinity, and -NaN as less than > > -inf. > > > > One simple way to implement it is to convert the representation

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 10:39 PM, David Mertz wrote: On Sun, Dec 29, 2019 at 10:18 PM Richard Damon wrote: IEEE total_order puts NaN as bigger than infinity, and -NaN as less than -inf. You mean like this?

>>> def total_order(x):
...     if math.isnan(x):
...

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Tim Peters
[Richard Damon] > IEEE total_order puts NaN as bigger than infinity, and -NaN as less than > -inf. > > One simple way to implement it is to convert the representation to a 64 > bit signed integer (not its value, but its representation) and if the > sign bit is set, complement the bottom 63 bits
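The bit-pattern trick described in the quoted text can be sketched roughly as follows. This is only an illustration, assuming IEEE-754 binary64 floats; the function name total_order_key is hypothetical, and the code is not the version linked from the bug tracker.

```python
import math
import struct

# Mask covering the bottom 63 bits of a 64-bit word.
LOW_63_BITS = 0x7FFF_FFFF_FFFF_FFFF


def total_order_key(x):
    # Reinterpret the 64-bit representation of the double as a signed int.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # If the sign bit is set, complement the bottom 63 bits so that
    # larger magnitudes sort lower among the negative values.
    if bits < 0:
        bits ^= LOW_63_BITS
    return bits


neg_nan = math.copysign(math.nan, -1.0)
pos_nan = math.copysign(math.nan, 1.0)
values = [pos_nan, 1.0, -math.inf, -0.0, 0.0, math.inf, neg_nan]
ordered = sorted(values, key=total_order_key)
print(ordered)
```

With this key, negative NaNs sort below -inf, -0.0 sorts below 0.0, and positive NaNs sort above +inf.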

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 9:46 PM, Chris Angelico wrote: On Mon, Dec 30, 2019 at 1:40 PM Andrew Barnert wrote: On Dec 29, 2019, at 18:20, Chris Angelico wrote: On Mon, Dec 30, 2019 at 11:47 AM Steven D'Aprano wrote: On Mon, Dec 30, 2019 at 08:30:41AM +1100, Chris Angelico wrote: Especially since it

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 05:40:09PM -0800, Neil Girdhar wrote: > I'm just glancing at this thread, but it sounds like you want to add the > quickselect algorithm to the standard library. As you point out in > another message, quickselect is faster than quicksort: it is linear time > (provided

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 9:42 PM, David Mertz wrote: On Sun, Dec 29, 2019, 9:23 PM Andrew Barnert Here it is. I could save a line by not using the 'else'.

    def total_order(x):
        if is_nan(x):
            return (math.copysign(1, x), x)
        else:
            return (0, x)

This

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sun, Dec 29, 2019 at 07:43:28PM -0500, David Mertz wrote: > the notional approximations that bit-patterns give of Rational numbers (not > sure why Richard keeps insisting it's about Reals, not Rationals... albeit > there is no difference that means anything to this discussion). Excluding NANs

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
"*Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk*" ("God made the integers, all else is the work of man"). –Leopold Kronecker of course, Kronecker was wrong, and Cantor was right. But the quote is an excellent dis. :-) On Sun, Dec 29, 2019 at 9:41 PM Andrew

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Chris Angelico
On Mon, Dec 30, 2019 at 1:40 PM Andrew Barnert wrote: > > On Dec 29, 2019, at 18:20, Chris Angelico wrote: > > > > On Mon, Dec 30, 2019 at 11:47 AM Steven D'Aprano > > wrote: > >> > >> On Mon, Dec 30, 2019 at 08:30:41AM +1100, Chris Angelico wrote: > >> > Especially since it fails quite

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019, 9:23 PM Andrew Barnert
> Here it is. I could save a line by not using the 'else'.
>
> def total_order(x):
>     if is_nan(x):
>         return (math.copysign(1, x), x)
>     else:
>         return (0, x)
>
> This doesn’t give you IEEE total order.
Under what circumstances
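For readers who want to try the quoted key function, here is a runnable sketch. It assumes that is_nan in the quoted code behaves like math.isnan for floats; that substitution is a guess, not something the thread confirms.

```python
import math


def total_order(x):
    # NaNs get a leading -1.0 or 1.0 taken from their sign bit, so they
    # sort before or after every ordinary value, which gets a leading 0.
    if math.isnan(x):
        return (math.copysign(1, x), x)
    return (0, x)


neg_nan = math.copysign(math.nan, -1.0)
pos_nan = math.copysign(math.nan, 1.0)
data = [3.0, pos_nan, -math.inf, 1.0, math.inf, neg_nan]
ordered = sorted(data, key=total_order)
print(ordered)
```

As the objection in the thread notes, this is coarser than the IEEE total order: it does not separate -0.0 from 0.0 or distinguish NaN payloads. It only pins NaNs to the two ends of the sorted result.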

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 18:20, Chris Angelico wrote: > > On Mon, Dec 30, 2019 at 11:47 AM Steven D'Aprano wrote: >> >> On Mon, Dec 30, 2019 at 08:30:41AM +1100, Chris Angelico wrote: >> Especially since it fails quite a few commonsense tests for whether or not something is a number: >>

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sat, Dec 28, 2019 at 10:16:28PM -0800, Christopher Barker wrote: > Richard: I am honestly confused about what you think we should do. Sure, > you can justify why the statistics module doesn’t currently handle NaN’s > well, but that doesn’t address the question of what it should do. > > As far

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 17:30, David Mertz wrote: > > >> On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert wrote: >> On Dec 29, 2019, at 16:08, David Mertz wrote: >> > >> > * There is absolutely no need to lose any efficiency by making the >> > statistics functions more friendly. All we need is

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Chris Angelico
On Mon, Dec 30, 2019 at 11:47 AM Steven D'Aprano wrote: > > On Mon, Dec 30, 2019 at 08:30:41AM +1100, Chris Angelico wrote: > > > > Especially since it fails quite a few commonsense tests for whether or > > > not something is a number: > [...] > > > The answer in all four cases is No. If

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
Actually, I wouldn't mind passing a key function to _median(), but that is way too advanced for the beginner users to have to think about. So maybe median() could call _median() internally where needed, but the underscore version could exist also. On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Neil Girdhar
I'm just glancing at this thread, but it sounds like you want to add the quickselect algorithm to the standard library. As you point out in another message, quickselect is faster than quicksort: it is linear time (provided the pivot is chosen by median of medians) whereas quicksort is
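As a rough illustration of the selection idea mentioned above, here is a small quickselect sketch. It picks a random pivot, which gives linear time only on average; the median-of-medians pivot mentioned in the message is not implemented here. The function names are hypothetical, and the sketch assumes the data is non-empty and contains no NaNs.

```python
import random


def quickselect(items, k):
    # Return the k-th smallest element (0-based) without fully sorting.
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        lows = [x for x in pool if x < pivot]
        pivots = [x for x in pool if x == pivot]
        highs = [x for x in pool if x > pivot]
        if k < len(lows):
            pool = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            # Skip everything at or below the pivot and keep searching.
            k -= len(lows) + len(pivots)
            pool = highs


def median_by_selection(items):
    # Middle element for odd lengths, mean of the two middle ones otherwise.
    data = list(items)
    n = len(data)
    if n % 2 == 1:
        return quickselect(data, n // 2)
    return (quickselect(data, n // 2 - 1) + quickselect(data, n // 2)) / 2


print(median_by_selection([5, 1, 4, 2, 3]))
print(median_by_selection([4, 1, 3, 2]))
```

A NaN in the input would break the three-way partition, since a NaN pivot compares neither less than, equal to, nor greater than anything, so NaN handling would still have to happen before or during selection.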

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
median() does not currently take a key function. This is not hard to see. It could, but as I've written, I don't think that's the best approach.

In [16]: statistics.median??
Signature: statistics.median(data)
Source:
def median(data):
    """Return the median (middle value) of numeric data.

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019 at 8:14 PM Andrew Barnert wrote: > On Dec 29, 2019, at 16:08, David Mertz wrote: > > > > * There is absolutely no need to lose any efficiency by making the > statistics functions more friendly. All we need is an optional parameter > whose spelling I've suggested as

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 8:13 PM, David Mertz wrote: On Sun, Dec 29, 2019 at 8:00 PM Richard Damon wrote: Which is EXACTLY the reason I say that if this is important enough to fix in median, it is important enough to fix in sorted. sorted gives exactly the same

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 16:08, David Mertz wrote: > > * There is absolutely no need to lose any efficiency by making the statistics > functions more friendly. All we need is an optional parameter whose spelling > I've suggested as `on_nan` (but bikeshed freely). Under at least one value > of

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019 at 8:00 PM Richard Damon wrote: > Which is EXACTLY the reason I say that if this is important enough to > fix in median, it is important enough to fix in sorted. sorted gives > exactly the same nonsense result, it is only a bit more obvious because > it gives all the points.

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 7:43 PM, Tim Peters wrote: [Christopher Barker] ... But the biggest barrier is that it would be a fair bit of churn on the sort() functions (and the float class), and would only help for floats anyway. If someone wants to propose this, please do -- but I don't think we should wait

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
Oh... I made a mistake in my off-the-cuff code. The items.append() shouldn't be in an else, but just in the loop.

    def median(it, on_nan=DEFAULT):
        if on_nan == 'unsafe':
            ... do all the current stuff ...
        elif on_nan == "ignore":
            return median((x for x in it if not is_nan(x)),
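One way the on_nan idea might look as a standalone wrapper is sketched below. Only the 'unsafe' and 'ignore' spellings appear in the quoted code; the 'raise' and 'poison' modes are modelled loosely on behaviours named elsewhere in the thread, and the wrapper name, the default mode, and the exact semantics are all assumptions rather than the proposal itself.

```python
import math
import statistics


def _is_nan(x):
    # Hypothetical helper; only float NaNs are detected in this sketch.
    return isinstance(x, float) and math.isnan(x)


def median_on_nan(data, on_nan="raise"):
    items = list(data)
    if on_nan == "unsafe":
        # Current behaviour: no NaN check at all.
        return statistics.median(items)
    if on_nan == "ignore":
        # Treat NaNs as missing values and drop them.
        return statistics.median([x for x in items if not _is_nan(x)])
    has_nan = any(_is_nan(x) for x in items)
    if on_nan == "raise":
        if has_nan:
            raise ValueError("NaN found in data")
        return statistics.median(items)
    if on_nan == "poison":
        # Any NaN in the input makes the result NaN.
        return math.nan if has_nan else statistics.median(items)
    raise ValueError(f"unknown on_nan mode: {on_nan!r}")


sample = [1.0, math.nan, 3.0, 2.0]
print(median_on_nan(sample, on_nan="ignore"))
print(median_on_nan(sample, on_nan="poison"))
```

Keeping 'unsafe' as a separate branch means callers who know their data is clean pay nothing for the check, which is the efficiency argument made in the thread.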

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 7:05 PM, Christopher Barker wrote: On Sun, Dec 29, 2019 at 3:26 PM Richard Damon wrote: > Frankly, I’m also confused as to why folks seem to think this is an > issue to be addressed in the sort() functions The way I see it, is that

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
Thanks David for laying a proposal out clearly: +1 to the whole thing. -CHB On Sun, Dec 29, 2019 at 4:06 PM David Mertz wrote: > Several points: > > * NaN as missing-value is widely used outside the Python standard > library. One could argue, somewhat reasonably, that Pandas and NumPy and >

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
Sorry for all these posts, but maybe someone mentioned this already, but maybe this is a time to consider a new algorithm anyway: https://rcoh.me/posts/linear-time-median-finding/ And doing the NaN-check inline might be faster than pre-filtering. -CHB On Sun, Dec 29, 2019 at 4:39 PM

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019 at 7:35 PM Andrew Barnert wrote: > On Dec 29, 2019, at 15:19, David Mertz wrote: On Sun, > Dec 29, 2019, 5:20 PM Andrew Barnert via Python-ideas > > But it is, out of all of the possible magma-over-magma structures on those >> values, the one that most closely

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Tim Peters
[Christopher Barker] > ... > But the biggest barrier is that it would be a fair bit of churn on the sort() > functions > (and the float class), and would only help for floats anyway. If someone wants > to propose this, please do -- but I don't think we should wait for that to do > something >

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Mon, Dec 30, 2019 at 08:30:41AM +1100, Chris Angelico wrote: > > Especially since it fails quite a few commonsense tests for whether or > > not something is a number: [...] > > The answer in all four cases is No. If something doesn't quack like a > > duck, doesn't swim like a duck, and doesn't

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
On Sun, Dec 29, 2019 at 4:05 PM Christopher Barker wrote: > >> You mean performance? Sure, but as I've argued before (no idea if anyone > agrees with me) the statistics package is already not a high performance > package anyway. If it turns out that it slows it down by, say, a factor of > two or

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 15:19, David Mertz wrote: > > > On Sun, Dec 29, 2019, 5:20 PM Andrew Barnert via Python-ideas >> But it is, out of all of the possible magma-over-magma structures on those >> values, the one that most closely approximates, in a well-defined and useful, >> if very

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 5:41 PM, Chris Angelico wrote: On Mon, Dec 30, 2019 at 9:01 AM Richard Damon wrote: On 12/29/19 4:30 PM, Chris Angelico wrote: On Mon, Dec 30, 2019 at 5:48 AM Steven D'Aprano wrote: On Sat, Dec 28, 2019 at 09:20:49PM -0800, Brendan Barnwell wrote: Especially since it fails

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
Several points: * NaN as missing-value is widely used outside the Python standard library. One could argue, somewhat reasonably, that Pandas and NumPy and PyTorch misinterpret the IEEE-754 intention here, but this is EVERYWHERE in numeric/scientific Python. We could DOCUMENT that None is a

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
On Sun, Dec 29, 2019 at 3:26 PM Richard Damon wrote: > > Frankly, I’m also confused as to why folks seem to think this is an > > issue to be addressed in the sort() functions > > The way I see it, is that median doesn't handle NaNs in a reasonable > way, because sorted doesn't handle them, I

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 1:16 AM, Christopher Barker wrote: OMG! This is fun and all, but: On Sat, Dec 28, 2019 at 9:11 PM Richard Damon wrote: ... practicality beats purity. And practically, everyone in this thread understands what a float is, and what a NaN is

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread David Mertz
On Sun, Dec 29, 2019, 5:20 PM Andrew Barnert via Python-ideas > But it is, out of all of the possible magma-over-magma structures on those > values, the one that most closely approximates—in a well-defined and > useful, if very complicated, way—the rationals. I'm sort of convinced that Posits

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Chris Angelico
On Mon, Dec 30, 2019 at 9:01 AM Richard Damon wrote: > > On 12/29/19 4:30 PM, Chris Angelico wrote: > > On Mon, Dec 30, 2019 at 5:48 AM Steven D'Aprano wrote: > >> On Sat, Dec 28, 2019 at 09:20:49PM -0800, Brendan Barnwell wrote: > >> > >> Especially since it fails quite a few commonsense tests

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Christopher Barker
OMG! This is fun and all, but: On Sat, Dec 28, 2019 at 9:11 PM Richard Damon wrote: > > ... practicality beats purity. And practically, everyone in this thread understands what a float is, and what a NaN is and is not. Richard: I am honestly confused about what you think we should do. Sure,

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Andrew Barnert via Python-ideas
On Dec 29, 2019, at 13:33, Chris Angelico wrote: > > More useful would be to look at the useful operations and invariants > that can be maintained, but that doesn't work too well for finite > subsets of numbers. Very few operations are closed for, say, "sixteen > bit integers". Nor for "rational

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 4:30 PM, Chris Angelico wrote: On Mon, Dec 30, 2019 at 5:48 AM Steven D'Aprano wrote: On Sat, Dec 28, 2019 at 09:20:49PM -0800, Brendan Barnwell wrote: The things that computers work with are floats, and NaN is a float, so in any relevant sense it is a number; it is an instance

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Chris Angelico
On Mon, Dec 30, 2019 at 5:48 AM Steven D'Aprano wrote: > > On Sat, Dec 28, 2019 at 09:20:49PM -0800, Brendan Barnwell wrote: > > > The things that > > computers work with are floats, and NaN is a float, so in any relevant > > sense it is a number; it is an instance of a numerical type. > > You

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Steven D'Aprano
On Sat, Dec 28, 2019 at 09:20:49PM -0800, Brendan Barnwell wrote: > The things that > computers work with are floats, and NaN is a float, so in any relevant > sense it is a number; it is an instance of a numerical type. You seem to be assuming that `x is an instance of type float (or

[Python-ideas] Re: Fix statistics.median()?

2019-12-29 Thread Richard Damon
On 12/29/19 12:20 AM, Brendan Barnwell wrote: On 2019-12-28 21:11, Richard Damon wrote: You seem to understand Pure Math, but not the Applied Mathematics of computers. The Applied Mathematics of Computing is based on the concept of finite approximation, which is something that Pure Math, like

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Christopher Barker
On Sat, Dec 28, 2019 at 9:42 PM Brendan Barnwell wrote: >But that is the problem. "The applied mathematics of computing" is > floating point, and in floating point, NaN is a number (despite its > name). careful here -- that may just re-ignite the argument :-( > computers work with

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread David Mertz
On Sun, Dec 29, 2019, 12:14 AM Richard Damon wrote: > But practicality beats purity, and practically, bit pattern CAN > represent numbers. If you want to argue that floats are not numbers, then > we can't use the statistics package, as we can't have any numbers to > perform the statistics on. >

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Brendan Barnwell
On 2019-12-28 21:11, Richard Damon wrote: You seem to understand Pure Math, but not the Applied Mathematics of computers. The Applied Mathematics of Computing is based on the concept of finite approximation, which is something that Pure Math, like the type that builds up the Number line

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Richard Damon
On 12/28/19 11:44 PM, David Mertz wrote: On Sat, Dec 28, 2019, 11:02 PM Chris Angelico wrote: They really truly ARE numbers. They are, in fact, these numbers: a = 3602879701896397 / 36028797018963968 b = 3602879701896397 / 18014398509481984 c =

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread David Mertz
On Sat, Dec 28, 2019, 11:02 PM Chris Angelico wrote: > They really truly ARE numbers. They are, in fact, these numbers: > > a = 3602879701896397 / 36028797018963968 > b = 3602879701896397 / 18014398509481984 > c = 5404319552844595 / 18014398509481984 > > When you perform addition on these, the
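The three fractions quoted above look like the exact values of the doubles nearest 0.1, 0.2 and 0.3. That reading is an inference from the digits, not something the truncated preview states, and it assumes IEEE-754 binary64 floats. A quick check with the standard fractions module:

```python
from fractions import Fraction

# Exact rational values as quoted; the names a, b, c follow the message.
a = Fraction(3602879701896397, 36028797018963968)
b = Fraction(3602879701896397, 18014398509481984)
c = Fraction(5404319552844595, 18014398509481984)

# Fraction(float) converts a float to its exact rational value.
print(Fraction(0.1) == a)
print(Fraction(0.2) == b)
print(Fraction(0.3) == c)

# Exact rational addition lands on a different number than c.
print(a + b == c)
```

Under that reading, each float is an exact rational number, and it is the rounding step in floating-point addition that makes 0.1 + 0.2 differ from 0.3.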

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Richard Damon
On 12/28/19 10:41 PM, David Mertz wrote: On Sat, Dec 28, 2019 at 10:31 PM Richard Damon wrote: Every value of the type float, except NaN and perhaps +inf and -inf (depending on which version of the Real Number Line you use) IS actually a

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Chris Angelico
On Sun, Dec 29, 2019 at 2:44 PM David Mertz wrote: > > On Sat, Dec 28, 2019 at 10:31 PM Richard Damon > wrote: >> >> Every value of the type float, except NaN and perhaps +inf and -inf >> (depending on which version of the Real Number Line you use) IS actually >> a representation of a Real

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread David Mertz
On Sat, Dec 28, 2019 at 10:31 PM Richard Damon wrote: > Every value of the type float, except NaN and perhaps +inf and -inf > (depending on which version of the Real Number Line you use) IS actually > a representation of a Real Number (so I don't understand in what way you > can say they aren't

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Richard Damon
On 12/28/19 10:05 PM, David Mertz wrote: On Sat, Dec 28, 2019, 9:36 PM Richard Damon wrote: > NaN may be an instance of the abstract type Number, but it isn't a mathematical number. Yes, floating point numbers are not pure-math Reals. Not even

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread David Mertz
On Sat, Dec 28, 2019, 9:36 PM Richard Damon wrote: > > NaN may be an instance of the abstract type Number, but it isn't a > mathematical number. Yes, floating point numbers are not pure-math Reals. Not even Rationals. They are a CS construct that is very useful for computer programs that very

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Ricky Teachey
An idea I haven't seen anyone suggest: Since testing for nan can be expensive, maybe it would make sense to provide a statistics.median_unsafe(), or perhaps median_fast(), method with the current implementation (for situations where a slow down isn't acceptable), and update the median() function

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Richard Damon
On 12/28/19 8:40 PM, David Mertz wrote: This is sophistry. NaN is an instance of the abstract type numbers.Number and the concrete type float. IEEE-754 defines NaN as a collection of required values in any floating point type. I know the acronym suggests otherwise in a too-cute way, but NaN is

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread David Mertz
This is sophistry. NaN is an instance of the abstract type numbers.Number and the concrete type float. IEEE-754 defines NaN as a collection of required values in any floating point type. I know the acronym suggests otherwise in a too-cute way, but NaN is archetypically a number in a computer

[Python-ideas] Re: Fix statistics.median()?

2019-12-28 Thread Richard Damon
On 12/28/19 1:14 AM, Christopher Barker wrote: On Fri, Dec 27, 2019 at 8:14 PM Richard Damon wrote: > It is a well known axiom of computing that returning an *incorrect* > result is a very bad thing. There is also an axiom that you can only expect

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Christopher Barker
Thanks Steven, for your thoughtful response. On Thu, Dec 26, 2019 at 5:44 PM Steven D'Aprano wrote: > However, I am happy to accept that silent failure may not be the ideal > result for everyone. I would argue that it is not the ideal result for ANYONE. The only reason for it is that it's

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Christopher Barker
On Fri, Dec 27, 2019 at 8:14 PM Richard Damon wrote: > > It is a well known axiom of computing that returning an *incorrect* > > result is a very bad thing. > > There is also an axiom that you can only expect valid results if you > meet the operations pre-conditions. > sure. > Sometimes,

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Richard Damon
On 12/27/19 9:15 PM, Christopher Barker wrote: I’m going to strongly support David Mertz’s point here: It is a well known axiom of computing that returning an *incorrect* result is a very bad thing. There is also an axiom that you can only expect valid results if you meet the operations

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Christopher Barker
I’m going to strongly support David Mertz’s point here: It is a well known axiom of computing that returning an *incorrect* result is a very bad thing. What the correct result of the median of a sequence of floats that contains some NaNs is up for debate. As David points out there are (at

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread David Mertz
>>> nan1 = float('nan')
>>> nan2 = 1e350 - 1e350
>>> nan3 = 1e400 / 1e399
>>> nan1, nan2, nan3
(nan, nan, nan)
>>> things = [1, 2, 3, nan1, nan2]
>>> nan1 in things, nan3 in things
(True, False)
>>> nan1 == nan1
False
>>> nan1 is nan1
True

The "in" operator might not do what you hope with

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Juancarlo Añez
The signature could be: def median(it, exclude=None): With *exclude* being a value, or a collection supporting the *in* operator. On Thu, Dec 26, 2019 at 4:14 PM David Mertz wrote: > Maybe we can just change the function signature: > > statistics.median(it, do_wrong_ass_thing_with_nans=False) >
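
A minimal sketch of how the suggested `exclude` parameter might behave, assuming it filters values before delegating to the existing statistics.median(). The name `median_excluding` and the container test are assumptions made for illustration; the message only gives the signature.

```python
import statistics


def median_excluding(it, exclude=None):
    # Hypothetical wrapper, not the stdlib API.
    if exclude is None:
        kept = list(it)
    elif hasattr(exclude, "__contains__") and not isinstance(exclude, (str, bytes)):
        # exclude is a collection: drop anything it contains.
        kept = [x for x in it if x not in exclude]
    else:
        # exclude is a single value: drop items that are the same object
        # or compare equal to it.
        kept = [x for x in it if x is not exclude and x != exclude]
    return statistics.median(kept)


single = median_excluding([1, 2, 3, -999, -999], exclude=-999)
several = median_excluding([1, 2, 3, 0, 100], exclude={0, 100})

nan1 = float("nan")
same_object = median_excluding([1, 2, 3, nan1], exclude=nan1)
```

Because NaN never compares equal to itself, this kind of filter only removes a NaN that is the very same object as the one passed in `exclude`; NaNs produced elsewhere in a computation would pass through.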

[Python-ideas] Re: Fix statistics.median()?

2019-12-27 Thread Richard Damon
On 12/26/19 5:23 PM, Andrew Barnert via Python-ideas wrote: On Dec 26, 2019, at 12:36, Richard Damon wrote: On 12/26/19 2:10 PM, Andrew Barnert via Python-ideas wrote: On Dec 26, 2019, at 10:58, Richard Damon wrote: Note, that NaN values are somewhat rare in most programs, I think they can

[Python-ideas] Re: Fix statistics.median()?

2019-12-26 Thread Steven D'Aprano
Forcing NANs to the end is not the right solution. Consider the median of [NAN, 2, 3, 4, 5]. If you force the NAN to remain at the start, the median is 3. If you force the NAN to the end of the list, the median is 4. Your choice to force NANs to the end is equivalent to introducing a bias
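
The two medians quoted in this message can be reproduced with a short sketch. `median_with_nan_at` is a hypothetical helper written for this illustration, and it assumes odd-length input so the median is a single middle element.

```python
import math


def is_nan(x):
    # True only for float NaN values.
    return isinstance(x, float) and math.isnan(x)


def median_with_nan_at(values, nan_position):
    # Hypothetical helper: sort the ordinary values, pin every NaN at the
    # chosen end ("start" or "end"), then take the middle element.
    nans = [v for v in values if is_nan(v)]
    rest = sorted(v for v in values if not is_nan(v))
    ordered = nans + rest if nan_position == "start" else rest + nans
    return ordered[len(ordered) // 2]


data = [float("nan"), 2, 3, 4, 5]
front = median_with_nan_at(data, "start")  # ordered as [nan, 2, 3, 4, 5]
back = median_with_nan_at(data, "end")     # ordered as [2, 3, 4, 5, nan]
```

Here `front` is 3 and `back` is 4, so the placement rule alone shifts the result by one rank.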

[Python-ideas] Re: Fix statistics.median()?

2019-12-26 Thread Steven D'Aprano
On Fri, Dec 27, 2019 at 02:03:57AM -, Marco Sulla via Python-ideas wrote: > Steven D'Aprano wrote: > > Marco, you don't have to use median_low and median_high if you don't > > like them, but they aren't any worse than any other choice for > > calculating order statistics. All order

[Python-ideas] Re: Fix statistics.median()?

2019-12-26 Thread Steven D'Aprano
On Fri, Dec 27, 2019 at 04:32:44AM -, Marco Sulla via Python-ideas wrote: > Think about this: you have a population of 1 million people. You > want to take the median of their heart rate. But for some reason, your > calculations give you some NaN. The only reasonable scenario for that

[Python-ideas] Re: Fix statistics.median()?

2019-12-26 Thread Steven D'Aprano
On Thu, Dec 26, 2019 at 02:23:42PM -0800, Andrew Barnert via Python-ideas wrote: > I don’t think that’s true. Surely the median of (-inf, 1, 2, 3, inf, > inf, inf) is well defined and can only be 3? It's well-defined, but probably not good statistics. I'm not sure what measurement you are
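
The quoted claim about infinities is easy to check with the existing function, since infinities sort consistently with ordinary floats. This is only a sanity check of the quoted example, not a statement about whether such data is statistically meaningful.

```python
import statistics

inf = float("inf")
# Seven values, so the median is the fourth one after sorting.
values = (-inf, 1, 2, 3, inf, inf, inf)
result = statistics.median(values)
```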

[Python-ideas] Re: Fix statistics.median()?

2019-12-26 Thread Steven D'Aprano
On Fri, Dec 27, 2019 at 03:40:10AM -, Marco Sulla via Python-ideas wrote: > Oh my... Mertz, listen to me, you don't need a parameter. You only > need a key function to pass to `sorted()` How do you pass the key function to sorted() without a parameter? > median(iterable, key=iliadSort)
