[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Finn Mason
I've honestly never really used an Enum, so I'm not an expert here. An idea might be using string flags, but setting module-level constants equal to the string flag, so that you can use either. For example (using the IPython shell because it's easier in email with quoting and all): In [1]: import
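A minimal sketch of the "string flags plus module-level constants" idea, for illustration only. The names `IGNORE`, `RAISE`, `PROPAGATE`, `VALID_NAN_FLAGS` and `check_nan_flag` are hypothetical and are not part of the `statistics` module:

```python
# Hypothetical module-level constants: each one is simply the string flag,
# so callers may pass either the constant or the literal string.
IGNORE = "ignore"
RAISE = "raise"
PROPAGATE = "propagate"
VALID_NAN_FLAGS = frozenset({IGNORE, RAISE, PROPAGATE})


def check_nan_flag(flag):
    """Return the flag unchanged if it is valid, otherwise raise ValueError."""
    if flag not in VALID_NAN_FLAGS:
        raise ValueError(
            f"expected one of {sorted(VALID_NAN_FLAGS)}, got {flag!r}"
        )
    return flag


# Both spellings name the same option.
same_option = check_nan_flag(IGNORE) == check_nan_flag("ignore")
```

The constants give tab completion and a place to hang documentation, while plain strings keep working for callers who prefer them.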

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Christopher Barker
First: I started this specifically in the context of the stats package and the NaN handling flag, but it did turn into a more general discussion of Enums, so a final thought: On Tue, Aug 31, 2021 at 4:17 AM Ronald Oussoren wrote: > > Not just static typing, but static analysis in general. Tools

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Christopher Barker
On Tue, Aug 31, 2021 at 12:09 AM Stephen J. Turnbull < stephenjturnb...@gmail.com> wrote: > *sigh* __members__ is just the mechanism. As a consequence, Enums are > iterable, and they automatically DTRT with dir() and help() even if > there are no docstrings. but you didn't say dir(), you said _

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Ricky Teachey
On Tue, Aug 31, 2021 at 9:17 AM Ricky Teachey wrote: > Can someone explain why enum-vs-string is being discussed as if it is an > either-or choice? Why not just call the enum class using the input so that > you can supply a string or enum? I understand this would not be a really > great choice for

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Ricky Teachey
Can someone explain why enum-vs-string is being discussed as if it is an either-or choice? Why not just call the enum class using the input so that you can supply a string or enum? NanChoice(nan_choice_input) I understand this would not be a really great choice for a flags enum or int enum, but f
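A short sketch of the `NanChoice(nan_choice_input)` pattern described above. The member names and values are assumptions made for illustration; only the class name `NanChoice` comes from the message:

```python
from enum import Enum


class NanChoice(Enum):
    # Hypothetical members; the values double as the accepted string flags.
    RAISE = "raise"
    PROPAGATE = "propagate"
    IGNORE = "ignore"


def normalize(nan_choice_input):
    """Accept either a NanChoice member or its string value."""
    # Calling the enum class looks a member up by value; passing an
    # existing member returns that member unchanged.
    return NanChoice(nan_choice_input)


from_string = normalize("ignore")
from_member = normalize(NanChoice.IGNORE)
```

An unknown string such as `"bogus"` raises `ValueError`, so validation comes for free with the lookup.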

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Chris Angelico
On Tue, Aug 31, 2021 at 9:51 PM Steven D'Aprano wrote: > > I want to (in IPython) do: > > > > statistics.median? > > > > and see everything I need to know to use it > > > Okay, so if the API is (say) this: > > def median(data, *, nans='ignore'): > ... > > > will IPython give you a list

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Steven D'Aprano
On Mon, Aug 30, 2021 at 10:37:28PM -0700, Christopher Barker wrote: > On Mon, Aug 30, 2021 at 10:22 AM Stephen J. Turnbull < > stephenjturnb...@gmail.com> wrote: > > > Christopher Barker writes: > > > > > e.g.: what are the valid values? > > > > That's easy: MyEnum.__members__. > > > > Seriously

[Python-ideas] Re: NAN handling in statistics functions

2021-08-31 Thread Ronald Oussoren via Python-ideas
> On 30 Aug 2021, at 18:19, Christopher Barker wrote: > > On Mon, Aug 30, 2021 at 12:57 AM Ronald Oussoren > wrote: > > On 28 Aug 2021, at 07:14, Christopher Barker > > wrote: >> >> Also +1 on a string flag, rather than an Enum. > ou

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Christopher Barker
On Mon, Aug 30, 2021 at 10:22 AM Stephen J. Turnbull < stephenjturnb...@gmail.com> wrote: > Christopher Barker writes: > > > e.g.: what are the valid values? > > That's easy: MyEnum.__members__. > Seriously? You are arguing that Enums are better because they are self documenting, when you have t

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Christopher Barker
On Mon, Aug 30, 2021 at 6:50 PM Steven D'Aprano wrote: > > They provide a *huge* advantage when they can be combined. It's easy > > to accept a flags argument that is the bitwise Or of a collection of > > flags, > > I'm not a big user of Enums, but I *think* that only applies for > IntEnums? >

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Brendan Barnwell
On 2021-08-30 09:23, Chris Angelico wrote: On Tue, Aug 31, 2021 at 2:19 AM Christopher Barker wrote: To be honest, I haven't really used Enums much (in fact, only to mirror C enums in extension code), but part of that is because I have yet to see what the point is in Python, over simple strin

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Chris Angelico
On Tue, Aug 31, 2021 at 11:47 AM Steven D'Aprano wrote: > > On Tue, Aug 31, 2021 at 02:23:29AM +1000, Chris Angelico wrote: > > On Tue, Aug 31, 2021 at 2:19 AM Christopher Barker > > wrote: > > > > I suppose they provide a real advantage for static typing, but other > > > than that I just don't

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Steven D'Aprano
On Tue, Aug 31, 2021 at 02:23:29AM +1000, Chris Angelico wrote: > On Tue, Aug 31, 2021 at 2:19 AM Christopher Barker > wrote: > > I suppose they provide a real advantage for static typing, but other > > than that I just don't see it. > > They provide a *huge* advantage when they can be combine
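On the question of which enums can be combined: in the standard library, bitwise combination is provided by `enum.Flag` (and `enum.IntFlag`), so it does not require integer-valued members. A small sketch, with a hypothetical `Perm` class used purely for illustration:

```python
from enum import Flag, auto


class Perm(Flag):
    # Hypothetical flags, unrelated to the statistics module.
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


# Members combine with bitwise OR and the result is still a Perm.
combined = Perm.READ | Perm.WRITE

has_read = Perm.READ in combined
has_execute = Perm.EXECUTE in combined
```

This is the case where enums are hard to replace with strings; a single-choice NaN policy does not need it.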

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Chris Angelico
On Tue, Aug 31, 2021 at 2:19 AM Christopher Barker wrote: > > On Mon, Aug 30, 2021 at 12:57 AM Ronald Oussoren > wrote: > > On 28 Aug 2021, at 07:14, Christopher Barker wrote: >>> >>> >> Also +1 on a string flag, rather than an Enum. >> >> Why do you prefer strings for the options rather than an Enum?

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Christopher Barker
On Mon, Aug 30, 2021 at 12:57 AM Ronald Oussoren wrote: > On 28 Aug 2021, at 07:14, Christopher Barker wrote: > >> Also +1 on a string flag, rather than an Enum. > > Why do you prefer strings for the options rather than an Enum? The enum clearly > documents what the valid options are for the option. >

[Python-ideas] Re: NAN handling in statistics functions

2021-08-30 Thread Ronald Oussoren via Python-ideas
> On 28 Aug 2021, at 07:14, Christopher Barker wrote: > > > SciPy should probably also be a data-point, it uses: > > nan_policy : {'propagate', 'raise', 'omit'}, optional > > +1 > > Also +1 on a string flag, rather than an Enum. Why do you prefer strings for the options rather than an

[Python-ideas] Re: NAN handling in statistics functions

2021-08-29 Thread MRAB
On 2021-08-30 04:31, Steven D'Aprano wrote: On Sun, Aug 29, 2021 at 08:20:07PM -0400, tritium-l...@sdamon.com wrote: Not to go off on too much of a tangent, but isn't NaN unorderable? It's greater than nothing, and less than nothing, so you can't even really sort a list with a NaN value in it (

[Python-ideas] Re: NAN handling in statistics functions

2021-08-29 Thread Chris Angelico
On Mon, Aug 30, 2021 at 1:33 PM Steven D'Aprano wrote: > However we could add a function, totalorder, which can be used as a key > function to force an order on NANs. The 2008 version of the IEEE-754 > standard recommends such a function: > > from some_module import totalorder > sorted([4,
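A simplified sketch of what such a key function could look like for Python floats. This `totalorder` is hypothetical (the message imports it from a placeholder `some_module`) and assumes IEEE 754 binary64 floats; it orders values by their bit patterns, which places -0.0 before +0.0 and positive NaNs after +inf:

```python
import struct


def totalorder(x):
    """Hypothetical sort key giving floats a total order by bit pattern."""
    # Reinterpret the 64-bit float as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", float(x)))
    # Patterns with the sign bit clear already sort correctly as integers.
    # For patterns with the sign bit set, flip the other 63 bits so that
    # larger magnitudes sort lower.
    return bits if bits >= 0 else bits ^ 0x7FFFFFFFFFFFFFFF


# Build a NaN with a known (positive) sign so the example is deterministic.
POSITIVE_NAN = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]

ordered = sorted(
    [4.0, POSITIVE_NAN, -0.0, 0.0, -1.0, float("inf")],
    key=totalorder,
)
```

With this key, NaNs whose sign bit is set sort before -inf rather than at the end, which is consistent with ordering by sign first.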

[Python-ideas] Re: NAN handling in statistics functions

2021-08-29 Thread Steven D'Aprano
On Sun, Aug 29, 2021 at 08:20:07PM -0400, tritium-l...@sdamon.com wrote: > Not to go off on too much of a tangent, but isn't NaN unorderable? It's > greater than nothing, and less than nothing, so you can't even really sort a > list with a NaN value in it (..though I'm sure Python does sort it by

[Python-ideas] Re: NAN handling in statistics functions

2021-08-29 Thread tritium-list
find a NaN with a binary search... it would be impossible to have a NaN in an ordered sequence wouldn't it? -Original Message- From: Cameron Simpson Sent: Sunday, August 29, 2021 5:36 PM To: python-ideas@python.org Subject: [Python-ideas] Re: NAN handling in statistics functions

[Python-ideas] Re: NAN handling in statistics functions

2021-08-29 Thread Cameron Simpson
On 27Aug2021 15:50, Finn Mason wrote: >Perhaps a math.hasnan() function for collections could be implemented with >binary search? > >math.hasnan(seq) Why would a binary search be of use? A straight sequential scan of the sequence seems the only reliable method. Binary search is for finding a v
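The sequential scan can be sketched in a few lines. `hasnan` here is a hypothetical helper (no such function exists in `math`), and it assumes only `float` values need checking:

```python
import math


def hasnan(seq):
    """Return True if any float in seq is a NaN (linear scan)."""
    # Every element must be inspected: NaN compares false against
    # everything, so sorting or bisecting cannot be relied on to locate it.
    return any(isinstance(x, float) and math.isnan(x) for x in seq)


with_nan = hasnan([1, 2.5, float("nan"), 4])
without_nan = hasnan([1, 2.5, 4])
```

`any()` stops at the first NaN found, so the worst case is one pass over the data.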

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread David Mertz, Ph.D.
I was thinking of the Cauchy distribution, with undefined variance. But Augustin-Louis Cauchy had quite a few things named after him. I know best Cauchy sequences as a construction of Real numbers. On Sun, Aug 29, 2021, 2:36 AM Stephen J. Turnbull < stephenjturnb...@gmail.com> wrote: > David M

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread David Mertz, Ph.D.
On Sat, Aug 28, 2021, 8:34 AM Stephen J. Turnbull < stephenjturnb...@gmail.com> wrote: > David Mertz, Ph.D. writes: > > > NANs do not necessarily represent missing data. > > > I think in the context of `stats` they do. But this is color of > bikeshed, and I defer to you, of course. > > I have a

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 14:33, Richard Damon wrote: > On 8/28/21 6:23 AM, Marc-Andre Lemburg wrote: >> To me, the behavior looked a lot like stripping NANs left and right >> from the list, but what you're explaining makes this appear even more >> as a bug in the implementation of median() - basically wrong

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Richard Damon
On 8/28/21 6:23 AM, Marc-Andre Lemburg wrote: > To me, the behavior looked a lot like stripping NANs left and right > from the list, but what you're explaining makes this appear even more > as a bug in the implementation of median() - basically wrong assumptions > about NANs sorting correctly. The

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 05:32, Steven D'Aprano wrote: > On Thu, Aug 26, 2021 at 09:36:27AM +0200, Marc-Andre Lemburg wrote: > >> Indeed. The NAN handling in median() looks like a bug, more than >> anything else: > > [slightly paraphrased] > l1 = [1,2,nan,4] > l2 = [nan,1,2,4] >> > statistics.me

[Python-ideas] Re: NAN handling in statistics functions

2021-08-28 Thread Marc-Andre Lemburg
On 28.08.2021 07:14, Christopher Barker wrote: > > SciPy should probably also be a data-point, it uses: > >     nan_policy : {'propagate', 'raise', 'omit'}, optional > > > +1 > > Also +1 on a string flag, rather than an Enum. Same here. Codecs use strings as well: 'strict', 'ignore',
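For reference, this is how the existing string-based `errors` argument of the codecs machinery looks in use; the example only relies on the built-in `str.encode`:

```python
text = "café"

# 'ignore' silently drops characters that cannot be encoded.
ignored = text.encode("ascii", errors="ignore")

# 'replace' substitutes a placeholder character instead.
replaced = text.encode("ascii", errors="replace")

# 'strict' (the default) raises an exception.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```

The three handlers map loosely onto the "ignore", substitute, and "raise" policies being discussed for NaNs.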

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread David Mertz, Ph.D.
On Sat, Aug 28, 2021, 1:58 AM Steven D'Aprano wrote: > On Sat, Aug 28, 2021 at 01:36:33AM -0400, David Mertz, Ph.D. wrote: > > > I like the statsmodels spelling better: missing : str; Available options > > are ‘none’, ‘drop’, and ‘raise’ > > NANs do not necessarily represent missing data. > I th

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Sat, Aug 28, 2021 at 01:36:33AM -0400, David Mertz, Ph.D. wrote: > I like the statsmodels spelling better: missing : str; Available options > are ‘none’, ‘drop’, and ‘raise’ NANs do not necessarily represent missing data. -- Steve ___ Python-ideas

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread David Mertz, Ph.D.
I like the statsmodels spelling better: missing : str; Available options are ‘none’, ‘drop’, and ‘raise’ But this is bikeshed painting if the options exist. However, I WOULD urge the argument to take EITHER a string OR an enum. I don't think any other libraries mentioned do that, but it would ju

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Christopher Barker
> SciPy should probably also be a data-point, it uses: > > nan_policy : {'propagate', 'raise', 'omit'}, optional +1 Also +1 on a string flag, rather than an Enum. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Deskto

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Sebastian Berg
On Sat, 2021-08-28 at 11:49 +1000, Steven D'Aprano wrote: > On Tue, Aug 24, 2021 at 01:53:51PM +1000, Steven D'Aprano wrote: > > > I've spoken to users of other statistics packages and languages, > > such as > > R, and I cannot find any consensus on what the "right" behaviour > > should > > be f

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Thu, Aug 26, 2021 at 09:36:27AM +0200, Marc-Andre Lemburg wrote: > Indeed. The NAN handling in median() looks like a bug, more than > anything else: [slightly paraphrased] > >>> l1 = [1,2,nan,4] > >>> l2 = [nan,1,2,4] > > >>> statistics.median(l1) > nan > >>> statistics.median(l2) > 1.5 Looks

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Tue, Aug 24, 2021 at 01:53:51PM +1000, Steven D'Aprano wrote: > I've spoken to users of other statistics packages and languages, such as > R, and I cannot find any consensus on what the "right" behaviour should > be for NANs except "not that!". > > So I propose that statistics functions gain
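A rough sketch of how such a keyword-only parameter might behave, written as a wrapper around `statistics.median`. The function name `median_nan_aware`, the parameter name `nan_policy`, and the option spellings are all assumptions; the thread had not settled on any of them:

```python
import math
import statistics

_POLICIES = ("raise", "propagate", "ignore")


def _is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def median_nan_aware(data, *, nan_policy="propagate"):
    """Hypothetical wrapper showing the three proposed behaviours."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"nan_policy must be one of {_POLICIES}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("data contains a NaN")
        if nan_policy == "propagate":
            return math.nan
        # 'ignore': filter the NaNs out before computing the median.
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


nan = float("nan")
ignored = median_nan_aware([1, 2, nan, 4], nan_policy="ignore")
propagated = median_nan_aware([nan, 1, 2, 4], nan_policy="propagate")
```

Checking for NaNs before sorting makes the result independent of where the NaN sits in the input.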

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Finn Mason
Perhaps a math.hasnan() function for collections could be implemented with binary search? math.hasnan(seq) Though it is true that if you're using datasets large enough to care about speed, you should probably be using the SciPy stack instead of statistics in the first place. On Fri, Aug 27, 2021

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Christopher Barker
If folks want faster processing (checking for, replacing) of NaNs in sequences, a function written in C could be added to the math module. (Or the statistics module.) Now that I said that, it might make sense to put such a function in the statistics package, for use there anyway. Personally, I thin

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Jeff Allen
On 26/08/2021 19:41, Brendan Barnwell wrote: On 2021-08-23 20:53, Steven D'Aprano wrote: So I propose that statistics functions gain a keyword only parameter to specify the desired behaviour when a NAN is found: - raise an exception - return NAN - ignore it (filter out NANs) which seem to be t

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 09:58, Serhiy Storchaka wrote: > 26.08.21 12:05, Marc-Andre Lemburg wrote: >> Oh, good point. I was under the impression that NAN is handled >> as a singleton. >> >> Perhaps this should be changed to make it easier to >> detect NANs ?! > > Even ignoring a NaN payload, there ar

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Serhiy Storchaka
26.08.21 12:05, Marc-Andre Lemburg wrote: > Oh, good point. I was under the impression that NAN is handled > as a singleton. > > Perhaps this should be changed to make it easier to > detect NANs ?! Even ignoring a NaN payload, there are many different NaNs of different types. For example,

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 03:24, David Mertz, Ph.D. wrote: > > > On Thu, Aug 26, 2021, 6:46 AM Marc-Andre Lemburg  > > Fair enough. Would it then make sense to at least have all possible NAN > objects compare equal, treating the extra error information as an > attribute > value rather than a di

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Steven D'Aprano
On Thu, Aug 26, 2021 at 12:44:18PM +0200, Marc-Andre Lemburg wrote: > Fair enough. Would it then make sense to at least have all possible > NAN objects compare equal, treating the extra error information as an > attribute value rather than a distinct value and perhaps exposing this > as such ? >

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread David Mertz, Ph.D.
On Thu, Aug 26, 2021, 6:46 AM Marc-Andre Lemburg > Fair enough. Would it then make sense to at least have all possible NAN > objects compare equal, treating the extra error information as an attribute > value rather than a distinct value and perhaps exposing this as such ? > No, no, no! Almost t

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 17:36, Christopher Barker wrote: > There have been a number of discussions on this list, and at least one PEP, > about NaN (and other special values).  > > Let’s keep this thread about handling them in the statistics lib. > > But briefly: > > NaNs are weird on purpose, and Python s

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Brendan Barnwell
On 2021-08-23 20:53, Steven D'Aprano wrote: So I propose that statistics functions gain a keyword only parameter to specify the desired behaviour when a NAN is found: - raise an exception - return NAN - ignore it (filter out NANs) which seem to be the three most common preferences. (It seems t

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Christopher Barker
There have been a number of discussions on this list, and at least one PEP, about NaN (and other special values). Let’s keep this thread about handling them in the statistics lib. But briefly: NaNs are weird on purpose, and Python should absolutely not deviate from IEEE. That’s (one reason) Pyt

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 12:15, Steven D'Aprano wrote: > On Thu, Aug 26, 2021 at 11:05:01AM +0200, Marc-Andre Lemburg wrote: > >> Oh, good point. I was under the impression that NAN is handled >> as a singleton. > > There are 4503599627370496 distinct quiet NANs (plus about the same > signalling NANs). So

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Steven D'Aprano
On Wed, Aug 25, 2021 at 10:40:59PM -0700, Christopher Barker wrote: > On Wed, Aug 25, 2021 at 5:39 PM Finn Mason wrote: > > > Or the NaNs could be treated as zeros and a warning raised: > > > > Absolutely not! NaN in no way means zero, ever. We should never provide a > known incorrect result. I

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Steven D'Aprano
On Thu, Aug 26, 2021 at 11:05:01AM +0200, Marc-Andre Lemburg wrote: > Oh, good point. I was under the impression that NAN is handled > as a singleton. There are 4503599627370496 distinct quiet NANs (plus about the same signalling NANs). So it would need to be 4-quadrillion-ton :-) (If anyone is
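The figure quoted above can be checked with a little arithmetic, assuming the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits) and the usual convention that the top fraction bit marks a quiet NaN:

```python
FRACTION_BITS = 52

# Quiet NaN: exponent all ones, top fraction bit set, and the remaining
# 51 fraction bits free, for either sign.
quiet_nans = 2 * 2 ** (FRACTION_BITS - 1)

# Signalling NaN: top fraction bit clear and the remaining 51 bits not all
# zero (all zero would be an infinity), for either sign.
signalling_nans = 2 * (2 ** (FRACTION_BITS - 1) - 1)
```

So there are 2**52 quiet NaNs and two fewer signalling NaNs, which matches "about the same".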

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 10:02, Peter Otten wrote: > On 26/08/2021 09:36, Marc-Andre Lemburg wrote: > >> In Python you can use a simple test for this: > > I think you need math.isnan(). > > nan = float('nan') > l = [1,2,3,nan] > d = {nan:1, 2:3, 4:5, 5:nan} > s = set(l) > nan in l >> Tr

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Mark Dickinson
Returning a NaN by default has the advantage of being consistent with IEEE 754 semantics for sequence-based operations (like `sum` and `dot`) and with existing Python `math` module functions like `fsum`, `prod` and `hypot`. In IEEE 754, the majority of operations silently return a NaN (not signa
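A quick illustration of the existing behaviour referred to above, using only built-ins and `math` functions (`math.prod` requires Python 3.8 or later):

```python
import math

nan = float("nan")

# Each of these quietly returns a NaN rather than raising.
results = {
    "sum": sum([1.0, 2.0, nan]),
    "fsum": math.fsum([1.0, 2.0, nan]),
    "prod": math.prod([1.0, 2.0, nan]),
    "hypot": math.hypot(3.0, nan),
}

all_nan = all(math.isnan(value) for value in results.values())
```

A "return NaN" default for the statistics functions would line up with these.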

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Peter Otten
On 26/08/2021 09:36, Marc-Andre Lemburg wrote: In Python you can use a simple test for this: I think you need math.isnan(). nan = float('nan') l = [1,2,3,nan] d = {nan:1, 2:3, 4:5, 5:nan} s = set(l) nan in l True That only works with identical nan-s, and because the container omits the e
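The difference between the two tests can be shown directly. This sketch assumes CPython, where each `float('nan')` call creates a new object:

```python
import math

nan = float("nan")
other_nan = float("nan")  # a different object holding a NaN
items = [1, 2, 3, nan]

# Containment checks identity before equality, so the very same object
# is found even though nan != nan.
same_object_found = nan in items

# A different NaN object is not identical and never compares equal.
other_object_found = other_nan in items

# math.isnan() inspects the value, so it works for any NaN.
isnan_found = any(isinstance(x, float) and math.isnan(x) for x in items)
```

Only the `math.isnan()` test is reliable when the NaN was produced elsewhere, for example by a computation or by parsing input.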

[Python-ideas] Re: NAN handling in statistics functions

2021-08-26 Thread Marc-Andre Lemburg
On 26.08.2021 02:36, Finn Mason wrote: > Perhaps a warning could be raised but the NaNs are ignored. For example: > > Input: statistics.mean([4, 2, float('nan')]) > Output: [warning blah blah blah] > 3 > > Or the NaNs could be treated as zeros and a warning raised: > > Input: statistics.mean([4,

[Python-ideas] Re: NAN handling in statistics functions

2021-08-25 Thread Christopher Barker
On Wed, Aug 25, 2021 at 5:39 PM Finn Mason wrote: > Or the NaNs could be treated as zeros and a warning raised: > Absolutely not! NaN in no way means zero, ever. We should never provide a known incorrect result. > I do feel there should be a catchable warning but not an outright > exception, a

[Python-ideas] Re: NAN handling in statistics functions

2021-08-25 Thread Finn Mason
Perhaps a warning could be raised but the NaNs are ignored. For example: Input: statistics.mean([4, 2, float('nan')]) Output: [warning blah blah blah] 3 Or the NaNs could be treated as zeros and a warning raised: Input: statistics.mean([4, 2, float('nan')]) Output: [warning blah blah blah] 2 I

[Python-ideas] Re: NAN handling in statistics functions

2021-08-24 Thread Marc-Andre Lemburg
On 24.08.2021 05:53, Steven D'Aprano wrote: > At the moment, the handling of NANs in the statistics module is > implementation dependent. In practice, that *usually* means that if your > data has a NAN in it, the result you get will probably be a NAN. > > >>> statistics.mean([1, 2, float('na

[Python-ideas] Re: NAN handling in statistics functions

2021-08-23 Thread Guido van Rossum
Urgh. That's a nasty dilemma. I propose that the default should be return NAN, since that's what you'd expect if you did the super-naive arithmetic version (e.g. mean(x, y, z) = (x+y+z)/3). On Mon, Aug 23, 2021 at 8:55 PM Steven D'Aprano wrote: > At the moment, the handling of NANs in the statis

[Python-ideas] Re: NAN handling in statistics functions

2021-08-23 Thread Christopher Barker
Note that numpy has a set of nan* functions that ignore NaNs. I’m not suggesting that here, but it is prior art to be considered, and I do like that it explicitly is ignoring NaNs. > - raise an exception > > - return NAN > > - ignore it (filter out NANs) > > Does anyone have any strong feeling

[Python-ideas] Re: NAN handling in statistics functions

2021-08-23 Thread David Mertz, Ph.D.
We had this discussion about a year and a half ago, in which I strongly advocated exactly this keyword argument to median*(). As before, I don't care about the default if there is an option. I don't even really care about the exception case, but don't object to it. On Mon, Aug 23, 2021 at 11:55