[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread David Mertz, Ph.D.
On Sat, Aug 28, 2021, 1:58 AM Steven D'Aprano wrote: > On Sat, Aug 28, 2021 at 01:36:33AM -0400, David Mertz, Ph.D. wrote: > > > I like the statsmodels spelling better: missing : str; Available options > > are ‘none’, ‘drop’, and ‘raise’ > > NANs do not necessarily represent missing data. > I th

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Sat, Aug 28, 2021 at 01:36:33AM -0400, David Mertz, Ph.D. wrote: > I like the statsmodels spelling better: missing : str; Available options > are ‘none’, ‘drop’, and ‘raise’ NANs do not necessarily represent missing data. -- Steve

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread David Mertz, Ph.D.
I like the statsmodels spelling better: missing : str; Available options are ‘none’, ‘drop’, and ‘raise’ But this is bikeshed painting if the options exist. However, I WOULD urge the argument to take EITHER a string OR an enum. I don't think any other libraries mentioned do that, but it would ju

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Christopher Barker
> SciPy should probably also be a data-point, it uses: > > nan_policy : {'propagate', 'raise', 'omit'}, optional +1 Also +1 on a string flag, rather than an Enum. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Deskto

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Sebastian Berg
On Sat, 2021-08-28 at 11:49 +1000, Steven D'Aprano wrote: > On Tue, Aug 24, 2021 at 01:53:51PM +1000, Steven D'Aprano wrote: > > > I've spoken to users of other statistics packages and languages, > > such as > > R, and I cannot find any consensus on what the "right" behaviour > > should > > be f

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Thu, Aug 26, 2021 at 09:36:27AM +0200, Marc-Andre Lemburg wrote: > Indeed. The NAN handling in median() looks like a bug, more than > anything else: [slightly paraphrased] > >>> l1 = [1,2,nan,4] > >>> l2 = [nan,1,2,4] > > >>> statistics.median(l1) > nan > >>> statistics.median(l2) > 1.5 Looks

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Steven D'Aprano
On Tue, Aug 24, 2021 at 01:53:51PM +1000, Steven D'Aprano wrote: > I've spoken to users of other statistics packages and languages, such as > R, and I cannot find any consensus on what the "right" behaviour should > be for NANs except "not that!". > > So I propose that statistics functions gain

[Python-ideas] Re: Complete recursive pickle dump

2021-08-27 Thread Greg Ewing
On 27/08/21 11:27 pm, Evan Greenup via Python-ideas wrote: If it contains a function, it will persist all its attributes and its code object. For example, if a user recursively pickle-dumps a numpy ndarray into a pickle file, then when the user loads this file on a system which doesn't have numpy installed, its

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Finn Mason
Perhaps a math.hasnan() function for collections could be implemented with binary search? math.hasnan(seq) Though it is true that if you're using datasets large enough to care about speed, you should probably be using the SciPy stack instead of statistics in the first place. On Fri, Aug 27, 2021
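A minimal sketch of what such a check could look like in pure Python. The name `has_nan` is hypothetical (no `math.hasnan()` exists in the standard library), and this version uses a plain linear scan rather than a binary search, since it makes no assumption that the data is sorted. It relies on the fact that a NaN compares unequal to itself, which holds for float NaNs and for quiet Decimal NaNs.

```python
import math
from decimal import Decimal


def has_nan(values):
    """Return True if any item in `values` is a NaN.

    Hypothetical helper, not part of the math module.
    Uses the self-inequality test (x != x), so it does not
    depend on the NaN being one particular object.
    """
    return any(x != x for x in values)


# Usage: NaN objects are not singletons, so identity checks are unreliable.
data = [1.0, 2.5, float("nan"), 4.0]
print(has_nan(data))                       # float NaN in the data
print(has_nan([1, 2, 3]))                  # no NaN present
print(has_nan([Decimal("NaN"), 1]))        # quiet Decimal NaN
print(has_nan([math.inf, -math.inf]))      # infinities are not NaNs
```

A C implementation, as suggested elsewhere in the thread, would perform the same scan but avoid the per-item interpreter overhead.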

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Christopher Barker
If folks want faster processing (checking for, replacing) of NaNs in sequences, a function written in C could be added to the math module (or the statistics module). Now that I said that, it might make sense to put such a function in the statistics package, for use there anyway. Personally, I thin

[Python-ideas] Re: synatx sugar for quickly run shell command and return stdout of shell command as string result

2021-08-27 Thread Finn Mason
> But Python isn't a language that needs everything to be done with command invocations and pipes. If you're doing long pipelines like this, there's probably something wrong. Why use external programs to do things that Python can do far more viably itself? That's a really good point. I think the m

[Python-ideas] Re: synatx sugar for quickly run shell command and return stdout of shell command as string result

2021-08-27 Thread Oleg Broytman
On Thu, Aug 26, 2021 at 09:15:21PM -0600, Finn Mason wrote: > > Is this too magical? > > result = run('cat file.txt') | run('sort') | run('grep hello', > capture_output=True, text=True).stdout > > Interesting idea, especially overloading the union/pipe operator (|). I > like it a lot. It reminds

[Python-ideas] Complete recursive pickle dump

2021-08-27 Thread Evan Greenup via Python-ideas
Currently, pickle can only save some very simple data types into bytes. And if the object itself contains a reference to some builtin data or variable, only the reference will be saved. I am wondering if it is possible to recursively persist all the data into a pickle representation. Even if some data might

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Jeff Allen
On 26/08/2021 19:41, Brendan Barnwell wrote: On 2021-08-23 20:53, Steven D'Aprano wrote: So I propose that statistics functions gain a keyword only parameter to specify the desired behaviour when a NAN is found: - raise an exception - return NAN - ignore it (filter out NANs) which seem to be t
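The three behaviours proposed in the quoted message (raise, return NAN, ignore) can be sketched as a wrapper around `statistics.median`. This is only an illustration under stated assumptions: the function name `median_with_nan_policy` and the keyword `nan_policy` are hypothetical, the option names are borrowed from the SciPy spelling mentioned elsewhere in the thread, and only float NaNs are detected.

```python
import math
import statistics


def _is_nan(x):
    # Assumption: only float NaNs are considered; Decimal NaNs are not handled.
    return isinstance(x, float) and math.isnan(x)


def median_with_nan_policy(data, *, nan_policy="raise"):
    """Median with an explicit NaN policy (hypothetical API, not in the stdlib).

    nan_policy:
      "raise"     - raise ValueError if any NaN is found
      "propagate" - return NaN if any NaN is found
      "omit"      - drop NaNs before computing the median
    """
    if nan_policy not in ("raise", "propagate", "omit"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NaN found in data")
        if nan_policy == "propagate":
            return math.nan
        # "omit": filter the NaNs out before delegating to statistics.median
        values = [x for x in values if not _is_nan(x)]

    return statistics.median(values)


# Usage: the result no longer depends on where the NaN sits in the input.
nan = float("nan")
print(median_with_nan_policy([1, 2, nan, 4], nan_policy="omit"))
print(median_with_nan_policy([nan, 1, 2, 4], nan_policy="omit"))
print(median_with_nan_policy([1, 2, nan, 4], nan_policy="propagate"))
```

Making the parameter keyword-only follows the wording of the proposal; whether the default should be to raise is one of the points under discussion in the thread, so the default chosen here is arbitrary.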

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 09:58, Serhiy Storchaka wrote: > 26.08.21 12:05, Marc-Andre Lemburg wrote: >> Oh, good point. I was under the impression that NAN is handled >> as a singleton. >> >> Perhaps this should be changed to make it easier to >> detect NANs?! > > Even ignoring a NaN payload, there ar

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Serhiy Storchaka
26.08.21 12:05, Marc-Andre Lemburg wrote: > Oh, good point. I was under the impression that NAN is handled > as a singleton. > > Perhaps this should be changed to make it easier to > detect NANs?! Even ignoring a NaN payload, there are many different NaNs of different types. For example,

[Python-ideas] Re: NAN handling in statistics functions

2021-08-27 Thread Marc-Andre Lemburg
On 27.08.2021 03:24, David Mertz, Ph.D. wrote: > > > On Thu, Aug 26, 2021, 6:46 AM Marc-Andre Lemburg  > > Fair enough. Would it then make sense to at least have all possible NAN > objects compare equal, treating the extra error information as an > attribute > value rather than a di