Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Greg Ewing
Victor Stinner wrote: Users don't use stdin and stdout as regular files; they are more often used as pipes to pass data between programs with the Unix pipe in a shell, like "producer | consumer". Sometimes stdout is redirected to a file, but I consider that it is expected to behave as a pipe and the
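Victor's point about pipes can be illustrated with a small sketch. The function name relay is hypothetical, and in-memory io.BytesIO buffers stand in for a real "producer | consumer" pipe. Decoding and re-encoding with errors="surrogateescape", which is what UTF-8 Mode proposes for stdin and stdout, lets bytes that are not valid UTF-8 pass through unchanged, while the "strict" handler fails on them:

```python
import io

def relay(raw_in: bytes, errors: str = "surrogateescape") -> bytes:
    """Copy a byte stream through a UTF-8 text layer, like a filter in a pipe."""
    # newline="" disables newline translation so the bytes are not altered.
    stdin = io.TextIOWrapper(io.BytesIO(raw_in), encoding="utf-8",
                             errors=errors, newline="")
    out_buffer = io.BytesIO()
    stdout = io.TextIOWrapper(out_buffer, encoding="utf-8",
                              errors=errors, newline="")
    for line in stdin:
        stdout.write(line)
    stdout.flush()
    return out_buffer.getvalue()

data = b"valid line\ninvalid \xff byte\n"
print(relay(data) == data)  # True: 0xFF round-trips as U+DCFF
try:
    relay(data, errors="strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)
```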

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Chris Barker - NOAA Federal
I’m a bit confused: File names and the like are one thing, and the CONTENTS of files is quite another. I get that there is theoretically a “default” encoding for the contents of text files, but that is SO likely to be wrong as to be ignorable. open() already defaults to utf-8. Which is a fine
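As a point of reference for the open() part: in Python 3.6 and 3.7, when open() is called in text mode without an encoding argument, the encoding is taken from locale.getpreferredencoding(False), so it follows the locale rather than being fixed to UTF-8 (UTF-8 Mode is what would change that). A quick check, comparing normalized codec names so that spellings like "UTF-8" and "utf8" compare equal:

```python
import codecs
import locale
import tempfile

# Text-mode file opened without encoding=: the wrapper picks the default.
with tempfile.TemporaryFile("w+") as f:
    default_name = codecs.lookup(f.encoding).name

# The locale-derived encoding that open() falls back to.
preferred_name = codecs.lookup(locale.getpreferredencoding(False)).name

print(default_name, preferred_name, default_name == preferred_name)
```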

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Glenn Linderman
On 12/7/2017 5:45 PM, Jonathan Goble wrote: On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote: If it were to be changed, one could add a text-mode option in 3.7, say "t" in the mode string, and a PendingDeprecationWarning for

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Jonathan Goble
On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote: > If it were to be changed, one could add a text-mode option in 3.7, say "t" > in the mode string, and a PendingDeprecationWarning for open calls without > the specification of either t or b in the mode string. > "t"
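For context on the "t" suggestion: "t" is already a valid mode character in Python 3's open(). Text mode is the default, so "r" and "rt" behave identically, and asking for text and binary at once raises ValueError. A small check using a throwaway temporary file:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    with open(path, "r", encoding="utf-8") as f:
        implicit = f.read()   # text mode by default
    with open(path, "rt", encoding="utf-8") as f:
        explicit = f.read()   # "t" spelled out explicitly
    with open(path, "rb") as f:
        raw = f.read()        # binary mode returns bytes
    try:
        open(path, "rtb")     # text and binary at once is rejected
        mixed_ok = True
    except ValueError:
        mixed_ok = False
finally:
    os.remove(path)

print(implicit == explicit, type(raw).__name__, mixed_ok)  # True bytes False
```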

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Glenn Linderman
On 12/7/2017 4:48 PM, Victor Stinner wrote: Ok, now comes the real question, open(). For open(), I used the example of a code snippet *writing* the content of a directory (os.listdir) into a text file. Another example is to read filenames from a text file but pass through undecodable bytes
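Victor's os.listdir example can be sketched without touching a real directory. The filename below is made up, and decoding it with UTF-8 and surrogateescape mimics what os.fsdecode() does on POSIX when the filesystem encoding is UTF-8; the helper save_and_reload is hypothetical. Writing the name with the default "strict" handler fails, while "surrogateescape" writes back the original bytes:

```python
import os
import tempfile

# Made-up filename: 0xE9 is Latin-1 "é" and is not valid UTF-8 on its own.
raw_name = b"caf\xe9.txt"
# Mimics os.fsdecode() on POSIX with a UTF-8 filesystem encoding.
name = raw_name.decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

def save_and_reload(names, errors):
    """Write names to a temporary text file and return the raw bytes written,
    or the UnicodeEncodeError raised while encoding them."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        try:
            # newline="" keeps "\n" unchanged on every platform.
            with open(path, "w", encoding="utf-8", errors=errors,
                      newline="") as f:
                for n in names:
                    f.write(n + "\n")
        except UnicodeEncodeError as exc:
            return exc
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(save_and_reload([name], "strict")))     # a UnicodeEncodeError
print(save_and_reload([name], "surrogateescape"))  # b'caf\xe9.txt\n'
```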

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
2017-12-08 0:26 GMT+01:00 Guido van Rossum: > You will quickly get decoding errors, and that is INADA's point. (Unless you > use encoding='Latin-1'.) His worry is that the surrogateescape error handler > makes it so that you won't get decoding errors, and then the failure mode
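The failure modes Guido and INADA describe can be reproduced on any byte string that is not valid UTF-8; the PNG signature below is only a convenient sample, and decode_as_text is a hypothetical helper. With "strict" (open()'s default error handler), the missing "b" surfaces as UnicodeDecodeError; with "surrogateescape" or encoding="latin-1", decoding silently succeeds:

```python
# Sample bytes that are not valid UTF-8 (a PNG file signature).
binary_data = b"\x89PNG\r\n\x1a\n"

def decode_as_text(data: bytes, encoding: str, errors: str = "strict"):
    """Decode like a text-mode open() would; return the exception name on failure."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError as exc:
        return type(exc).__name__

print(decode_as_text(binary_data, "utf-8"))                          # UnicodeDecodeError
print(repr(decode_as_text(binary_data, "utf-8", "surrogateescape")))  # no error: 0x89 becomes '\udc89'
print(repr(decode_as_text(binary_data, "latin-1")))                  # no error: Latin-1 maps every byte
```

This is the trade-off argued in the thread: surrogateescape keeps the data round-trippable, but it also hides the bug.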

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Guido van Rossum
On Thu, Dec 7, 2017 at 3:02 PM, Victor Stinner wrote: > 2017-12-06 5:07 GMT+01:00 INADA Naoki: > > And opening a binary file without the "b" option is a very common mistake of new > > developers. If the default error handler is surrogateescape, they lose a

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
2017-12-06 5:07 GMT+01:00 INADA Naoki: > And opening a binary file without the "b" option is a very common mistake of new > developers. If the default error handler is surrogateescape, they lose a chance > to notice their bug. To come back to your original point, I didn't know that

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-07 Thread Victor Stinner
While I'm not strongly convinced that the open() error handler must be changed to surrogateescape, first I would like to make sure that changing it is really a very bad idea :-) 2017-12-07 7:49 GMT+01:00 INADA Naoki: > I just came up with a crazy idea: changing

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
> I care only about the behavior of the builtin open(). > PEP 538 doesn't change the default error handler of open(). > > I think PEP 538 and PEP 540 should behave almost identically, except for > changing the locale or not. So I need a very strong reason if PEP 540 changes the default error > handler of open(). > I just

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 08:20, Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou: >> One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already > implemented

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 7 December 2017 at 01:59, Jakub Wilk wrote: > * Nick Coghlan, 2017-12-06, 16:15: >> The one that's relevant to default locale detection is just the string >> that "setlocale(LC_CTYPE, NULL)" returns. > > POSIX doesn't require any particular return value
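In Python, calling locale.setlocale() with only a category queries the current setting without changing it, which corresponds to the C call setlocale(LC_CTYPE, NULL) discussed here. Because the exact strings are platform-dependent, the helper below (is_legacy_locale, hypothetical) only compares names and makes the "C" versus "POSIX" choice an explicit parameter, since the two are aliases on Linux but not on *BSD or macOS:

```python
import locale

# With only a category argument, setlocale() returns the current setting
# without changing it, like setlocale(LC_CTYPE, NULL) in C.
current = locale.setlocale(locale.LC_CTYPE)

def is_legacy_locale(name: str, accept_posix_alias: bool = True) -> bool:
    """Hypothetical check for the legacy ASCII locale.

    "C" always counts. "POSIX" only counts when accept_posix_alias is True:
    the two names are aliases on Linux but not on *BSD or macOS, so a real
    implementation has to decide which name(s) to key off.
    """
    if name == "C":
        return True
    return accept_posix_alias and name == "POSIX"

print(current, is_legacy_locale(current))
```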

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Thu, 7 Dec 2017 00:22:52 +0100 Victor Stinner wrote: > 2017-12-06 23:36 GMT+01:00 Antoine Pitrou: > > Other than that, +1 on the PEP. > > Naoki doesn't seem to be comfortable with the usage of the > surrogateescape error handler by default for

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:36 GMT+01:00 Antoine Pitrou: > Other than that, +1 on the PEP. Naoki doesn't seem to be comfortable with the usage of the surrogateescape error handler by default for open(). Are you ok with that? If yes, would you mind explaining why? :-) Victor

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 23:20:41 +0100 Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou: > > One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already >

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
2017-12-06 23:07 GMT+01:00 Antoine Pitrou: > One question: how do you plan to test for the POSIX locale? I'm not sure. I will probably rely on Nick for that ;-) Nick already implemented this exact check for his PEP 538, which is already implemented in Python 3.7. I already

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Antoine Pitrou
On Wed, 6 Dec 2017 01:49:41 +0100 Victor Stinner wrote: > Hi, > > I knew that I had to rewrite my PEP 540, but I was too lazy. Since > Guido explicitly requested a shorter PEP, here it is! > > https://www.python.org/dev/peps/pep-0540/ > > Trust me, it's the same

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Greg Ewing
Victor Stinner wrote: Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with surrogateescape, or backslashreplace for stderr, or surrogatepass for fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But the PEP title would be too long, no? :-) Relaxed UTF-8 Mode?
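The handlers in Victor's tongue-in-cheek name behave quite differently on the same input. In the sketch below, U+DCFF is the lone surrogate that surrogateescape produces when decoding the byte 0xFF; encode_with is a hypothetical helper:

```python
# U+DCFF is what surrogateescape produces when it decodes the byte 0xFF.
lone = "\udcff"

def encode_with(handler: str):
    """Encode the lone surrogate to UTF-8; return None if the handler refuses."""
    try:
        return lone.encode("utf-8", handler)
    except UnicodeEncodeError:
        return None

for handler in ("strict", "surrogateescape", "backslashreplace", "surrogatepass"):
    print(f"{handler:<17}{encode_with(handler)!r}")
```

Here strict refuses (None), surrogateescape restores the original byte b'\xff', backslashreplace emits the readable escape b'\\udcff' and never fails (which suits stderr), and surrogatepass emits b'\xed\xb3\xbf', which is not valid strict UTF-8.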

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Brett Cannon
On Wed, 6 Dec 2017 at 06:10 INADA Naoki wrote: > >> And I have one worrying point. > >> With UTF-8 mode, open()'s default encoding/error handler is > >> UTF-8/surrogateescape. > > > > The Strict UTF-8 Mode is for you if you prioritize correctness over > usability. > >

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Jakub Wilk
* Nick Coghlan, 2017-12-06, 16:15: Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X). For those of us with little to no

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread INADA Naoki
>> And I have one worrying point. >> With UTF-8 mode, open()'s default encoding/error handler is >> UTF-8/surrogateescape. > > The Strict UTF-8 Mode is for you if you prioritize correctness over usability. Yes, but as I said, I care about inexperienced developers who don't know what UTF-8

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Nick Coghlan
On 6 December 2017 at 20:38, Victor Stinner wrote: > Nick: >> So if PEP 540 is going to implicitly trigger switching encodings, it >> needs to specify whether it's going to look for the C locale or the >> POSIX locale (I'd suggest C locale, since that's the actual

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Nick: > So if PEP 540 is going to implicitly trigger switching encodings, it > needs to specify whether it's going to look for the C locale or the > POSIX locale (I'd suggest C locale, since that's the actual default > that causes problems). I'm thinking of the test already used by

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-06 Thread Victor Stinner
Hi Naoki, 2017-12-06 5:07 GMT+01:00 INADA Naoki: > Oh, revised version is really short! > > And I have one worrying point. > With UTF-8 mode, open()'s default encoding/error handler is > UTF-8/surrogateescape. The Strict UTF-8 Mode is for you if you prioritize

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 16:18, Glenn Linderman wrote: > "b" mostly matters on Windows, correct? And Windows doesn't use C or POSIX > locale, correct? And if these are correct, then is this an issue? And if so, > why? In Python 3, "b" matters everywhere, since it controls
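Nick's point that "b" matters on every platform can be seen in a few lines: the same file read with "rb" yields the stored bytes, while text mode yields str and, with the default newline=None, translates "\r\n" to "\n" even on Linux. The temporary file is only for illustration:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(b"line1\r\nline2\n")
    with open(path, "rb") as f:
        raw = f.read()    # bytes, exactly as stored
    with open(path, "r", encoding="ascii") as f:
        text = f.read()   # str, with "\r\n" translated to "\n"
finally:
    os.remove(path)

print(type(raw).__name__, raw)
print(type(text).__name__, repr(text))
```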

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Glenn Linderman
On 12/5/2017 8:07 PM, INADA Naoki wrote: Oh, revised version is really short! And I have one worrying point. With UTF-8 mode, open()'s default encoding/error handler is UTF-8/surrogateescape. Containers are really growing. PyCharm supports Docker and many new Python developers use Docker

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 15:59, Chris Angelico wrote: > On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: >> Something I've just noticed that needs to be clarified: on Linux, "C" >> locale and "POSIX" locale are aliases, but this isn't true in general >>

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Chris Angelico
On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: > Something I've just noticed that needs to be clarified: on Linux, "C" > locale and "POSIX" locale are aliases, but this isn't true in general > (e.g. it's not the case on *BSD systems, including Mac OS X). For those of us

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X). To handle that in PEP 538, I made it clear that everything is keyed specifically off the "C"

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread INADA Naoki
Oh, revised version is really short! And I have one worrying point. With UTF-8 mode, open()'s default encoding/error handler is UTF-8/surrogateescape. Containers are really growing. PyCharm supports Docker and many new Python developers use Docker instead of installing Python directly on their

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread INADA Naoki
I'm sorry about my laziness. I've been very busy these past months, but I'm back to the OSS world as of today. While I should review it carefully again, I think I'm close to accepting PEP 540. * PEP 540 really helps containers and old Linux machines where PEP 538 doesn't work. And containers are really important for

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Nick Coghlan
On 6 December 2017 at 11:01, Victor Stinner wrote: >> Annex: Differences between the PEP 538 and the PEP 540 >> == >> >> The PEP 538 uses the "C.UTF-8" locale which is quite new and only >> supported by a few Linux

Re: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Victor Stinner
> Annex: Differences between the PEP 538 and the PEP 540 > == > > The PEP 538 uses the "C.UTF-8" locale which is quite new and only > supported by a few Linux distributions; this locale is not currently > supported by FreeBSD or macOS for

[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

2017-12-05 Thread Victor Stinner
Hi, I knew that I had to rewrite my PEP 540, but I was too lazy. Since Guido explicitly requested a shorter PEP, here it is! https://www.python.org/dev/peps/pep-0540/ Trust me, it's the same PEP, but focused on the most important information and with a shorter rationale ;-) Full text below.
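For readers trying this out: in the form that eventually shipped in Python 3.7 (which may differ in details from this v2 draft), UTF-8 Mode is enabled with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable, disabled with -X utf8=0, and reported by sys.flags.utf8_mode. A sketch that checks this in a child interpreter (utf8_mode_in_child is a hypothetical helper; it requires Python 3.7+):

```python
import subprocess
import sys

def utf8_mode_in_child(xoption: str) -> int:
    """Run a child interpreter with the given -X option; return its utf8_mode flag."""
    probe = "import sys; print(sys.flags.utf8_mode)"
    result = subprocess.run(
        [sys.executable, "-X", xoption, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

print(utf8_mode_in_child("utf8"))    # 1: UTF-8 Mode forced on
print(utf8_mode_in_child("utf8=0"))  # 0: UTF-8 Mode forced off
```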