Re: [Python-ideas] Fix default encodings on Windows

2016-08-20 Thread Stephen J. Turnbull
Chris Barker writes: > Sure -- but it's entirely unnecessary, yes? If you don't change > your code, you'll get py2(bytes) strings as paths in py2, and py3 > (Unicode) strings as paths on py3. So different, yes. But wouldn't > it all work? The difference is that if you happen to have a file na

Re: [Python-ideas] Fix default encodings on Windows

2016-08-19 Thread Chris Barker
On Fri, Aug 19, 2016 at 12:30 AM, Nick Coghlan wrote: > > So in porting to py3, they would have had to *add* that 'b' (and a bunch > of > > b'filename') to keep the good old bytes is text interface. > > > > Why would anyone do that? > > For a fair amount of *nix-centric code that primarily works

Re: [Python-ideas] Fix default encodings on Windows

2016-08-19 Thread eryk sun
On Thu, Aug 18, 2016 at 3:25 PM, Steve Dower wrote: > allow us to change locale.getpreferredencoding() to utf-8 on Windows _bootlocale.getpreferredencoding would need to be hard coded to return 'utf-8' on Windows. _locale._getdefaultlocale() itself shouldn't return 'utf-8' as the encoding because

Re: [Python-ideas] Fix default encodings on Windows

2016-08-19 Thread Nick Coghlan
On 19 August 2016 at 08:05, Chris Barker wrote: > On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower wrote: >> >> "You consistently ignore Makefiles, .ini, etc." >> >> Do people really do open('makefile', 'rb'), extract filenames and try to >> use them without ever decoding the file contents? > > > I'm

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Terry Reedy
On 8/18/2016 1:39 PM, Steve Dower wrote: On 18Aug2016 1036, Terry Reedy wrote: On 8/18/2016 11:25 AM, Steve Dower wrote: In this case, we would announce in 3.6 that using bytes as paths on Windows is no longer deprecated, My understanding is that the first 2 fixes refine the deprecation rathe

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Chris Barker
On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower wrote: > "You consistently ignore Makefiles, .ini, etc." > > Do people really do open('makefile', 'rb'), extract filenames and try to > use them without ever decoding the file contents? > I'm sure they do :-( But this has always confused me - back in

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread eryk sun
On Thu, Aug 18, 2016 at 4:44 PM, Chris Angelico wrote: > On Fri, Aug 19, 2016 at 2:39 AM, eryk sun wrote: >> They're all just characters in the context of Unicode, so I think it's >> clearest to use the character code, e.g.: >> >> The second call to glob has replaced the U+AB00 character with

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
On 18Aug2016 1036, Terry Reedy wrote: On 8/18/2016 11:25 AM, Steve Dower wrote: In this case, we would announce in 3.6 that using bytes as paths on Windows is no longer deprecated, My understanding is that the first 2 fixes refine the deprecation rather than reversing it. And #3 simply applie

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Terry Reedy
On 8/18/2016 11:25 AM, Steve Dower wrote: In this case, we would announce in 3.6 that using bytes as paths on Windows is no longer deprecated, My understanding is that the first 2 fixes refine the deprecation rather than reversing it. And #3 simply applies it. -- Terry Jan Reedy

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
On 18Aug2016 1018, MRAB wrote: Could we still call it 'mbcs', but use 'surrogateescape'? surrogateescape is used for escaping undecodable values when you want to represent arbitrary bytes in Unicode. It's the wrong direction for this situation - we are starting with valid Unicode and en
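The "wrong direction" point can be illustrated with a minimal sketch. It assumes cp1252 as a stand-in for the Windows-only 'mbcs' codec, and the file names are made up for illustration. surrogateescape round-trips undecodable bytes through str, but it does nothing for a valid character that has no slot in the target code page.

```python
# surrogateescape smuggles undecodable *bytes* through str and back again.
raw = b"caf\xff.txt"                      # 0xFF is never valid in UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

# Going the other way, a valid character that the code page lacks still fails.
# cp1252 is an assumption standing in for the active ANSI code page ('mbcs').
try:
    "test\uab00.txt".encode("cp1252", errors="surrogateescape")
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None

print(round_tripped == raw)
print(type(failure).__name__)
```

Here the error handler only helps when the str already contains the escape surrogates produced by an earlier decode, which is not the case for a file name that starts out as valid Unicode.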

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Random832
On Thu, Aug 18, 2016, at 13:18, MRAB wrote: > > If you see an alternative choice to those listed above, feel free to > > contribute it. Otherwise, can we focus the discussion on these (or any > > new) choices? > > > Could we still call it 'mbcs', but use 'surrogateescape'? Er, this discussion

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread MRAB
On 2016-08-16 16:56, Steve Dower wrote: I just want to clearly address two points, since I feel like multiple posts have been unclear on them. 1. The bytes API was deprecated in 3.3 and it is listed in https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is an unfortunate ove

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Chris Angelico
On Fri, Aug 19, 2016 at 2:39 AM, eryk sun wrote: > They're all just characters in the context of Unicode, so I think it's > clearest to use the character code, e.g.: > > The second call to glob has replaced the U+AB00 character with '?', > which means ... Technically the character has bee

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread eryk sun
On Thu, Aug 18, 2016 at 4:07 PM, Steve Dower wrote: > On 18Aug2016 0900, Chris Angelico wrote: >> >> On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower >> wrote: >>> >>> On 18Aug2016 0829, Chris Angelico wrote: The second call to glob doesn't have any Unicode characters at all, the

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Chris Angelico
On Fri, Aug 19, 2016 at 2:07 AM, Steve Dower wrote: > I think so, though I find the wording a little awkward (and on rereading, my > original wording was pretty bad). How about: > > "The second call to glob has replaced the Unicode character with '?', which > means the actual filename cannot be re

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
On 18Aug2016 0900, Chris Angelico wrote: On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower wrote: On 18Aug2016 0829, Chris Angelico wrote: The second call to glob doesn't have any Unicode characters at all, the way I see it - it's all bytes. Am I completely misunderstanding this? You're not the

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Chris Angelico
On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower wrote: > On 18Aug2016 0829, Chris Angelico wrote: >> >> The second call to glob doesn't have any Unicode characters at all, >> the way I see it - it's all bytes. Am I completely misunderstanding >> this? > > > You're not the only one - I think this has

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
On 18Aug2016 0829, Chris Angelico wrote: The second call to glob doesn't have any Unicode characters at all, the way I see it - it's all bytes. Am I completely misunderstanding this? You're not the only one - I think this has been the most common misunderstanding. On Windows, the paths as st

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Random832
On Thu, Aug 18, 2016, at 11:29, Chris Angelico wrote: > glob.glob('test*') > > ['test\uab00.txt'] > glob.glob(b'test*') > > [b'test?.txt'] > > > > The Unicode character in the second call to glob is missing information. > > Apologies if this is just noise, but I'm a little confused by t

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Chris Angelico
On Fri, Aug 19, 2016 at 1:25 AM, Steve Dower wrote: open('test\uAB00.txt', 'wb').close() import glob glob.glob('test*') > ['test\uab00.txt'] glob.glob(b'test*') > [b'test?.txt'] > > The Unicode character in the second call to glob is missing information. You > can observe the s
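The information loss shown by the glob example can be reproduced without touching the file system. This is only a sketch: cp1252 with the "replace" error handler is an assumption standing in for what the legacy code page conversion does on Windows.

```python
name = "test\uab00.txt"

# Legacy code page: U+AB00 has no representation, so it collapses to '?'.
lossy = name.encode("cp1252", errors="replace")

# UTF-8 can represent the character, so the name survives a round trip.
exact = name.encode("utf-8")

print(lossy)                          # b'test?.txt'
print(exact.decode("utf-8") == name)  # True
```

Once the name has become b'test?.txt' there is no way to recover the original character, which is why the bytes result cannot be used to reopen the file.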

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
Summary for python-dev. This is the email I'm proposing to take over to the main mailing list to get some actual decisions made. As I don't agree with some of the possible recommendations, I want to make sure that they're represented fairly. I also want to summarise the background leading to

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread Steve Dower
Cc: "Paul Moore"; "Python-Ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows Steve Dower writes: > On 17Aug2016 0235, Stephen J. Turnbull wrote: > > So a full statement is, "How do we best represent Windows file > > system p

Re: [Python-ideas] Fix default encodings on Windows

2016-08-18 Thread eryk sun
On Thu, Aug 18, 2016 at 2:32 AM, Stephen J. Turnbull wrote: > > So it's not just invalid surrogate *pairs*, it's invalid surrogates of > all kinds. This means that it's theoretically possible (though I > gather that it's unlikely in the extreme) for a real Windows filename > to indistinguishable

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Stephen J. Turnbull
Steve Dower writes: > On 17Aug2016 0235, Stephen J. Turnbull wrote: > > So a full statement is, "How do we best represent Windows file > > system paths in bytes for interoperability with systems that > > natively represent paths in bytes?" ("Other systems" refers to > > both other platforms

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Stephen J. Turnbull
eryk sun writes: > On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull > wrote: > > BTW, why "surrogate pairs"? Does Windows validate surrogates to > > ensure they come in pairs, but not necessarily in the right order (or > > perhaps sometimes they resolve to non-characters such as U+1)

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Steve Dower
On 17Aug2016 0901, Nick Coghlan wrote: On 17 August 2016 at 02:06, Chris Barker wrote: So the Solution is to either: (A) get everyone to use Unicode "properly", which will work on all platforms (but only on py3.5 and above?) or (B) kludge some *nix-compatible support for byte paths into Wi

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Nick Coghlan
On 17 August 2016 at 02:06, Chris Barker wrote: > Just to make sure this is clear, the Pragmatic logic is thus: > > * There are more *nix-centric developers in the Python ecosystem than > Windows-centric (or even Windows-agnostic) developers. > > * The bytes path approach works fine on *nix system

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Steve Dower
On 17Aug2016 0235, Stephen J. Turnbull wrote: Paul Moore writes: > On 16 August 2016 at 16:56, Steve Dower wrote: > > This discussion is for the developers who insist on using bytes > > for paths within Python, and the question is, "how do we best > > represent UTF-16 encoded paths in bytes

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread eryk sun
On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull wrote: > BTW, why "surrogate pairs"? Does Windows validate surrogates to > ensure they come in pairs, but not necessarily in the right order (or > perhaps sometimes they resolve to non-characters such as U+1)? A program can pass the filesy

Re: [Python-ideas] Fix default encodings on Windows

2016-08-17 Thread Stephen J. Turnbull
Paul Moore writes: > On 16 August 2016 at 16:56, Steve Dower wrote: > > This discussion is for the developers who insist on using bytes > > for paths within Python, and the question is, "how do we best > > represent UTF-16 encoded paths in bytes?" That's incomplete, AFAICS. (Paul makes this

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread eryk sun
On Tue, Aug 16, 2016 at 3:56 PM, Steve Dower wrote: > > 2. Windows file system encoding is *always* UTF-16. There's no "assuming > mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding > it is". We know exactly what the encoding is on every supported version of > Windows. UTF

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
I've just created http://bugs.python.org/issue27781 with a patch removing use of the *A API from posixmodule.c and changing the default FS encoding to utf-8. Since we're still discussing whether the encoding should be utf-8 or something else, let's keep that here. But if you want to see how th

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
On 16Aug2016 1915, Brendan Barnwell wrote: On 2016-08-16 17:14, Steve Dower wrote: The existence of bugs in other applications is not a good reason to help people create new bugs. I haven't been following all the details in this thread, but isn't the whole purpose of this proposed change

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Brendan Barnwell
On 2016-08-16 17:14, Steve Dower wrote: The existence of bugs in other applications is not a good reason to help people create new bugs. I haven't been following all the details in this thread, but isn't the whole purpose of this proposed change to accommodate code (apparently on Linux?) tha

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
On 16Aug2016 1650, Victor Stinner wrote: 2016-08-17 1:27 GMT+02:00 Steve Dower : filenameb = os.listdir(b'.')[0] # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8? print("filename bytes: %a" % filenameb) proc =

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Victor Stinner
2016-08-17 1:27 GMT+02:00 Steve Dower : >> filenameb = os.listdir(b'.')[0] >> # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page >> # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8? >> print("filename bytes: %a" % filenameb) >> >> proc = subprocess.Popen(['py', '-

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
On 16Aug2016 1603, Victor Stinner wrote: 2016-08-16 17:56 GMT+02:00 Steve Dower : 2. Windows file system encoding is *always* UTF-16. There's no "assuming mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding it is". We know exactly what the encoding is on every supported v

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Victor Stinner
2016-08-16 17:56 GMT+02:00 Steve Dower : > 2. Windows file system encoding is *always* UTF-16. There's no "assuming > mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding > it is". We know exactly what the encoding is on every supported version of > Windows. UTF-16. I think

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Paul Moore
On 16 August 2016 at 16:56, Steve Dower wrote: > I just want to clearly address two points, since I feel like multiple posts > have been unclear on them. > > 1. The bytes API was deprecated in 3.3 and it is listed in > https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is > a

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Sven R. Kunze
On 16.08.2016 19:44, Steve Dower wrote: On 16Aug2016 1006, Sven R. Kunze wrote: Question is: are we in a hurry? Has somebody too little time to wait for the "Right Thing" to happen? Not really in a hurry. It's just that I decided to attack a few of the encoding issues on Windows and this was

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
On 16Aug2016 1006, Sven R. Kunze wrote: Question is: are we in a hurry? Has somebody too little time to wait for the "Right Thing" to happen? Not really in a hurry. It's just that I decided to attack a few of the encoding issues on Windows and this was one of them. Ideally I'd want to get th

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Sven R. Kunze
On 16.08.2016 18:06, Chris Barker wrote: It's clear (to me at least) that (A) is the "Right Thing", but real world experience has shown that it's unlikely to happen any time soon. Practicality beats Purity and all that -- this is a judgment call. Maybe, but even when it takes a lot of time to

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Random832
On Tue, Aug 16, 2016, at 12:12, Chris Barker wrote: > * convert and fail on invalid surrogate pairs > > where would an invalid surrogate pair come from? never from a file system > API call, yes? In principle it could, if the filesystem contains a file with an invalid surrogate pair. Nothing else,
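A small sketch of why an unpaired surrogate in a file name matters for a bytes representation. The file name is invented, and the use of the "surrogatepass" error handler is an assumption about one way to handle the case, not something this message states.

```python
lone = "bad\ud800name"   # contains an unpaired high surrogate

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes it anyway and can decode it back unchanged.
passed = lone.encode("utf-8", errors="surrogatepass")
restored = passed.decode("utf-8", errors="surrogatepass")

print(strict_ok)          # False
print(restored == lone)   # True
```

The bytes produced this way are not well-formed UTF-8, so other tools may reject them. That is the trade-off between "convert and fail" and a lossless but non-standard conversion.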

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Chris Barker
Thanks for the clarity, Steve, a couple questions/thoughts: The choices are: > > * don't represent them at all (remove bytes API) > Would the bytes API be removed on *nix also? > * convert and drop characters not in the (legacy) active code page > * convert and fail on characters not in the (le

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Brendan Barnwell
On 2016-08-16 08:56, Steve Dower wrote: I just want to clearly address two points, since I feel like multiple posts have been unclear on them. 1. The bytes API was deprecated in 3.3 and it is listed in https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is an unfortunate ove

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Chris Barker
Just to make sure this is clear, the Pragmatic logic is thus: * There are more *nix-centric developers in the Python ecosystem than Windows-centric (or even Windows-agnostic) developers. * The bytes path approach works fine on *nix systems. * Whatever might be Right and Just -- the reality is th

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
I just want to clearly address two points, since I feel like multiple posts have been unclear on them. 1. The bytes API was deprecated in 3.3 and it is listed in https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is an unfortunate oversight, but it was certainly announced

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Guido van Rossum
I am going to mute this thread but I am worried about the outcome. Once there is agreement please check with me first. --Guido (mobile) ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Co

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Chris Barker - NOAA Federal
> There also seems to be an undercurrent in the discussions we're having > now that using bytes paths and not unicode paths is somehow The Right > Thing for unix-like OSes, Almost -- from my perusing of discussions from the last few years, there do seem to be some library developers and *nix affec

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Random832
On Tue, Aug 16, 2016, at 09:59, Paul Moore wrote: > It probably should be. Although if we're changing the deprecation to a > behaviour change, then maybe there's no point. But some of the > arguments here about breaking code are hinging on the idea that people > currently using the bytes API are us

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Paul Moore
On 16 August 2016 at 14:09, eryk sun wrote: > On Tue, Aug 16, 2016 at 10:53 AM, Paul Moore wrote: >> >> Having said all this, I can't find the documentation stating that >> bytes paths are deprecated - the open() documentation for 3.5 says >> "file is either a string or bytes object giving the pa

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Steve Dower
-Original Message- From: "Paul Moore" Sent: 8/16/2016 3:54 To: "Nick Coghlan" Cc: "python-ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows On 15 August 2016 at 19:26, Steve Dower wrote: > Passing path_as_byt

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Stephen J. Turnbull
Nick Coghlan writes: > At an ecosystem level, that means we're faced with a choice between > implicitly encouraging folks to make their code *nix only, and > finding a way to provide a more *nix like experience when running > on Windows (where UTF-8 encoded binary data just works, and either

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread eryk sun
On Tue, Aug 16, 2016 at 10:53 AM, Paul Moore wrote: > > Having said all this, I can't find the documentation stating that > bytes paths are deprecated - the open() documentation for 3.5 says > "file is either a string or bytes object giving the pathname (absolute > or relative to the current worki

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Paul Moore
On 15 August 2016 at 19:26, Steve Dower wrote: > Passing path_as_bytes in that location has been deprecated since 3.3, so we > are well within our rights (and probably overdue) to make it a TypeError in > 3.6. While it's obviously an invalid assumption, for the purposes of > changing the language

Re: [Python-ideas] Fix default encodings on Windows

2016-08-16 Thread Victor Stinner
2016-08-16 8:06 GMT+02:00 eryk sun : > My proposal was to use the wide-character APIs, but transcoding CP_ACP > without best-fit characters and raising a warning whenever the default > character is used (e.g. substituting Katakana middle dot when creating > a file using a bytes path that has an inv

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread eryk sun
>> On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower >> wrote: > > and using the *W APIs exclusively is the right way to go. My proposal was to use the wide-character APIs, but transcoding CP_ACP without best-fit characters and raising a warning whenever the default character is used (e.g. substitutin
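The idea of transcoding without best-fit mappings and warning whenever the default character is substituted might look roughly like the sketch below. It is not the Windows API behaviour itself: `encode_without_best_fit` is a hypothetical helper, cp1252 stands in for CP_ACP, and Python's "replace" handler substitutes '?' rather than a code-page-specific default character.

```python
import warnings


def encode_without_best_fit(path, codepage="cp1252"):
    """Hypothetical helper: encode strictly, warn when a default char is used."""
    try:
        # Python's code page codecs do exact mapping only (no best-fit).
        return path.encode(codepage)
    except UnicodeEncodeError:
        warnings.warn(
            f"{path!r} is not fully representable in {codepage}",
            UnicodeWarning,
        )
        # Fall back to the default character ('?' in Python's codecs).
        return path.encode(codepage, errors="replace")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(encode_without_best_fit("plain.txt"))
    print(encode_without_best_fit("test\uab00.txt"))
    print(len(caught))
```

The warning gives the caller a chance to notice that the resulting bytes no longer identify the original file.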

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Nick Coghlan
On 16 August 2016 at 11:34, Chris Barker - NOAA Federal wrote: >> Given that, I'm proposing adding support for using byte strings encoded with >> UTF-8 in file system functions on Windows. This allows Python users to omit >> switching code like: >> >> if os.name == 'nt': >>f = os.stat(os.lis

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
On 15Aug2016 1819, eryk sun wrote: On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote: (Frankly I don't mind what encoding we use, and I'd be quite happy to force bytes paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate pairs. But that would prevent basic manipulatio

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Chris Barker - NOAA Federal
> Given that, I'm proposing adding support for using byte strings encoded with > UTF-8 in file system functions on Windows. This allows Python users to omit > switching code like: > > if os.name == 'nt': >f = os.stat(os.listdir('.')[-1]) > else: >f = os.stat(os.listdir(b'.')[-1]) REALLY?
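For comparison, a sketch of code that avoids the `os.name` switch by using str paths throughout and converting to bytes only at the boundary. The temporary directory and the file name `example.txt` are invented for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    # str paths behave the same on every platform, so no os.name check.
    last = sorted(os.listdir(tmp))[-1]
    info = os.stat(os.path.join(tmp, last))

    # When bytes are genuinely needed, convert explicitly at the edge.
    as_bytes = os.fsencode(last)
    back = os.fsdecode(as_bytes)

print(last, info.st_size, back == last)
```

This is the "use Unicode properly" option discussed in the thread. The proposal under discussion is aimed at code that keeps bytes paths anyway.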

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread eryk sun
On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote: > > (Frankly I don't mind what encoding we use, and I'd be quite happy to force > bytes > paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate > pairs. But that would prevent basic manipulation which seems to be a higher
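Why UTF-16-LE bytes would get in the way of basic path manipulation can be seen with a short sketch (the path is invented for the example). Splitting on an ASCII separator byte cuts UTF-16 code units in half, while the same operation on UTF-8 bytes works.

```python
path = "dir\\file.txt"

utf16 = path.encode("utf-16-le")
utf8 = path.encode("utf-8")

# Naive bytes manipulation, as *nix-centric code typically does it.
parts16 = utf16.split(b"\\")
parts8 = utf8.split(b"\\")

print(parts8)                  # [b'dir', b'file.txt']
print(len(parts16[1]) % 2)     # 1: odd length, no longer valid UTF-16
```

In UTF-8 every ASCII byte stands for itself and never appears inside a multi-byte sequence, which is what makes separator-based manipulation of bytes paths safe.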

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
On 15Aug2016 1126, Steve Dower wrote: My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths. I then propose to change the (unused on Windows) sys.getfilesystemencoding() to 'utf-8' and handle bytes being pas
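The setting being discussed can be inspected from Python. The sketch below assumes Python 3.6 or later, where `sys.getfilesystemencodeerrors()` is available. The reported values depend on platform and version, so no output is shown.

```python
import os
import sys

encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()   # available from Python 3.6

# os.fsencode() applies exactly this encoding and error handler to str paths.
encoded = os.fsencode("name.txt")

print(encoding, errors, encoded)
```

`os.fsencode()` and `os.fsdecode()` are the documented way to move between str and bytes paths, so code that uses them picks up whatever encoding the interpreter settles on.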

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
On 15Aug2016 0954, Random832 wrote: On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote: I'm still not sure we're talking about the same thing right now. For `open(path_as_bytes).read()`, are we talking about the way path_as_bytes is passed to the file system? Or the codec used to decide the retu

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Random832
On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote: > I'm still not sure we're talking about the same thing right now. > > For `open(path_as_bytes).read()`, are we talking about the way > path_as_bytes is passed to the file system? Or the codec used to decide > the returned string? We are talking

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
From: "Random832" Sent: 8/15/2016 6:41 To: "Steve Dower"; "Stephen J. Turnbull" Cc: "Victor Stinner"; "python-ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote: > I

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Random832
On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote: > I guess I'm not sure what your question is then. > > Using text internally is of course the best way to deal with it. But for > those who insist on using bytes, this change at least makes Windows a > feasible target without requiring manual enco

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
-Original Message- From: "Stephen J. Turnbull" Sent: 8/14/2016 22:06 To: "Steve Dower" Cc: "Victor Stinner"; "python-ideas"; "Random832" Subject: RE: [Python-ideas] Fix default encodings on Windows Steve Dower writes:

Re: [Python-ideas] Fix default encodings on Windows

2016-08-14 Thread Stephen J. Turnbull
Steve Dower writes: > I plan to use only Unicode to interact with the OS and then utf8 > within Python if the caller wants bytes. This doesn't answer Victor's questions, or mine. This proposal requires identifying and transcoding bytes that represent text in encodings other than UTF-8. 1. Ho

Re: [Python-ideas] Fix default encodings on Windows

2016-08-14 Thread Steve Dower
-Original Message- From: "Victor Stinner" Sent: 8/14/2016 9:20 To: "Steve Dower" Cc: "Stephen J. Turnbull"; "python-ideas"; "Random832" Subject: Re: [Python-ideas] Fix default encodings on Windows > The last

Re: [Python-ideas] Fix default encodings on Windows

2016-08-14 Thread Victor Stinner
> The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes. You get trou

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Steve Dower
-Original Message- From: "Stephen J. Turnbull" Sent: 8/13/2016 12:11 To: "Random832" Cc: "python-ideas@python.org" Subject: Re: [Python-ideas] Fix default encodings on Windows Random832 writes: > And what's going t

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Stephen J. Turnbull
Random832 writes: > And what's going to happen if you shovel those bytes into the > filesystem without conversion on Linux, or worse, OSX? Off topic. See Subject: field. > This proposal embodies an assumption that bytes from unknown sources > used as filenames are more likely to be UTF-8 th

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Steve Dower
On 13Aug2016 0523, Random832 wrote: On Sat, Aug 13, 2016, at 04:12, Stephen J. Turnbull wrote: Steve Dower writes: > ISTM that changing sys.getfilesystemencoding() on Windows to > "utf-8" and updating path_converter() (Python/posixmodule.c; I think this proposal requires the assumption that s

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Steve Dower
Just a heads-up that I've assigned http://bugs.python.org/issue1602 to myself and started a patch for the console changes. Let's move the console discussion back over there. Hopefully it will show up in 3.6.0b1, but if you're prepared to apply a patch and test on Windows, feel free to grab my

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Random832
On Sat, Aug 13, 2016, at 04:12, Stephen J. Turnbull wrote: > Steve Dower writes: > > ISTM that changing sys.getfilesystemencoding() on Windows to > > "utf-8" and updating path_converter() (Python/posixmodule.c; > > I think this proposal requires the assumption that strings intended to > be inter

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Adam Bartoš
Stephen J. Turnbull writes: > The exception is the proposed console changes, because there you *do* > perform all I/O with OS APIs. But I don't know anything about the > Windows console except that nobody seems happy with it. > > I'm quite happy with it. I mean, it's far from perfect, and when yo

Re: [Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Adam Bartoš
On Fri Aug 12 19:03:38 EDT 2016 Victor Stinner wrote: > For the Windows console: I played with all Windows functions, tried all > fonts and many code pages. I also read technical blog articles of Microsoft > employees. I gave up on this issue. It doesn't seem possible to support > fully Unicode th

[Python-ideas] Fix default encodings on Windows

2016-08-13 Thread Stephen J. Turnbull
Steve Dower writes: > ISTM that changing sys.getfilesystemencoding() on Windows to > "utf-8" and updating path_converter() (Python/posixmodule.c; I think this proposal requires the assumption that strings intended to be interpreted as file names invariably come from the Windows APIs. I don't t

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread eryk sun
On Fri, Aug 12, 2016 at 2:20 PM, Random832 wrote: > On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote: >> * force the console encoding to UTF-8 on initialize and revert on >> finalize >> >> So what are your concerns? Suggestions? > > As far as I know, the single biggest problem caused by the statu

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Victor Stinner
Le 10 août 2016 20:16, "Steve Dower" a écrit : > So what are your concerns? Suggestions? Add a new option specific to Windows to switch to UTF-8 everywhere, use BOM, whatever you want, *but* don't change the defaults. IMO mbcs encoding is the least worst encoding for the default. I have an idea

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Victor Stinner
Hello, I'm on holiday and I'm writing on a phone, so sorry in advance for the short answer. In short: we should drop support for the bytes API. Just use Unicode on all platforms, especially for filenames. Sorry but most of these changes look like very bad ideas. Or maybe I misunderstood somethin

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Chris Barker
On Fri, Aug 12, 2016 at 10:19 AM, Paul Moore wrote: > > In which case, something IS better than nothing. > > > I'm not arguing that we do nothing. Are you saying we should use > CP_UTF8 *in preference* to wide character APIs? Or that we should > implement CP_UTF8 first and then wide chars later?

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread tritium-list
> -Original Message- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon@python.org] On Behalf Of Paul Moore > Sent: Friday, August 12, 2016 9:42 AM > To: eryk sun > Cc: python-ideas > Subject: Re: [Python-ideas] Fix default encodings

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Random832
On Fri, Aug 12, 2016, at 12:24, Adam Bartoš wrote: > There is no buffer just on those wrapping streams because the bytes I > have are not in UTF-8. Adding one would mean a fake buffer that just > decodes and writes to the text stream. AFAIK there is no guarantee > that sys.std* objects have buffer

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Paul Moore
On 12 August 2016 at 18:05, Chris Barker wrote: > On Fri, Aug 12, 2016 at 6:41 AM, Paul Moore wrote: >> >> I >> understand Steve's point about being an improvement over 100% wrong, >> but we've lived with the current state of affairs long enough that I >> think we should take whatever time is ne

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Chris Barker
On Fri, Aug 12, 2016 at 6:41 AM, Paul Moore wrote: > I > understand Steve's point about being an improvement over 100% wrong, > but we've lived with the current state of affairs long enough that I > think we should take whatever time is needed to do it right, Sure -- but this is such a freakin'

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Adam Bartoš
On Fri Aug 12 11:33:35 EDT 2016, Random832 wrote: > On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: >> That's the hope, though that module approaches the solution differently >> and may still uses. An alternative way for us to fix this whole thing >> would be to bring win_unicode_conso

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Random832
On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: > That's the hope, though that module approaches the solution differently > and may still uses. An alternative way for us to fix this whole thing > would be to bring win_unicode_console into the standard library and use > it by default (or proba

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Random832
On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote: > * force the console encoding to UTF-8 on initialize and revert on > finalize > > So what are your concerns? Suggestions? As far as I know, the single biggest problem caused by the status quo for console encoding is "some string containing chara

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Paul Moore
On 12 August 2016 at 13:38, eryk sun wrote: >> ... At this point what codepage does Python see? What codepage does >> process X see? (Note that they are both sharing the same console). > > The input and output codepages are global data in conhost.exe. They > aren't tracked for each attached proces

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread Steve Dower
isn't that much of a simplification. I have some airport/aeroplane time today where I can experiment. Top-posted from my Windows Phone -Original Message- From: "eryk sun" Sent: 8/12/2016 5:40 To: "python-ideas" Subject: Re: [Python-ideas] Fix default encoding

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread eryk sun
On Thu, Aug 11, 2016 at 9:07 AM, Paul Moore wrote: > set codepage to UTF-8 > ... > set codepage back > spawn subprocess X, but don't wait for it > set codepage to UTF-8 > ... > ... At this point what codepage does Python see? What codepage does > process X see? (Note that they are both sharing the

Re: [Python-ideas] Fix default encodings on Windows

2016-08-12 Thread eryk sun
On Thu, Aug 11, 2016 at 6:41 PM, Adam Bartoš wrote: > The transcoding wrappers with 'utf-8' encoding are used just as a work > around the fact that Python tokenizer cannot use utf-16-le and that the > readlinehook machinery is unfortunately bytes-based. The transcoding wrapper > just has encoding 'utf

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Adam Bartoš
Eryk Sun wrote: > IMO, Python needs a C implementation of the win_unicode_console > module, using the wide-character APIs ReadConsoleW and WriteConsoleW. > Note that this sets sys.std*.encoding as UTF-8 and transcodes, so > Python code never has to work directly with UTF-16 encoded text. > > The t

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Adam Bartoš
> > On 11 August 2016 at 04:10, Steve Dower wrote: > >> I suspect there's a lot of discussion to be had around this topic, so I > >> want to get it started. There are some fairly drastic ideas here and I need > >> help figuring out whether

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Chris Angelico
On Fri, Aug 12, 2016 at 1:31 AM, Steve Dower wrote: > My big concern is the console... I think that change is inevitably going to > have to break someone, but I need to map out the possibilities first to > figure out just how bad it'll be. Obligatory XKCD: https://xkcd.com/1172/ Subprocess invoc

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Steve Dower
t just how bad it'll be. Top-posted from my Windows Phone -Original Message- From: "Random832" Sent: 8/11/2016 7:54 To: "python-ideas@python.org" Subject: Re: [Python-ideas] Fix default encodings on Windows On Thu, Aug 11, 2016, at 10:25, Steven D'Apra

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Random832
On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote: > > Interesting. Are you assuming that a text file cannot be empty? > > Hmmm... not consciously, but I guess I was. > > If the file is empty, how do you know it's text? Heh. That's the *other* thing that Notepad does wrong in the opinion of

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Steven D'Aprano
On Thu, Aug 11, 2016 at 02:09:00PM +1000, Chris Angelico wrote: > On Thu, Aug 11, 2016 at 1:14 PM, Steven D'Aprano wrote: > > The way I have done auto-detection based on BOMs is you start by reading > > four bytes from the file in binary mode. (If there are fewer than four > > bytes, it cannot be
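The BOM-based auto-detection described here (read four bytes in binary mode, then choose a codec) might look roughly like the sketch below. It is an illustration, not the code from this message: the names sniff_encoding and read_text, the UTF-8 default, and the handling of files shorter than four bytes are all assumptions.

```python
import codecs

# Longest BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(head, default="utf-8"):
    """Return a codec name based on a BOM at the start of `head`.

    Hypothetical helper: falls back to `default` (an assumption) when no
    BOM is present, including for empty or very short input.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_text(path, default="utf-8"):
    """Read a whole text file, picking the codec from its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM themselves.
    with open(path, encoding=sniff_encoding(head, default)) as f:
        return f.read()
```

One known ambiguity: a UTF-16-LE file whose first character is U+0000 begins with the same four bytes as a UTF-32-LE BOM, so this sketch would misread it as UTF-32.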

Re: [Python-ideas] Fix default encodings on Windows

2016-08-11 Thread Paul Moore
On 11 August 2016 at 01:41, Chris Angelico wrote: > I've almost never seen files stored in UTF-32 (even UTF-16 isn't all > that common compared to UTF-8), so I wouldn't stress too much about > that. Recognizing FE FF or FF FE and decoding as UTF-16 might be worth > doing, but it could easily be re
