Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/25/2009 5:35 AM, came the following characters from the keyboard of Martin v. Löwis: Because the encoding is not reliably reversible. Why do you say that? The encoding is completely reversible (unless we disagree on what reversible means). I'm +1 on the concept, -1 on the

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/25/2009 5:22 AM, came the following characters from the keyboard of Martin v. Löwis: The problem with this, and other preceding schemes that have been discussed here, is that there is no means of ascertaining whether a particular file name str was obtained from a str API, or

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote: [...snip...] There are still issues regarding how Windows and POSIX programs that are sharing cross-mounted file systems might communicate file names between each other, which is not at all clear from the PEP. If this is an

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 12:55 AM, came the following characters from the keyboard of Cameron Simpson: On 26Apr2009 23:39, Glenn Linderman v+pyt...@g.nevcal.com wrote: [...snip...] There are still issues regarding how Windows and POSIX programs that are sharing cross-mounted file systems

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Antoine Pitrou
Stephen J. Turnbull stephen at xemacs.org writes: If you see a broken encoding once, you're likely to see it a million times (spammers have the most broken software) or maybe have it raise an unhandled Exception a dozen times (in rate of using busted software, the spammers are closely

Re: [Python-Dev] Dropping bytes support in json

2009-04-27 Thread Damien Diederen
Hello, Antoine Pitrou solip...@pitrou.net writes: Hello, We're in the process of forward-porting the recent (massive) json updates to 3.1, and we are also thinking of dropping remnants of support of the bytes type in the json library (in 3.1, again). This bytes support almost didn't work

Re: [Python-Dev] Dropping bytes support in json

2009-04-27 Thread Eric Smith
I couldn't figure out a way to get rid of it short of multi-#including templates and playing with the C preprocessor, however, and have the nagging feeling the latter would be frowned upon by the maintainers. Not sure if this is exactly what you mean, but look at Objects/stringlib.

Re: [Python-Dev] Dropping bytes support in json

2009-04-27 Thread Bob Ippolito
On Mon, Apr 27, 2009 at 7:25 AM, Damien Diederen d...@crosstwine.com wrote: Antoine Pitrou solip...@pitrou.net writes: Hello, We're in the process of forward-porting the recent (massive) json updates to 3.1, and we are also thinking of dropping remnants of support of the bytes type in the

Re: [Python-Dev] Dropping bytes support in json

2009-04-27 Thread Antoine Pitrou
Damien Diederen dd at crosstwine.com writes: I couldn't figure out a way to get rid of it short of multi-#including templates and playing with the C preprocessor, however, and have the nagging feeling the latter would be frowned upon by the maintainers. There is a precedent with

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Stephen J. Turnbull
Antoine Pitrou writes: I'm not sure how mail being stuck in a pipeline has anything to do with Martin's proposal (which deals with file paths, not with SMTP...). I hate to break it to you, but most stages of mail processing have very little to do with SMTP. In particular, processing MIME

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Paul Moore
2009/4/27 Stephen J. Turnbull step...@xemacs.org: I believe there are solutions that don't have that problem. Specifically, if the return values were bytes, or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type, to indicate that they're not

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Aahz
On Mon, Apr 27, 2009, Antoine Pitrou wrote: Stephen J. Turnbull stephen at xemacs.org writes: If you see a broken encoding once, you're likely to see it a million times (spammers have the most broken software) or maybe have it raise an unhandled Exception a dozen times (in rate of using

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Antoine Pitrou
Stephen J. Turnbull stephen at xemacs.org writes: I hate to break it to you, but most stages of mail processing have very little to do with SMTP. In particular, processing MIME attachments often requires dealing with file names. AFAIK, the file name is only there as an indication for the

Re: [Python-Dev] Dropping bytes support in json

2009-04-27 Thread Damien Diederen
Hi Antoine, Antoine Pitrou solip...@pitrou.net writes: Damien Diederen dd at crosstwine.com writes: I couldn't figure out a way to get rid of it short of multi-#including templates and playing with the C preprocessor, however, and have the nagging feeling the latter would be frowned upon by

[Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1

2009-04-27 Thread Jim Kleckner
I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755 with the message system cannot open the device or file. I uninstalled 2.6.1, removing all residual files also, and got the error message again. When I ran msiexec as follows to get a log, it magically worked: msiexec

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Stephen J. Turnbull
Paul Moore writes: 2009/4/27 Stephen J. Turnbull step...@xemacs.org: I believe there are solutions that don't have that problem. Specifically, if the return values were bytes, or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type, to

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Stephen J. Turnbull
Antoine Pitrou writes: or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type, I'm -1 on any new string-like type (for file paths or whatever else) with custom encoding/decoding semantics. It's the best way to ruin the clean

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Tony Nelson
At 23:39 -0700 04/26/2009, Glenn Linderman wrote: On approximately 4/25/2009 5:35 AM, came the following characters from the keyboard of Martin v. Löwis: Because the encoding is not reliably reversible. Why do you say that? The encoding is completely reversible (unless we disagree on what

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Tony Nelson
At 16:09 + 04/27/2009, Antoine Pitrou wrote: Stephen J. Turnbull stephen at xemacs.org writes: I hate to break it to you, but most stages of mail processing have very little to do with SMTP. In particular, processing MIME attachments often requires dealing with file names. AFAIK, the

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Antoine Pitrou
Stephen J. Turnbull stephen at xemacs.org writes: Excuse me, but I can't see a scheme that encodes bytes as Unicodes but only sometimes as a clean separation. Yet it is. Filenames are all unicode, without exception, and there's no implicit conversion to bytes. That's a clean separation. So

Re: [Python-Dev] UTF-8 Decoder

2009-04-27 Thread Jeroen Ruigrok van der Werven
-On [20090414 16:43], Antoine Pitrou (solip...@pitrou.net) wrote: If you have some time on your hands, you could try benchmarking it against Python 3.1's (py3k) decoder. There are two cases to consider: Bjoern actually did it himself already:

Re: [Python-Dev] UTF-8 Decoder

2009-04-27 Thread Antoine Pitrou
Jeroen Ruigrok van der Werven asmodai at in-nomine.org writes: So on medium and large datasets the decoder of Bjoern is very interesting, but the tiny case (just Bjoern's name) is quite a tad bit slower. The other cases seems more typical of what the average use in Python would be. Keep in

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
It's a private use area. It will never carry an official character assignment. I know that U+F - U+F is a private use area. I don't find a definition of U+F01xx to know what the notation means. Are you picking a particular character within the private use area, or a particular

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
There are still issues regarding how Windows and POSIX programs that are sharing cross-mounted file systems might communicate file names between each other, which is not at all clear from the PEP. If this is an insoluble or un-addressed issue, it should be stated. (It is probably

Re: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1

2009-04-27 Thread Martin v. Löwis
Jim Kleckner wrote: I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755 with the message system cannot open the device or file. I uninstalled 2.6.1, removing all residual files also, and got the error message again. When I ran msiexec as follows to get a log, it

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 27Apr2009 00:07, Glenn Linderman v+pyt...@g.nevcal.com wrote: On approximately 4/25/2009 5:22 AM, came the following characters from the keyboard of Martin v. Löwis: The problem with this, and other preceding schemes that have been discussed here, is that there is no means of ascertaining

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Simon Cross
On Mon, Apr 27, 2009 at 9:48 PM, Martin v. Löwis mar...@v.loewis.de wrote: As Cameron says: it's out of the scope of the PEP. It really depends how the operating system deals with them. Most likely, the files are not accessible - not only not from Python, but also not accessible from any other

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
$ touch $'\xFF\xAA\xFF' $ vi $'\xFF\xAA\xFF' $ egrep foo $'\xFF\xAA\xFF' All worked fine from my Bash shell with locale encoding set to UTF-8. I can also open the created file from the GNOME editor file dialog (it even tells me the filename is not valid in my locale's encoding). The Nedit

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Antoine Pitrou
Simon Cross hodgestar+pythondev at gmail.com writes: $ touch $'\xFF\xAA\xFF' $ vi $'\xFF\xAA\xFF' $ egrep foo $'\xFF\xAA\xFF' All worked fine from my Bash shell with locale encoding set to UTF-8. The PEP is precisely about making py3k able to better handle these files (right now

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Michael Foord
Stephen J. Turnbull wrote: Antoine Pitrou writes: or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type, I'm -1 on any new string-like type (for file paths or whatever else) with custom encoding/decoding semantics. It's the best

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 12:42 PM, came the following characters from the keyboard of Martin v. Löwis: It's a private use area. It will never carry an official character assignment. I know that U+F - U+F is a private use area. I don't find a definition of U+F01xx to know what the

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 12:48 PM, came the following characters from the keyboard of Martin v. Löwis: There are still issues regarding how Windows and POSIX programs that are sharing cross-mounted file systems might communicate file names between each other, which is not at all clear from

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Steven D'Aprano
On Tue, 28 Apr 2009 04:13:47 am Antoine Pitrou wrote: Stephen J. Turnbull stephen at xemacs.org writes: ... So what you'll get here, AFAICS, is a new situation where many Windows-centric programmers will produce code that's incapable of dealing with non-Unicode input because they don't have

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 27Apr2009 23:27, Simon Cross hodgestar+python...@gmail.com wrote: | On Mon, Apr 27, 2009 at 9:48 PM, Martin v. Löwis mar...@v.loewis.de wrote: | As Cameron says: it's out of the scope of the PEP. It really depends how | the operating system deals with them. Most likely, the files are not |

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Benjamin Peterson
2009/4/27 Cameron Simpson c...@zip.com.au: I think that, almost independent of this PEP, there should be an os.fsencode() function that takes a byte string (as a POSIX OS call will take) and performs the _same_ byte-string encoding that listdir() and friends are doing under the hood. And a

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 2:14 PM, came the following characters from the keyboard of Cameron Simpson: On 27Apr2009 00:07, Glenn Linderman v+pyt...@g.nevcal.com wrote: On approximately 4/25/2009 5:22 AM, came the following characters from the keyboard of Martin v. Löwis: The problem

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 27Apr2009 21:48, Martin v. Löwis mar...@v.loewis.de wrote: | There are still issues regarding how Windows and POSIX programs that | are sharing cross-mounted file systems might communicate file names | between each other, which is not at all clear from the PEP. If this | is an insoluble

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 5:42 PM, came the following characters from the keyboard of Cameron Simpson: I think that, almost independent of this PEP, there should be an os.fsencode() function that takes a byte string (as a POSIX OS call will take) and performs the _same_ byte-string encoding

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 27Apr2009 18:15, Glenn Linderman v+pyt...@g.nevcal.com wrote: The problem with this, and other preceding schemes that have been discussed here, is that there is no means of ascertaining whether a particular file name str was obtained from a str API, or was funny- decoded from a bytes API...

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Benjamin Peterson
2009/4/27 Cameron Simpson c...@zip.com.au: PROPOSAL: add to the PEP the following functions: os.fsdecode(bytes) - funny-encoded Unicode This is what os.listdir() does to produce the strings it hands out. os.fsencode(funny-string) - bytes This is what open(filename,..) does to turn
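The round trip proposed in this message can be sketched in Python. This is an illustration only, not code from the thread: it assumes the `surrogateescape` error handler that PEP 383 eventually shipped with (Python 3.1 and later), assumes a UTF-8 filesystem encoding, and uses the hypothetical helper names `fs_decode` and `fs_encode` as stand-ins for the proposed os.fsdecode() and os.fsencode().

```python
# Hypothetical stand-ins for the proposed os.fsdecode()/os.fsencode().
# Assumption: the filesystem encoding is UTF-8.
FS_ENCODING = "utf-8"


def fs_decode(raw: bytes) -> str:
    """Decode a file name; each undecodable byte becomes a lone surrogate."""
    return raw.decode(FS_ENCODING, "surrogateescape")


def fs_encode(name: str) -> bytes:
    """Encode a file name; escaped surrogates turn back into the raw bytes."""
    return name.encode(FS_ENCODING, "surrogateescape")


# A file name that is not valid UTF-8 (the same bytes used elsewhere in the thread).
raw_name = b"\xff\xaa\xff"
decoded = fs_decode(raw_name)
round_tripped = fs_encode(decoded)

print(ascii(decoded))             # show the escaped code points
print(round_tripped == raw_name)  # the original bytes are recovered
```

Names that are already valid UTF-8 pass through both helpers unchanged, so only the undecodable bytes take the escape path.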

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
Glenn Linderman wrote: On approximately 4/27/2009 12:42 PM, came the following characters from the keyboard of Martin v. Löwis: It's a private use area. It will never carry an official character assignment. I know that U+F - U+F is a private use area. I don't find a definition of

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
I'm not suggesting the PEP should solve the problem of mounting foreign file systems, although if it doesn't it should probably point that out. I'm just suggesting that if the people that write software to solve the problem of mounting foreign file systems have already solved the naming

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Cameron Simpson
On 27Apr2009 21:58, Benjamin Peterson benja...@python.org wrote: | 2009/4/27 Cameron Simpson c...@zip.com.au: | PROPOSAL: add to the PEP the following functions: [...] | and for me, I would like to see: | os.setfilesystemencoding(coding) | | Currently os.getfilesystemencoding() returns you

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Martin v. Löwis
I don't understand what you're saying. py3k filenames are all unicode, even on POSIX systems, How is that possible on POSIX systems where the underlying file system uses bytes for filenames? If I write a piece of Python code: filename = 'some path/some name' I might call it

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Stephen J. Turnbull
Michael Foord writes: The problem you don't address, which is still the reality for most programmers (especially Mac OS X where filesystem encoding is UTF 8), is that programmers *are* going to treat filenames as strings. The proposed PEP allows that to work for them - whatever

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Stephen J. Turnbull
Tony Nelson writes: At 16:09 + 04/27/2009, Antoine Pitrou wrote: Stephen J. Turnbull stephen at xemacs.org writes: I hate to break it to you, but most stages of mail processing have very little to do with SMTP. In particular, processing MIME attachments often requires dealing

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread James Y Knight
On Apr 27, 2009, at 11:35 PM, Martin v. Löwis wrote: No. You seem to assume that all bytes < 128 decode successfully always. I believe this assumption is wrong, in general: py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax Traceback (most recent call last): File "<stdin>", line 1, in <module>
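The counterexample quoted in this message can be tried in Python 3 syntax. The sketch below is an illustration, not code from the thread: it assumes the byte string in the message corresponds to `b"\x1b$B' \x1b(B"`, and the helper name `decodes_cleanly` is hypothetical.

```python
def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes under the given codec with strict errors."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# ESC $ B switches ISO-2022-JP into a two-byte character set; the pair that
# follows (an apostrophe and a space) is not a valid character in that set.
sample = b"\x1b$B' \x1b(B"

all_below_128 = all(byte < 128 for byte in sample)
ok = decodes_cleanly(sample, "iso-2022-jp")

print(all_below_128)  # every byte is in the ASCII range
print(ok)             # whether the strict decode succeeded
```

The point of the example is that, for encodings built on escape sequences, a byte value below 128 says nothing by itself about whether the sequence decodes.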

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 8:35 PM, came the following characters from the keyboard of Martin v. Löwis: Glenn Linderman wrote: On approximately 4/27/2009 12:42 PM, came the following characters from the keyboard of Martin v. Löwis: It's a private use area. It will never carry an official

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Robert Collins
On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote: Indeed, that was the missing piece. I'd forgotten about the encodings that use escape sequences, rather than UTF-8, and DBCS. I don't think those encodings are permitted by POSIX file systems, but I suppose they could sneak in

Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-27 Thread Glenn Linderman
On approximately 4/27/2009 8:39 PM, came the following characters from the keyboard of Martin v. Löwis: I'm not suggesting the PEP should solve the problem of mounting foreign file systems, although if it doesn't it should probably point that out. I'm just suggesting that if the people that