Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-30 norseman
Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they contain non-unicode data?
Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark.
What I mean by printable is that the string must

2009-04-30 Barry Scott
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they contain non-unicode data?
Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark.
What I mean by pr

2009-04-30 Martin v. Löwis
>>> How do I get a printable unicode version of these path strings if they
>>> contain non-unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be va

2009-04-30 Barry Scott
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable unicode version of these path strings if they contain non-unicode data?
Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark.
What I mean by prin

2009-04-29 Martin v. Löwis
> How do I get a printable unicode version of these path strings if they
> contain non-unicode data?
Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark.
> I'm guessing that an app has to understand that filenames come in tw
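
Martin's regular-expression suggestion can be made concrete. In the form PEP 383 finally took in Python 3.1, the error handler is called "surrogateescape" (the drafts discussed in this thread call it "utf-8b" or "python-escape"), and every undecodable byte ends up as a code point in U+DC80..U+DCFF. The following is a minimal sketch assuming UTF-8 file names; the helper name `printable` and the "?" replacement are illustrative choices, not part of any API.

```python
import re

# With "surrogateescape", each undecodable byte 0x80..0xFF is decoded to the
# lone surrogate U+DC80..U+DCFF, so one character class matches all of them.
_ESCAPED = re.compile('[\udc80-\udcff]')


def printable(name: str) -> str:
    """Hypothetical helper: replace every escaped byte with a question mark."""
    return _ESCAPED.sub('?', name)


raw = b'caf\xe9.txt'  # Latin-1 bytes; 0xE9 is not valid UTF-8 here
name = raw.decode('utf-8', 'surrogateescape')
print(repr(name))       # 'caf\udce9.txt'
print(printable(name))  # caf?.txt
```

The result is lossy by design: it is only for display, while the unmodified `name` is still the string to pass back to file APIs.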

2009-04-29 Barry Scott
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
Forgive me if this has been covered.
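
The mapping quoted in this message is what makes the round trip lossless: each rejected byte b (always >= 0x80) becomes the single code point U+DC00 + b, and encoding with the same handler turns it back into b. A minimal check, using the handler name that shipped in Python 3.1 ("surrogateescape") and arbitrary sample bytes:

```python
# Arbitrary bytes that are not valid UTF-8: 0xFF never occurs in UTF-8, and
# 0xC3 starts a two-byte sequence but is followed here by "." instead.
raw = b'report-\xff\xc3.txt'

text = raw.decode('utf-8', 'surrogateescape')
# Each rejected byte b became exactly one code point, U+DC00 + b.
print([hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF])
# ['0xdcff', '0xdcc3']

# Encoding with the same handler restores the original bytes exactly.
assert text.encode('utf-8', 'surrogateescape') == raw
```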

2009-04-26 Martin v. Löwis
> How about another str-like type, a sequence of char-or-bytes?
That would be a different PEP. I personally like my own proposal more, but feel free to propose something different.
Regards,
Martin

2009-04-26 Adrian
How about another str-like type, a sequence of char-or-bytes? Could be called strbytes or stringwithinvalidcharacters. It would support whatever subset of str functionality makes sense / is easy to implement plus a to_escaped_str() method (that does the escaping the PEP talks about) for people who
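
Adrian's alternative could look roughly like the following. This is a hypothetical sketch only: the class name `StrBytes`, its constructor, and `to_bytes()` are invented for illustration, and only the `to_escaped_str()` method comes from the proposal. It keeps decoded text and raw byte runs side by side and applies the same U+DC00 + byte escaping the PEP describes.

```python
from typing import Iterable, Union


class StrBytes:
    """Hypothetical str-like type: a sequence of text runs and raw byte runs."""

    def __init__(self, parts: Iterable[Union[str, bytes]]):
        self.parts = list(parts)

    def to_escaped_str(self) -> str:
        # The PEP's escaping: each raw byte b (which must be >= 0x80) becomes
        # the lone surrogate U+DC00 + b; text runs pass through unchanged.
        out = []
        for part in self.parts:
            if isinstance(part, bytes):
                if any(b < 0x80 for b in part):
                    raise ValueError('only bytes >= 0x80 can be escaped')
                out.append(''.join(chr(0xDC00 + b) for b in part))
            else:
                out.append(part)
        return ''.join(out)

    def to_bytes(self, encoding: str = 'utf-8') -> bytes:
        # Encode text runs normally and copy raw byte runs verbatim.
        return b''.join(p if isinstance(p, bytes) else p.encode(encoding)
                        for p in self.parts)


name = StrBytes(['caf', b'\xe9', '.txt'])
print(repr(name.to_escaped_str()))  # 'caf\udce9.txt'
```

For this input, `to_escaped_str()` gives the same string as decoding `b'caf\xe9.txt'` with "surrogateescape", so both designs carry the same information; they differ in whether the distinction lives in a separate type or inside `str`.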

2009-04-25 Martin v. Löwis
> If the bytes are mapped to single half surrogate codes instead of the
> normal pairs (low+high), then I can see that decoding could never be
> ambiguous and encoding could produce the original bytes.
I was confused by Markus Kuhn's original UTF-8b specification. I have now changed the PEP to avo
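
The reason single half surrogates avoid ambiguity can be checked directly: a strict UTF-8 codec neither produces nor accepts a lone surrogate, so any U+DC80..U+DCFF code point in a decoded name can only have come from an escaped byte. A small sketch with Python 3's "surrogateescape" handler (the released name of the PEP's error handler); the exception messages it prints vary between Python versions.

```python
# A strict UTF-8 encoder refuses a lone half surrogate ...
try:
    '\udce9'.encode('utf-8')
except UnicodeEncodeError as exc:
    print('strict encode refused:', exc.reason)

# ... and a strict decoder never produces one: ED B3 A9, the byte pattern
# that would spell U+DCE9, is rejected as invalid UTF-8.
try:
    b'\xed\xb3\xa9'.decode('utf-8')
except UnicodeDecodeError as exc:
    print('strict decode refused:', exc.reason)

# Under "surrogateescape" each of those bytes is mapped to its own code point
# in U+DC80..U+DCFF and comes back unchanged, so nothing is ambiguous.
raw = b'\xed\xb3\xa9'
escaped = raw.decode('utf-8', 'surrogateescape')
assert all(0xDC80 <= ord(c) <= 0xDCFF for c in escaped)
assert escaped.encode('utf-8', 'surrogateescape') == raw
```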

2009-04-25 Martin v. Löwis
Cameron Simpson wrote:
> On 22Apr2009 08:50, Martin v. Löwis wrote:
> | File names, environment variables, and command line arguments are
> | defined as being character data in POSIX;
>
> Specific citation please? I'd like to check the specifics of this.
For example, on environment variables: h

2009-04-24 Martin v. Löwis
> Why not use U+DCxx for non-UTF-8 encodings too?
I thought of that, and was tricked into believing that only U+DC8x is a half surrogate. Now I see that you are right, and have fixed the PEP accordingly.
Regards,
Martin
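
The U+DCxx scheme works the same way for non-UTF-8 codecs. This sketch uses ASCII, where every byte from 0x80 to 0xFF is undecodable, to show that the escaped range covers all of U+DC80..U+DCFF and not only U+DC8x. It relies on the "surrogateescape" handler as released in Python 3.1; the sample bytes are arbitrary.

```python
raw = b'caf\xe9'

# ASCII rejects every byte >= 0x80, and each one is escaped as U+DC00 + byte.
ascii_name = raw.decode('ascii', 'surrogateescape')
print(repr(ascii_name))  # 'caf\udce9'
assert ascii_name.encode('ascii', 'surrogateescape') == raw

# The escaped range is the whole of U+DC80..U+DCFF, not just U+DC8x.
high = bytes(range(0x80, 0x100))
escaped = high.decode('ascii', 'surrogateescape')
assert escaped == ''.join(chr(0xDC00 + b) for b in high)
print(hex(ord(escaped[0])), hex(ord(escaped[-1])))  # 0xdc80 0xdcff
```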

2009-04-24 Lino Mastrodomenico
2009/4/22 "Martin v. Löwis":
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
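
For comparison, the draft design quoted here (undecodable bytes mapped to private-use characters U+F01xx) can be imitated with `codecs.register_error`. This is a hypothetical re-creation, not the handler that shipped: the name "python-escape-draft" is made up, and the exact mapping (byte b to U+F0100 + b) is an assumption read from "U+F01xx". ASCII is used because the draft handler was only meant for encodings other than UTF-8.

```python
import codecs

# Assumed draft mapping: byte b -> U+F0100 + b, i.e. U+F0180..U+F01FF.
PUA_BASE = 0xF0100


def _pua_escape(exc):
    """Hypothetical re-creation of the draft "python-escape" error handler."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return ''.join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            code = ord(ch) - PUA_BASE
            if not 0x80 <= code <= 0xFF:
                raise exc  # a genuinely unencodable character
            out.append(code)
        return bytes(out), exc.end  # bytes are copied to the output as-is
    raise exc


# The handler name is invented for this sketch; it is not a built-in.
codecs.register_error('python-escape-draft', _pua_escape)

name = b'caf\xe9'.decode('ascii', 'python-escape-draft')
print(hex(ord(name[-1])))  # 0xf01e9
assert name.encode('ascii', 'python-escape-draft') == b'caf\xe9'
```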

2009-04-23 MRAB
Martin v. Löwis wrote:
MRAB wrote:
Martin v. Löwis wrote:
[snip]
To convert non-decodable bytes, a new error handler "python-escape" is introduced, which decodes non-decodable bytes into a private-use character U+F01xx, which is believed to not conflict with private-use characters that cu

2009-04-23 James Y Knight
On Apr 22, 2009, at 2:50 AM, Martin v. Löwis wrote:
I'm proposing the following PEP for inclusion into Python 3.1. Please comment.
+1. Even if some people still want a low-level bytes API, it's important that the easy case be easy. That is: the majority of Python applications should *just

2009-04-22 Martin v. Löwis
>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
>
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.
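
The objection about UTF-16/32 is easy to demonstrate: a private-use character such as U+F01E9 is perfectly legal text, so real data in any Unicode encoding can decode to it and collide with an escaped byte 0xE9 under the draft scheme. A lone surrogate, by contrast, is rejected by the strict decoders. A minimal sketch; the file name is hypothetical.

```python
# Hypothetical file name containing U+F01E9, the code point the draft scheme
# would also produce for an escaped byte 0xE9.
legit = 'caf\U000F01E9'

for enc in ('utf-16-le', 'utf-16-be', 'utf-32-le', 'utf-32-be'):
    data = legit.encode(enc)          # strict: the character is encodable
    assert data.decode(enc) == legit  # and strictly decodes back unchanged
    print(f'{enc}: {len(data)} bytes decode to U+F01E9 without any error')

# A lone surrogate is different: strict UTF-16 refuses to decode one.
try:
    b'\xe9\xdc'.decode('utf-16-le')   # little-endian bytes of U+DCE9
    lone_rejected = False
except UnicodeDecodeError:
    lone_rejected = True
print('strict UTF-16 rejects a lone surrogate:', lone_rejected)
```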

2009-04-22 M.-A. Lemburg
On 2009-04-22 22:06, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>> "correct" -> "corrected"
>> Thanks, fixed.
>> To convert non-decodable bytes, a new error handler "python-escape" is introduced, which decodes non-decodable bytes into a private-use character U+F01xx, which

2009-04-22 Walter Dörwald
Martin v. Löwis wrote:
>> "correct" -> "corrected"
>
> Thanks, fixed.
>
>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>> introduced, which decodes non-decodable bytes into a private-use
>>> character U+F01xx, which is believed to not conflict with private-use

2009-04-22 Martin v. Löwis
MRAB wrote:
> Martin v. Löwis wrote:
> [snip]
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exi

2009-04-22 Martin v. Löwis
> "correct" -> "corrected"
Thanks, fixed.
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exist

2009-04-22 Walter Dörwald
Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 20

2009-04-22 Nick Coghlan
Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
That seems like a much nicer solution than having parallel bytes/Unicode APIs everywhere. When the locale encoding is UTF-8, would UTF-8b also be used for the command line decoding and enviro
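
Nick's question concerns command-line arguments and environment variables. As released, PEP 383 covers those too: on POSIX systems Python 3 decodes `sys.argv` and `os.environ` with "surrogateescape", and `os.fsencode()` (available since Python 3.2) recovers the original bytes. This sketch assumes a POSIX platform for the `os.environb` check and assumes the locale and file system encodings agree, which is the usual case.

```python
import os
import sys

# Python 3.6+ reports both halves of the file system codec; on POSIX the
# error handler is normally "surrogateescape".
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())

# os.fsencode() undoes the decoding that produced sys.argv, recovering the
# original bytes (assuming locale and file system encodings agree).
argv_bytes = [os.fsencode(arg) for arg in sys.argv]

if os.supports_bytes_environ:
    # POSIX only: os.environb exposes the raw environment; each str value in
    # os.environ re-encodes to exactly the bytes stored there.
    for key, value in os.environ.items():
        assert os.fsencode(value) == os.environb[os.fsencode(key)]
```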