Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by printable is that the string must
On 30 Apr 2009, at 21:06, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by pr
>>> How do I get a printable Unicode version of these path strings if they
>>> contain non-Unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be va
On 30 Apr 2009, at 05:52, Martin v. Löwis wrote:
How do I get a printable Unicode version of these path strings if they
contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
What I mean by prin
> How do I get a printable Unicode version of these path strings if they
> contain non-Unicode data?
Define "printable". One way would be to use a regular expression,
replacing all codes in a certain range with a question mark.
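
The regular-expression suggestion above can be sketched roughly as follows. This is only an illustration: it assumes the "certain range" is U+DC80..U+DCFF and that the string was decoded with the "surrogateescape" error handler available in released Python 3. The names ESCAPED_BYTE, printable and raw_name are hypothetical.

```python
import re

# Assumption: the "certain range" is U+DC80..U+DCFF, the lone surrogates
# that the "surrogateescape" error handler produces for undecodable bytes.
ESCAPED_BYTE = re.compile("[\udc80-\udcff]")


def printable(name: str) -> str:
    """Return name with every escaped byte replaced by a question mark."""
    return ESCAPED_BYTE.sub("?", name)


# Hypothetical file name containing one byte that is invalid in UTF-8.
raw_name = b"report-\xff.txt"
name = raw_name.decode("utf-8", "surrogateescape")
shown = printable(name)
print(shown)  # report-?.txt
```

The result is safe to print or encode strictly, but it is lossy: the original bytes cannot be recovered from it, so it is suited to display only.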
> I'm guessing that an app has to understand that filenames come in tw
On 22 Apr 2009, at 07:50, Martin v. Löwis wrote:
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
Forgive me if this has been covered.
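
As a rough illustration of the mapping described in the quoted text: in released Python 3 the behaviour is exposed as the "surrogateescape" error handler rather than as a codec named "utf-8b", so the sketch below uses that handler. The sample bytes are made up for the example.

```python
# Three bytes that can never appear on their own in valid UTF-8,
# preceded by plain ASCII.
raw = b"abc\x80\xfe\xff"

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

codes = [hex(ord(ch)) for ch in text[3:]]
print(codes)  # ['0xdc80', '0xdcfe', '0xdcff']
```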
> How about another str-like type, a sequence of char-or-bytes?
That would be a different PEP. I personally like my own proposal
more, but feel free to propose something different.
Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
How about another str-like type, a sequence of char-or-bytes? Could be
called strbytes or stringwithinvalidcharacters. It would support
whatever subset of str functionality makes sense / is easy to
implement plus a to_escaped_str() method (that does the escaping the
PEP talks about) for people who
> If the bytes are mapped to single half surrogate codes instead of the
> normal pairs (low+high), then I can see that decoding could never be
> ambiguous and encoding could produce the original bytes.
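
The round-trip property described in the quoted paragraph can be checked with a small sketch. It assumes the "surrogateescape" error handler of released Python 3 as a stand-in for the scheme under discussion; the variable names are illustrative.

```python
# Every possible byte value, in order. Bytes 0x00..0x7F are ASCII; in this
# particular sequence none of the bytes 0x80..0xFF forms valid UTF-8.
every_byte = bytes(range(256))

decoded = every_byte.decode("utf-8", "surrogateescape")
restored = decoded.encode("utf-8", "surrogateescape")

# Encoding with the same handler reproduces the original bytes exactly.
print(restored == every_byte)  # True

# Valid UTF-8 mixed with a stray byte also survives the round trip.
mixed = "é".encode("utf-8") + b"\xe9"
print(mixed.decode("utf-8", "surrogateescape").encode("utf-8", "surrogateescape") == mixed)  # True
```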
I was confused by Markus Kuhn's original UTF-8b specification. I have
now changed the PEP to avo
Cameron Simpson wrote:
> On 22Apr2009 08:50, Martin v. Löwis wrote:
> | File names, environment variables, and command line arguments are
> | defined as being character data in POSIX;
>
> Specific citation please? I'd like to check the specifics of this.
For example, on environment variables:
h
> Why not use U+DCxx for non-UTF-8 encodings too?
I thought of that, and was tricked into believing that only U+DC8x
is a half surrogate. Now I see that you are right, and have fixed
the PEP accordingly.
Regards,
Martin
2009/4/22 "Martin v. Löwis" :
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
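
For readers who want to see what such an error handler could look like, here is a minimal sketch built on codecs.register_error. It is not the PEP's implementation: the handler name "python-escape-sketch" is hypothetical, the mapping of byte 0xNN to U+F01NN is an assumption read off the "U+F01xx" notation, and only the decoding direction is covered.

```python
import codecs


def python_escape_sketch(exc):
    """Map each undecodable byte 0xNN to the private-use character U+F01NN."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = "".join(chr(0xF0100 + b) for b in bad)
        # Resume decoding right after the offending bytes.
        return replacement, exc.end
    raise exc


# Hypothetical handler name, registered only for this illustration.
codecs.register_error("python-escape-sketch", python_escape_sketch)

text = b"caf\xe9".decode("ascii", "python-escape-sketch")
print(hex(ord(text[-1])))  # 0xf01e9
```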
Martin v. Löwis wrote:
MRAB wrote:
Martin v. Löwis wrote:
[snip]
To convert non-decodable bytes, a new error handler "python-escape" is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that cu
On Apr 22, 2009, at 2:50 AM, Martin v. Löwis wrote:
I'm proposing the following PEP for inclusion into Python 3.1.
Please comment.
+1. Even if some people still want a low-level bytes API, it's
important that the easy case be easy. That is: the majority of Python
applications should *just
>> The python-escape codec is only used/meaningful if the env encoding
>> is not UTF-8. For any other encoding, it is assumed that no character
>> actually maps to the private-use characters.
>
> Which should be true for any encoding from the pre-unicode era, but not
> for UTF-16/32 and variants.
On 2009-04-22 22:06, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>>> "correct" -> "corrected"
>> Thanks, fixed.
>>
To convert non-decodable bytes, a new error handler "python-escape" is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which
Martin v. Löwis wrote:
>> "correct" -> "corrected"
>
> Thanks, fixed.
>
>>> To convert non-decodable bytes, a new error handler "python-escape" is
>>> introduced, which decodes non-decodable bytes into a private-use
>>> character U+F01xx, which is believed to not conflict with private-use
>
MRAB wrote:
> Martin v. Löwis wrote:
> [snip]
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exi
> "correct" -> "corrected"
Thanks, fixed.
>> To convert non-decodable bytes, a new error handler "python-escape" is
>> introduced, which decodes non-decodable bytes into a private-use
>> character U+F01xx, which is believed to not conflict with private-use
>> characters that currently exist
Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 20
Martin v. Löwis wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
That seems like a much nicer solution than having parallel bytes/Unicode
APIs everywhere.
When the locale encoding is UTF-8, would UTF-8b also be used for the
command line decoding and enviro