Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
The name utf8b suggested in the PEP is not in line with the codec design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). Error handlers and codecs are two different things, so the namespaces need to be clearly separate. They *are*

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Stephen J. Turnbull wrote: Martin v. Löwis writes: It occurs to me that the PEP maybe should say that it is an error to have your POSIX locale set to UTF-16 or something like that. No. It is *impossible* to have UTF-16 as the locale character set, not an error. Your

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Second, I suggest surrogate-replace as the name of the error handler rather than utf8b. I think this is bike-shedding. I don't personally care (I already was aware of UTF-8B), but there are plenty of others who do. I think it is a fairly bad name, because it is easy to confuse

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Yeah, yeah, this is the same old same old from PEP 3131. Anything that handles the various attacks based on ASCII-alike characters should at least rule out invalid Unicode, too! And where is this U+DC2F supposed to be coming from, anyway? The user's *local* environment or the user's

[Python-Dev] Help on issue 5941

2009-05-06 Thread Tarek Ziadé
Hello, I need some help on http://bugs.python.org/issue5941 The bug is quite simple: the Distutils unixcompiler used to set the archiver command to ar -rc. For quite a while now, this behavior has changed in order to be able to customize the compiler behavior from the environment. That

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Antoine Pitrou
Martin v. Löwis martin at v.loewis.de writes: I don't personally care (I already was aware of UTF-8B), but there are plenty of others who do. I think it is a fairly bad name, because it is easy to confuse it with the surrogates error handler (unless you suggest to rename that also). I

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Stephen J. Turnbull
Martin v. Löwis writes: I fail to see how this could ever matter. If, by media, you mean things like removable disks, and the file name encoding used on them, it's fairly irrelevant for the PEP, since Python won't start using Shift JIS as its file system encoding just because that's the

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread M.-A. Lemburg
Martin v. Löwis wrote: The name utf8b suggested in the PEP is not in line with the codec design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). Martin, I designed the whole Python codec machinery, so even if this is not explicitly

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread MRAB
M.-A. Lemburg wrote: Martin v. Löwis wrote: The name utf8b suggested in the PEP is not in line with the codec design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). Martin, I designed the whole Python codec machinery, so even if this

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Antoine Pitrou
MRAB google at mrabarnett.plus.com writes: Judging by the existing names, I think that 'surrogate' would be reasonable. It already contains the meaning of substitute, Only if you are a native English-speaker I suppose... For me it's just a technical term denoting a certain class of unicode

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Lino Mastrodomenico
2009/5/6 Antoine Pitrou solip...@pitrou.net: By the way, what are the ASCII characters that are not supported by Shift-JIS? Not many I suppose? (if I read the Wikipedia entry correctly, it's only the backslash and the tilde). The biggest problem with Shift-JIS is that a perfectly valid

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Lennart Regebro
On Wed, May 6, 2009 at 09:31, Martin v. Löwis mar...@v.loewis.de wrote: They *are* separate namespaces; that's guaranteed by the implementation. Yes. But utf8b *sounds like* an encoding. When it isn't. I sure thought it was when it was first mentioned. I agree that it would be better to find

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Stephen J. Turnbull
Lino Mastrodomenico writes: It's a known problem with Shift-JIS and was fixed in UTF-8. It was fixed in EUC before Shift-JIS was invented by Microsoft or Big5 was invented by the Taiwanese clone makers. Guido's not the only language designer with a time machine

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Stephen J. Turnbull
Martin v. Löwis writes: Yeah, yeah, this is the same old same old from PEP 3131. Anything that handles the various attacks based on ASCII-alike characters should at least rule out invalid Unicode, too! And where is this U+DC2F supposed to be coming from, anyway? The user's

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread R. David Murray
On Wed, 6 May 2009 at 13:40, Antoine Pitrou wrote: Stephen J. Turnbull stephen at xemacs.org writes: Nothing is lost compared to 'strict', true, but under the PEP as it is a large fraction of Shift JIS and Big5 filenames cannot be read under ASCII-compatible file system encodings using

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Zooko Wilcox-O'Hearn
On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote: You have convinced me that the PEP should wait as well. In its current form it is incomplete and dangerous. +1 on delaying PEP 383 I think PEP 383 is a good idea in principle, but I'm still struggling to understand it myself, and it

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread James Y Knight
On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote: Now, with Python's file system encoding == UTF-8 or any packed EUC, and more than a handful of Shift JIS or Big5 characters in file names, one is *almost certain* to encounter ASCII as the second byte of a multibyte sequence. PEP 383 can't
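The "ASCII as the second byte" case discussed here can be reproduced with a small sketch. It assumes the error handler name that later shipped in Python 3.1, `surrogateescape`, rather than the `utf8b` name used at this point in the thread, and the file name is a made-up example.

```python
# Katakana "so" (U+30BD) encodes in Shift JIS as 0x83 0x5C; the second
# byte is the ASCII backslash.
sjis_name = "\u30bd.txt".encode("shift_jis")

# Decode the Shift JIS bytes as if the file system encoding were UTF-8.
# The undecodable byte 0x83 becomes the lone surrogate U+DC83, while the
# trailing 0x5C is decoded as an ordinary backslash.
text = sjis_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")

print(sjis_name, ascii(text), restored)
```

The bytes survive the round trip, but the decoded string contains a real backslash, which is the behaviour the posters are arguing about.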

Re: [Python-Dev] Undocumented change / bug in Python3's PyMapping_Check

2009-05-06 Thread Nick Coghlan
John Millikin wrote: In Python 2, PyMapping_Check will return 0 for list objects. In Python 3, it returns 1. Obviously, this makes it rather difficult to differentiate between mappings and other sized iterables. In addition, it differs from the behavior of the ``collections.Mapping`` ABC --

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Antoine Pitrou
Zooko Wilcox-O'Hearn zooko at zooko.com writes: I'm not thinking of API compatibility as much as data compatibility -- someone used Python 3.1 to write down some filenames, and now a few years later they are trying to use the latest and greatest Python release to read those

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Glenn Linderman
On approximately 5/6/2009 6:33 AM, came the following characters from the keyboard of Stephen J. Turnbull: Martin v. Löwis writes: In any case, Python 3.1b1 may get released today, so it's way too late for new features in the PEP. They can wait for Python 3.2. You have convinced me that the

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Glenn Linderman
On approximately 5/6/2009 3:08 AM, came the following characters from the keyboard of MRAB: M.-A. Lemburg wrote: Martin v. Löwis wrote: Judging by the existing names, I think that 'surrogate' would be reasonable. It already contains the meaning of substitute, it's not too long, and the codes

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Glenn Linderman
On approximately 5/6/2009 12:53 AM, came the following characters from the keyboard of Martin v. Löwis: Sorry! I suggest substituting the paragraph above for the paragraph which begins The encode error handler interface presently requires... at line 129. Ah, ok. This was Glenn Linderman's

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Terry Reedy
Glenn Linderman wrote: On approximately 5/6/2009 3:08 AM, came the following characters from the keyboard of MRAB: M.-A. Lemburg wrote: Martin v. Löwis wrote: Judging by the existing names, I think that 'surrogate' would be reasonable. It already contains the meaning of substitute, it's not

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Zooko Wilcox-O'Hearn
On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote: Zooko Wilcox-O'Hearn zooko at zooko.com writes: I'm not thinking of API compatibility as much as data compatibility -- someone used Python 3.1 to write down some filenames, and now a few years later they are trying to use the latest and

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Glenn Linderman
On approximately 5/6/2009 12:18 PM, came the following characters from the keyboard of Zooko Wilcox-O'Hearn: On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote: Zooko Wilcox-O'Hearn zooko at zooko.com writes: I'm not thinking of API compatibility as much as data compatibility -- someone used

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
The name utf8b suggested in the PEP is not in line with the codec design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). Martin, I designed the whole Python codec machinery Not true. PEP 293 was written and designed by Walter

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
I'm sorry for the lack of clarity of my posts, but somehow you're completely missing the point. The point is precisely that Python *won't* use Shift JIS as the file system encoding (if it did there would be no problem with reading Shift JIS), but the people who created the media *did*.

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Judging by the existing names, I think that 'surrogate' would be reasonable MAL's list of existing names is incomplete. surrogates is already an existing name, also, and it means something different (similar, but different). Regards, Martin

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Terry Reedy wrote: Glenn Linderman wrote: On approximately 5/6/2009 3:08 AM, came the following characters from the keyboard of MRAB: M.-A. Lemburg wrote: Martin v. Löwis wrote: Judging by the existing names, I think that 'surrogate' would be reasonable. It already contains the meaning of

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Is it only usable with utf8 as an encoding? No, it applies to any codec which potentially cannot decode all bytes > 127. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe:
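A minimal sketch of the codec-agnostic behaviour described in this reply, again assuming the handler name `surrogateescape` that Python 3.1 eventually shipped with. The helper `roundtrip` and the sample bytes are hypothetical, chosen only for illustration.

```python
def roundtrip(raw, codec):
    """Decode raw bytes with the given codec, escaping undecodable bytes
    as lone surrogates, and check that encoding restores them."""
    text = raw.decode(codec, "surrogateescape")
    assert text.encode(codec, "surrogateescape") == raw
    return text

# 0xE9 is "e acute" in Latin-1; it is invalid as ASCII and, in this
# position, invalid as UTF-8 too.
raw = b"caf\xe9.txt"

# The same handler works with both codecs and maps 0xE9 to U+DCE9.
results = {codec: roundtrip(raw, codec) for codec in ("ascii", "utf-8")}
print({codec: ascii(text) for codec, text in results.items()})
```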

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Antoine Pitrou
Martin v. Löwis martin at v.loewis.de writes: Despite there being also an error handler called surrogates. People, perhaps we could end all the bikeshedding and call one of those handlers surrogates-pass and the other surrogates-escape, which sounds quite faithful to what they actually /do/?

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
But first, it should be stopped by any of several standard precautions. For example, applying os.path.realpath (come to think of it, PEP 383 should say something about realpath, shouldn't it?) Why do you think so? I think the existing documentation of realpath is correct and complete. and

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Antoine Pitrou wrote: Martin v. Löwis martin at v.loewis.de writes: Despite there being also an error handler called surrogates. People, perhaps we could end all the bikeshedding and call one of those handlers surrogates-pass and the other surrogates-escape, which sounds quite faithful

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Terry Reedy
Martin v. Löwis wrote: +1 for surrogate as the name for the error handler. +1 from me also Despite there being also an error handler called surrogates. Given that additional information which MAL apparently omitted, I would revise. Are you serious? Are you? ;-? You are the one

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Terry Reedy
Martin v. Löwis wrote: Because utf8b (or, perhaps UTF-8b) is the official name for this algorithm: http://hyperreal.org/~est/utf-8b/ Thank you for the link. It starts: This directory contains a C implementation of a UTF-8b codec. A Python codec based on it is provided as well. 'RTF-8b'

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Paul Moore
2009/5/6 Antoine Pitrou solip...@pitrou.net: Martin v. Löwis martin at v.loewis.de writes: Despite there being also an error handler called surrogates. People, perhaps we could end all the bikeshedding and call one of those handlers surrogates-pass and the other surrogates-escape, which

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Terry Reedy
Martin v. Löwis wrote: Antoine Pitrou wrote: Martin v. Löwis martin at v.loewis.de writes: Despite there being also an error handler called surrogates. People, perhaps we could end all the bikeshedding and call one of those handlers surrogates-pass and the other surrogates-escape, which

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Are you serious? Are you? ;-? You are the one naming a codec-agnostic error handler (if I understand correctly, and correct me if I do not) after a particular codec, and denying that that could cause confusion. See other message. I can only repeat what I said before: I call it utf8b

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread MRAB
Antoine Pitrou wrote: Martin v. Löwis martin at v.loewis.de writes: Despite there being also an error handler called surrogates. People, perhaps we could end all the bikeshedding and call one of those handlers surrogates-pass and the other surrogates-escape, which sounds quite faithful to

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
I qualify with a). I believe I understand c) but, as explained in my other post, I do not think your reason applies. In fact, I think concern for naming rights might suggest that you *not* reuse the name for something different. I would have to learn more about the existing 'surrogates'

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Antoine Pitrou
Martin v. Löwis martin at v.loewis.de writes: py> b'\xed\xa0\x80'.decode("utf-8", "surrogates") '\ud800' The point is, surrogates does not mean anything intuitive for an /error handler/. You seem to be the only one who finds this name explicit enough, perhaps because you chose it. Most other
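For readers trying the quoted session on a released Python 3: the handler called surrogates in this thread is, as far as released versions go, spelled `surrogatepass`. The sketch below assumes that released name and is not the code from the original message.

```python
encoded = b"\xed\xa0\x80"  # UTF-8-style byte sequence for the lone surrogate U+D800

# The strict UTF-8 decoder rejects encoded surrogates.
try:
    encoded.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogatepass the lone surrogate is let through in both directions.
lone = encoded.decode("utf-8", "surrogatepass")
back = lone.encode("utf-8", "surrogatepass")

print(strict_ok, ascii(lone), back)
```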

Re: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath

2009-05-06 Thread Mark Hammond
Eric Smith wrote: Mark: I've reviewed this and it looks okay to me. Thanks Eric - I've now applied that patch. As you mentioned in a followup to the bug: | Thanks for looking at this, Mark. If we could only assign issues to | Python 3.2 and 3.3 to change the pending deprecation warning to

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Michael Urman
On Wed, May 6, 2009 at 15:42, Martin v. Löwis mar...@v.loewis.de wrote: Despite there being also an error handler called surrogates. Not that I have to be, but I'm not sold on the previous UTF-8 codec behavior becoming an error handler of the name surrogates for two reasons (I do respect the

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread M.-A. Lemburg
Martin v. Löwis wrote: The name utf8b suggested in the PEP is not in line with the codec design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). Martin, I designed the whole Python codec machinery Not true. PEP 293 was written and

[Python-Dev] test - please ignore

2009-05-06 Thread Benjamin Peterson
Some of my messages appear not to have gotten through. -- Regards, Benjamin

[Python-Dev] [RELEASED] Python 3.1 beta 1

2009-05-06 Thread Benjamin Peterson
On behalf of the Python development team, I'm thrilled to announce the first and only beta release of Python 3.1. Python 3.1 focuses on the stabilization and optimization of features and changes Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Stephen J. Turnbull
Martin v. Löwis writes: Now, with Python's file system encoding == UTF-8 or any packed EUC, and more than a handful of Shift JIS or Big5 characters in file names, one is *almost certain* to encounter ASCII as the second byte of a multibyte sequence. PEP 383 can't handle this Ah, I

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Terry Reedy
Martin v. Löwis wrote: Are you serious? Are you? ;-? You are the one naming a codec-agnostic error handler (if I understand correctly, and correct me if I do not) after a particular codec, and denying that that could cause confusion. See other message. I can only repeat what I said before:

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Glenn Linderman
On approximately 5/6/2009 6:06 PM, came the following characters from the keyboard of M.-A. Lemburg: Martin, please stop being silly and just change the name. Yes, please. If indeed Marc-Andre invented the codec business as he claims, he would be an appropriate person to give a fiat name

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
Michael Urman wrote: On Wed, May 6, 2009 at 15:42, Martin v. Löwis mar...@v.loewis.de wrote: Despite there being also an error handler called surrogates. Not that I have to be, but I'm not sold on the previous UTF-8 codec behavior becoming an error handler of the name surrogates for two

Re: [Python-Dev] PEP 383 update: utf8b is now the error handler

2009-05-06 Thread Martin v. Löwis
The error handler designed with utf-8 in mind has no name in the encode direction and is called utf_8b_decoder_invalid_bytes in the decode direction. By your reasoning, *that* should be its name in Python. The encoding error handler would then be named analogously