Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Rauli Ruohonen
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > No. The point is that people want to use their current tools; they > may not be able to easily specify normalization. > Please look through the list (I've already done so; I'm speaking from > detailed examination of the data) and state w

Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Hagen Fürstenau
Stephen J. Turnbull writes: > > http://www.unicode.org/versions/corrigendum3.html suggests that many > > of the Hangul are either pronunciation guide variants or even exact > > duplicates (that were presumably missed when the canonicalization was > > frozen?) > > I'll have to ask some Koreans

Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Stephen J. Turnbull
Rauli Ruohonen writes: > There are some cases where users might in the future want to make > a distinction between "compatibility" characters, such as these: > http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols I don't think they belong in identifiers in a general purpose programmi

[Python-3000] String comparison

2007-06-06 Thread Rauli Ruohonen
(Martin's right, it's not good to discuss this in the huge PEP 3131 thread, so I'm changing the subject line) On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > In the language of these standards, I would expect that string > comparison is exactly the kind of higher-level process they hav

[Python-3000] String comparison

2007-06-06 Thread Stephen J. Turnbull
Rauli Ruohonen writes: > Strings are internal to Python. This is a whole separate issue from > normalization of source code or its parts (such as identifiers). Agreed. But please note that we're not talking about representation. We're talking about the result of evaluating a comparison: i

Re: [Python-3000] String comparison

2007-06-06 Thread Josiah Carlson
"Stephen J. Turnbull" <[EMAIL PROTECTED]> wrote: > Rauli Ruohonen writes: > > > Strings are internal to Python. This is a whole separate issue from > > normalization of source code or its parts (such as identifiers). > > Agreed. But please note that we're not talking about representation. > W

Re: [Python-3000] String comparison

2007-06-06 Thread Guido van Rossum
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > Rauli Ruohonen writes: > > > Strings are internal to Python. This is a whole separate issue from > > normalization of source code or its parts (such as identifiers). > > Agreed. But please note that we're not talking about representatio

Re: [Python-3000] String comparison

2007-06-06 Thread Bill Janssen
> Hear me out for a moment. People type what they want. I do a lot of Pythonic processing of UTF-8, which is not "typed by people", but instead extracted from documents by automated processing. Text is also data -- an important thing to keep in mind. As far as normalization goes, I agree with yo

Re: [Python-3000] String comparison

2007-06-06 Thread Guido van Rossum
On 6/6/07, Bill Janssen <[EMAIL PROTECTED]> wrote: > > Hear me out for a moment. People type what they want. > > I do a lot of Pythonic processing of UTF-8, which is not "typed by > people", but instead extracted from documents by automated processing. > Text is also data -- an important thing to

Re: [Python-3000] String comparison

2007-06-06 Thread Rauli Ruohonen
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > Why should the lexer apply normalization to literals behind my back? The lexer shouldn't, but NFC normalizing the source before the lexer sees it would be slightly more robust and standards-compliant. This is because technically an editor or

Re: [Python-3000] String comparison

2007-06-06 Thread Martin v. Löwis
> FWIW, I don't buy that normalization is expensive, as most strings are > in NFC form anyway, and there are fast checks for that (see UAX#15, > "Detecting Normalization Forms"). Python does not currently have > a fast path for this, but if it's added, then normalizing everything > to NFC should be
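A minimal sketch of the "check first, normalize only if needed" idea quoted above. It assumes a later Python (3.8 or newer), where `unicodedata.is_normalized` exists; the helper name `to_nfc` is hypothetical and not something proposed in the thread.

```python
import unicodedata

def to_nfc(text):
    """Return text in NFC, skipping the full normalization when possible."""
    # is_normalized() can often answer without building a new string,
    # which is the "fast check" idea from UAX #15.
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)

already_nfc = "plain ascii"
needs_work = "Lo\u0308wis"   # 'o' followed by U+0308 COMBINING DIAERESIS
result_fast = to_nfc(already_nfc)
result_slow = to_nfc(needs_work)
```

Strings that are already in NFC are returned unchanged, so the cost of normalization is only paid for the uncommon decomposed input.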

Re: [Python-3000] String comparison

2007-06-06 Thread Martin v. Löwis
Guido van Rossum schrieb: > Clearly we will have a normalization routine so the > lexer can normalize identifiers, so if you need normalized data it is > as simple as writing 'XXX'.normalize() (or whatever the spelling > should be). It's actually in Python already, and spelled as unicodedata.norma
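For reference, a short sketch of the existing spelling mentioned above, `unicodedata.normalize`, written for Python 3. The sample strings are illustrative, borrowed from the "Löwis" example used elsewhere in this thread.

```python
import unicodedata

composed = "L\u00F6wis"      # 'ö' as the single code point U+00F6
decomposed = "Lo\u0308wis"   # 'o' followed by U+0308 COMBINING DIAERESIS

# NFC composes where possible; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
```

Here `nfc` has the same code points as `composed`, and `nfd` has the same code points as `decomposed`.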

Re: [Python-3000] String comparison

2007-06-06 Thread Stephen J. Turnbull
Guido van Rossum writes: > But I'm not about to change the == operator to apply normalization > first. It would affect too much (e.g. hashing). Yah, that's one reason why Jim Jewett and I lean to normalizing on the way in for explicitly Unicode data. But since that's not going to happen, I gue
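The hashing concern can be illustrated with a small sketch, assuming Python 3 string semantics (`==` and `hash` work on code points, with no normalization). The dictionary and its contents are made up for illustration.

```python
import unicodedata

composed = "L\u00F6wis"      # U+00F6
decomposed = "Lo\u0308wis"   # 'o' + U+0308

table = {composed: "found"}

# The raw decomposed key is a different code point sequence,
# so it is a different dictionary key.
found_raw = decomposed in table

# Normalizing on the way in makes both spellings land on the same key.
found_nfc = unicodedata.normalize("NFC", decomposed) in table
```

If `==` normalized but `hash` did not, equal strings could hash differently, which is why the two cannot be changed independently.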

Re: [Python-3000] String comparison

2007-06-06 Thread Martin v. Löwis
> > But I'm not about to change the == operator to apply normalization > > first. It would affect too much (e.g. hashing). > > Yah, that's one reason why Jim Jewett and I lean to normalizing on the > way in for explicitly Unicode data. But since that's not going to > happen, I guess the thing i

Re: [Python-3000] String comparison

2007-06-06 Thread Guido van Rossum
On 6/6/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote: > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > > Why should the lexer apply normalization to literals behind my back? > > The lexer shouldn't, but NFC normalizing the source before the lexer > sees it would be slightly more robust and

Re: [Python-3000] String comparison

2007-06-06 Thread Josiah Carlson
Bill Janssen <[EMAIL PROTECTED]> wrote: > > > Hear me out for a moment. People type what they want. > > I do a lot of Pythonic processing of UTF-8, which is not "typed by > people", but instead extracted from documents by automated processing. > Text is also data -- an important thing to keep i

Re: [Python-3000] [Python-Dev] PEP 367: New Super

2007-06-06 Thread Guido van Rossum
On 5/31/07, Phillip J. Eby <[EMAIL PROTECTED]> wrote: > At 07:48 PM 5/31/2007 +0800, Guido van Rossum wrote: > >I've updated the patch; the latest version now contains the grammar > >and compiler changes needed to make super a keyword and to > >automatically add a required parameter 'super' when su

Re: [Python-3000] PEP 3131 roundup

2007-06-06 Thread Jim Jewett
On 6/5/07, Steve Howell <[EMAIL PROTECTED]> wrote: > > --- Jim Jewett <[EMAIL PROTECTED]> wrote: > > Ideally, either that equivalence would also include > > compatibility, or > > else characters whose compatibility and canonical > > equivalents are > > different would be banned for use in identifi

Re: [Python-3000] Conservative Defaults (was: Re: Support for PEP 3131)

2007-06-06 Thread Baptiste Carvello
BJörn Lindqvist a écrit : >> Those most eager for unicode identifiers are afraid that people >> (particularly beginning students) won't be able to use local-script >> identifiers, unless it is the default. My feeling is that the teacher >> (or the person who pointed them to python) can change the

Re: [Python-3000] PEP 3131 roundup

2007-06-06 Thread Jim Jewett
On 6/6/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > > I think "obvious" referred to the reasoning, not the outcome. > > I can tell that the decision was "NFC, anything goes", but I don't see why. > I think I'm repeating myself: Because UAX 31 says so. That's it. There > is a standard that e

[Python-3000] Renumbering old PEPs now targeted for Py3k?

2007-06-06 Thread Guido van Rossum
A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP 367 (new super) and PEP 344 (exception chaining). Are there any others? I propose that we renumber these to numbers in the 3100+ range. I can see two forms of renaming: (a) 344 -> 3344 and 367 -> 3367, i.e. add 3000 to the numbe

Re: [Python-3000] Renumbering old PEPs now targeted for Py3k?

2007-06-06 Thread Collin Winter
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP > 367 (new super) and PEP 344 (exception chaining). Are there any > others? I propose that we renumber these to numbers in the 3100+ > range. I can see two forms of renamin

Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Jim Jewett
On 6/5/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > A scan of the full table for Unicode Version 2.0 ... 'n (Afrikaans), and I asked a friend who speaks Afrikaans; apparently it is more a word than a letter. """ ʼn is derived from the Dutch word en which means "a" in English. The ` is in

Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Jim Jewett
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > Jim Jewett writes: > > Depends on what you mean by technical symbols. ... The math > > versions (generally 1D400 - 1DC7B) are included. But > > http://unicode.org/reports/tr39/data/xidmodifications.txt suggests > > excluding them ag

Re: [Python-3000] String comparison

2007-06-06 Thread Jim Jewett
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote: > Rauli Ruohonen writes: > > FWIW, I don't buy that normalization is expensive, as most strings are > > in NFC form anyway, and there are fast checks for that (see UAX#15, > > "Detecting Normalization Forms"). Python does not currently h

Re: [Python-3000] String comparison

2007-06-06 Thread Jim Jewett
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > > about normalization of data strings. The big issue is string literals. > > I think I agree with Stephen here: > > u"L\u00F6wis" == u"Lo\u0308wis" > > should be True (assuming he typed it correctly in the first place :-), > > because
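A sketch of the comparison under discussion, using Python 3 literals (no `u` prefix needed). The helper `nfc_equal` is a hypothetical name for an explicit, opt-in comparison; it is not something the thread agreed on.

```python
import unicodedata

a = "L\u00F6wis"     # precomposed 'ö'
b = "Lo\u0308wis"    # 'o' plus combining diaeresis

def nfc_equal(x, y):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

plain_equal = (a == b)             # code point comparison
canonical_equal = nfc_equal(a, b)  # canonical-equivalence comparison
```

With plain `==` the two literals compare as different code point sequences; the explicit helper treats them as equal.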

Re: [Python-3000] String comparison

2007-06-06 Thread Guido van Rossum
On 6/6/07, Jim Jewett <[EMAIL PROTECTED]> wrote: > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > > > > about normalization of data strings. The big issue is string literals. > > > I think I agree with Stephen here: > > > > u"L\u00F6wis" == u"Lo\u0308wis" > > > > should be True (assu

Re: [Python-3000] String comparison

2007-06-06 Thread Jim Jewett
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > On 6/6/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote: > > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > > > Why should the lexer apply normalization to literals behind my back? > > The lexer shouldn't, but NFC normalizing the sourc

Re: [Python-3000] String comparison

2007-06-06 Thread Jim Jewett
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > On 6/6/07, Jim Jewett <[EMAIL PROTECTED]> wrote: > > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > > > > > > about normalization of data strings. The big issue is string literals. > > > > I think I agree with Stephen here: > > >

Re: [Python-3000] Renumbering old PEPs now targeted for Py3k?

2007-06-06 Thread Chris Monson
On 6/6/07, Collin Winter <[EMAIL PROTECTED]> wrote: On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP > 367 (new super) and PEP 344 (exception chaining). Are there any > others? I propose that we renumber these to numbe

Re: [Python-3000] PEP: Supporting Non-ASCII Identifiers

2007-06-06 Thread Greg Ewing
Jim Jewett wrote: > Since we don't want the results of (str1 == str2) to change based on > context, I think string equality also needs to look at canonicalized > (though probably not compatibility) forms. Are you suggesting that this should be done on the fly when comparing strings? Or that all st

Re: [Python-3000] String comparison

2007-06-06 Thread Bill Janssen
> So let me explain it. I see two different sequences of code points: > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308', > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that > claim they are equivalent. They are two different sequences of code > points. If

Re: [Python-3000] String comparison

2007-06-06 Thread Bill Janssen
> But > if someone didn't want normalization, and Python did it anyways, then > there would be an error that passed silently. Then they'd read it as bytes, and do the processing themselves explicitly (actually, what I do). > It's the unicode character versus code point issue. I personally prefer

Re: [Python-3000] Renumbering old PEPs now targeted for Py3k?

2007-06-06 Thread Terry Reedy
"Chris Monson" <[EMAIL PROTECTED]> wrote in message | Leaving (old PEP number) in place as a stripped down PEP that just points to | the new number: +1 Good idea. And new number = next available. Special PEP numbers should be for special PEPs. tjr _

Re: [Python-3000] String comparison

2007-06-06 Thread Bill Janssen
I wrote: > Guido wrote: > > So let me explain it. I see two different sequences of code points: > > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308', > > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that > > claim they are equivalent. They are two different

Re: [Python-3000] PEP 3131 roundup

2007-06-06 Thread Greg Ewing
Steve Howell wrote: > Current Python has the precedent that color/colour > are treated as two separate identifiers, But there's always a clear visual difference between "color" and "colour", and your editor is not going to turn one into the other while you're not looking (unless you've got some so

Re: [Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

2007-06-06 Thread Greg Ewing
Stephen J. Turnbull wrote: > Jim Jewett writes: > > > I am slightly concerned that it might mean > > "string as string" and "string as identifier" have different tests > > for equality. > > It does mean that; see Rauli's code. Does anybody know if this > bothers LISP users, where identifiers
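The "string as string" versus "string as identifier" difference can be seen in Python 3 as eventually released, where PEP 3131 has the parser normalize non-ASCII identifiers (to NFKC) while string values are left alone. A small sketch; the identifier is just the thread's "Löwis" example, and the namespace dictionary is illustrative.

```python
decomposed = "Lo\u0308wis"   # 'o' + U+0308
composed = "L\u00F6wis"      # U+00F6

# Execute source that spells the identifier in decomposed form.
namespace = {}
exec(decomposed + " = 1", namespace)

# The name is stored in normalized (composed) form...
stored_composed = composed in namespace
stored_decomposed = decomposed in namespace

# ...while the same two spellings still differ as plain strings.
strings_equal = (decomposed == composed)
```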

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-06 Thread Neal Norwitz
On 6/5/07, Ron Adam <[EMAIL PROTECTED]> wrote: > Alexandre Vassalotti wrote: > > On 6/5/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > >> If "make clean" makes the problem go away, it's usually because there > >> were old .pyc files with incompatible byte code. We don't change the > >> .pyc magi

Re: [Python-3000] String comparison

2007-06-06 Thread Rauli Ruohonen
On 6/7/07, Bill Janssen <[EMAIL PROTECTED]> wrote: > I meant to say that *strings* are explicitly sequences of characters, > not codepoints. This is false. When you access the contents of a string using the *sequence* protocol, what you get is code points, not characters (grapheme clusters). To ge
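To make the code point versus character distinction concrete, here is a rough sketch in Python 3. The function `rough_clusters` is hypothetical and only attaches combining marks to the preceding base character; it is an approximation, not the full grapheme cluster algorithm of UAX #29.

```python
import unicodedata

def rough_clusters(text):
    """Group each base code point with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = "Lo\u0308wis"
code_points = list(s)          # what the sequence protocol yields
clusters = rough_clusters(s)   # closer to user-perceived characters
```

Indexing or iterating `s` yields six code points, while the grouping yields five user-perceived units.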

Re: [Python-3000] Renumbering old PEPs now targeted for Py3k?

2007-06-06 Thread Brett Cannon
On 6/6/07, Collin Winter <[EMAIL PROTECTED]> wrote: On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote: > A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP > 367 (new super) and PEP 344 (exception chaining). Are there any > others? I propose that we renumber these to numbe

Re: [Python-3000] String comparison

2007-06-06 Thread Steve Howell
--- Guido van Rossum <[EMAIL PROTECTED]> wrote: > http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf > (Conformance) > > > > C9 A process shall not assume that the > interpretations of two > > canonical-equivalent character sequences are > distinct. > > That is surely contained inside all sort