On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> No. The point is that people want to use their current tools; they
> may not be able to easily specify normalization.
> Please look through the list (I've already done so; I'm speaking from
> detailed examination of the data) and state w
Stephen J. Turnbull writes:
> > http://www.unicode.org/versions/corrigendum3.html suggests that many
> > of the Hangul are either pronunciation guide variants or even exact
> > duplicates (that were presumably missed when the canonicalization was
> > frozen?)
>
> I'll have to ask some Koreans
Rauli Ruohonen writes:
> There are some cases where users might in the future want to make
> a distinction between "compatibility" characters, such as these:
> http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols
I don't think they belong in identifiers in a general purpose
programming language.
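For context: NFKC, the compatibility normalization that PEP 3131 ultimately adopted for identifiers, folds these math-styled letters into their plain counterparts anyway, so they could not remain distinct identifiers. A quick stdlib check:

```python
import unicodedata

# MATHEMATICAL BOLD CAPITAL A (U+1D400) carries a compatibility
# mapping to plain 'A', so NFKC folds the styling away:
bold_a = "\U0001D400"
print(unicodedata.normalize("NFKC", bold_a))  # 'A'

# NFC, by contrast, leaves it alone (there is no canonical
# decomposition for the math alphanumerics):
print(unicodedata.normalize("NFC", bold_a) == bold_a)  # True
```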
(Martin's right, it's not good to discuss this in the huge PEP 3131
thread, so I'm changing the subject line)
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> In the language of these standards, I would expect that string
> comparison is exactly the kind of higher-level process they hav
Rauli Ruohonen writes:
> Strings are internal to Python. This is a whole separate issue from
> normalization of source code or its parts (such as identifiers).
Agreed. But please note that we're not talking about representation.
We're talking about the result of evaluating a comparison:
i
"Stephen J. Turnbull" <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
>
> > Strings are internal to Python. This is a whole separate issue from
> > normalization of source code or its parts (such as identifiers).
>
> Agreed. But please note that we're not talking about representation.
> We're talking about the result of evaluating a comparison:
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
>
> > Strings are internal to Python. This is a whole separate issue from
> > normalization of source code or its parts (such as identifiers).
>
> Agreed. But please note that we're not talking about representation.
> Hear me out for a moment. People type what they want.
I do a lot of Pythonic processing of UTF-8, which is not "typed by
people", but instead extracted from documents by automated processing.
Text is also data -- an important thing to keep in mind.
As far as normalization goes, I agree with you
On 6/6/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > Hear me out for a moment. People type what they want.
>
> I do a lot of Pythonic processing of UTF-8, which is not "typed by
> people", but instead extracted from documents by automated processing.
> Text is also data -- an important thing to keep in mind.
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Why should the lexer apply normalization to literals behind my back?
The lexer shouldn't, but NFC normalizing the source before the lexer
sees it would be slightly more robust and standards-compliant. This is
because technically an editor or
> FWIW, I don't buy that normalization is expensive, as most strings are
> in NFC form anyway, and there are fast checks for that (see UAX#15,
> "Detecting Normalization Forms"). Python does not currently have
> a fast path for this, but if it's added, then normalizing everything
> to NFC should be
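For what it's worth, modern CPython (3.8+) did grow exactly this fast path, exposed as `unicodedata.is_normalized()` (it did not exist in 2007). A sketch of the check-then-normalize pattern the quote describes:

```python
import unicodedata

def nfc(s: str) -> str:
    # Quick check first: for typical already-NFC text this is a plain
    # scan with no new string allocated (Python 3.8+ only).
    if unicodedata.is_normalized("NFC", s):
        return s
    return unicodedata.normalize("NFC", s)

print(nfc("Lo\u0308wis") == "L\u00F6wis")  # True: o + U+0308 composed to U+00F6
print(nfc("plain ascii") == "plain ascii")  # True: already NFC, returned as-is
```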
Guido van Rossum wrote:
> Clearly we will have a normalization routine so the
> lexer can normalize identifiers, so if you need normalized data it is
> as simple as writing 'XXX'.normalize() (or whatever the spelling
> should be).
It's actually in Python already, and spelled as
unicodedata.normalize().
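The routine referred to here is indeed already in the stdlib; a minimal demonstration:

```python
import unicodedata

# Canonical (NFC) normalization composes base + combining mark pairs.
decomposed = "Lo\u0308wis"          # 'o' followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(composed == "L\u00F6wis")         # True: o + U+0308 becomes U+00F6
print(len(decomposed), len(composed))   # 6 5 -- one code point fewer after NFC
```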
Guido van Rossum writes:
> But I'm not about to change the == operator to apply normalization
> first. It would affect too much (e.g. hashing).
Yah, that's one reason why Jim Jewett and I lean to normalizing on the
way in for explicitly Unicode data. But since that's not going to
happen, I gue
> > But I'm not about to change the == operator to apply normalization
> > first. It would affect too much (e.g. hashing).
>
> Yah, that's one reason why Jim Jewett and I lean to normalizing on the
> way in for explicitly Unicode data. But since that's not going to
> happen, I guess the thing i
On 6/6/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > Why should the lexer apply normalization to literals behind my back?
>
> The lexer shouldn't, but NFC normalizing the source before the lexer
> sees it would be slightly more robust and standards-compliant.
Bill Janssen <[EMAIL PROTECTED]> wrote:
>
> > Hear me out for a moment. People type what they want.
>
> I do a lot of Pythonic processing of UTF-8, which is not "typed by
> people", but instead extracted from documents by automated processing.
> Text is also data -- an important thing to keep in mind.
On 5/31/07, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> At 07:48 PM 5/31/2007 +0800, Guido van Rossum wrote:
> >I've updated the patch; the latest version now contains the grammar
> >and compiler changes needed to make super a keyword and to
> >automatically add a required parameter 'super' when super is used.
On 6/5/07, Steve Howell <[EMAIL PROTECTED]> wrote:
>
> --- Jim Jewett <[EMAIL PROTECTED]> wrote:
> > Ideally, either that equivalence would also include
> > compatibility, or
> > else characters whose compatibility and canonical
> > equivalents are
> > different would be banned for use in identifiers.
BJörn Lindqvist wrote:
>> Those most eager for unicode identifiers are afraid that people
>> (particularly beginning students) won't be able to use local-script
>> identifiers, unless it is the default. My feeling is that the teacher
>> (or the person who pointed them to python) can change the
On 6/6/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I think "obvious" referred to the reasoning, not the outcome.
> > I can tell that the decision was "NFC, anything goes", but I don't see why.
> I think I'm repeating myself: Because UAX 31 says so. That's it. There
> is a standard that e
A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP
367 (new super) and PEP 344 (exception chaining). Are there any
others? I propose that we renumber these to numbers in the 3100+
range. I can see two forms of renaming:
(a) 344 -> 3344 and 367 -> 3367, i.e. add 3000 to the number.
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP
> 367 (new super) and PEP 344 (exception chaining). Are there any
> others? I propose that we renumber these to numbers in the 3100+
> range. I can see two forms of renaming:
On 6/5/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> A scan of the full table for Unicode Version 2.0 ... 'n (Afrikaans), and
I asked a friend who speaks Afrikaans; apparently it is more a word
than a letter.
"""
ʼn is derived from the Dutch word en which means "a" in English. The ʼ is in
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Jim Jewett writes:
> > Depends on what you mean by technical symbols. ... The math
> > versions (generally 1D400 - 1DC7B) are included. But
> > http://unicode.org/reports/tr39/data/xidmodifications.txt suggests
> > excluding them again.
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
> > FWIW, I don't buy that normalization is expensive, as most strings are
> > in NFC form anyway, and there are fast checks for that (see UAX#15,
> > "Detecting Normalization Forms"). Python does not currently have a fast path for this
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > about normalization of data strings. The big issue is string literals.
> > I think I agree with Stephen here:
> > u"L\u00F6wis" == u"Lo\u0308wis"
> > should be True (assuming he typed it correctly in the first place :-),
> > because
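As an editorial illustration of the comparison under discussion (using today's stdlib semantics, not any proposed change):

```python
import unicodedata

a = "L\u00F6wis"     # precomposed ö
b = "Lo\u0308wis"    # o + COMBINING DIAERESIS

# == compares code point sequences, so these canonical equivalents differ:
print(a == b)        # False

# Equality "as text" requires normalizing both sides first:
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
```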
On 6/6/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> > > about normalization of data strings. The big issue is string literals.
> > > I think I agree with Stephen here:
>
> > > u"L\u00F6wis" == u"Lo\u0308wis"
>
> > > should be True (assuming he typed it correctly in the first place :-),
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/6/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > > Why should the lexer apply normalization to literals behind my back?
> > The lexer shouldn't, but NFC normalizing the source before the lexer
> > sees it would be slightly more robust and standards-compliant.
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/6/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> >
> > > > about normalization of data strings. The big issue is string literals.
> > > > I think I agree with Stephen here:
> > >
On 6/6/07, Collin Winter <[EMAIL PROTECTED]> wrote:
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP
> 367 (new super) and PEP 344 (exception chaining). Are there any
> others? I propose that we renumber these to numbers in the 3100+ range.
Jim Jewett wrote:
> Since we don't want the results of (str1 == str2) to change based on
> context, I think string equality also needs to look at canonicalized
> (though probably not compatibility) forms.
Are you suggesting that this should be done on the fly
when comparing strings? Or that all strings
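The distinction drawn above (canonical vs. compatibility equivalence) is visible in the stdlib: the fi-ligature U+FB01 is a compatibility character, so canonical NFC leaves it alone while compatibility NFKC folds it:

```python
import unicodedata

lig = "\uFB01"  # LATIN SMALL LIGATURE FI, a compatibility character

# NFC (canonical) leaves compatibility characters untouched:
print(unicodedata.normalize("NFC", lig) == lig)     # True
# NFKC (compatibility) folds it into the two plain letters:
print(unicodedata.normalize("NFKC", lig))           # 'fi'
print(unicodedata.normalize("NFKC", lig) == "fi")   # True
```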
> So let me explain it. I see two different sequences of code points:
> 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> claim they are equivalent. They are two different sequences of code
> points.
If
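Sketching the point in the quote above in code (no library calls needed):

```python
# The two spellings of "Löwis" really are different code point sequences:
a = "L\u00F6wis"
b = "Lo\u0308wis"

print([hex(ord(c)) for c in a])  # ['0x4c', '0xf6', '0x77', '0x69', '0x73']
print([hex(ord(c)) for c in b])  # ['0x4c', '0x6f', '0x308', '0x77', '0x69', '0x73']
print(a == b)                    # False: == is code point equality
```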
> But
> if someone didn't want normalization, and Python did it anyways, then
> there would be an error that passed silently.
Then they'd read it as bytes, and do the processing themselves
explicitly (actually, what I do).
> It's the unicode character versus code point issue. I personally prefer
"Chris Monson" <[EMAIL PROTECTED]> wrote in message
| Leaving (old PEP number) in place as a stripped down PEP that just points to
| the new number: +1
Good idea. And new number = next available. Special PEP numbers should be
for special PEPs.
tjr
I wrote:
> Guido wrote:
> > So let me explain it. I see two different sequences of code points:
> > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> > claim they are equivalent. They are two different sequences of code points.
Steve Howell wrote:
> Current Python has the precedence that color/colour
> are treated as two separate identifers,
But there's always a clear visual difference between
"color" and "colour", and your editor is not going
to turn one into the other while you're not looking
(unless you've got some so
Stephen J. Turnbull wrote:
> Jim Jewett writes:
>
> > I am slightly concerned that it might mean
> > "string as string" and "string as identifier" have different tests
> > for equality.
>
> It does mean that; see Rauli's code. Does anybody know if this
> bothers LISP users, where identifiers
On 6/5/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Alexandre Vassalotti wrote:
> > On 6/5/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> >> If "make clean" makes the problem go away, it's usually because there
> >> were old .pyc files with incompatible byte code. We don't change the
> >> .pyc magic number
On 6/7/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> I meant to say that *strings* are explicitly sequences of characters,
> not codepoints.
This is false. When you access the contents of a string using the
*sequence* protocol, what you get is code points, not characters
(grapheme clusters). To ge
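A rough sketch of the difference (this hypothetical `graphemes` helper only attaches combining marks to their base character; real segmentation is defined by UAX #29 and handles far more cases, e.g. Hangul jamo and ZWJ sequences):

```python
import unicodedata

def graphemes(s):
    """Yield approximate grapheme clusters: base char + trailing combining marks."""
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch):
            cluster += ch              # attach combining mark to current cluster
        else:
            if cluster:
                yield cluster
            cluster = ch               # start a new cluster
    if cluster:
        yield cluster

s = "Lo\u0308wis"                      # 'o' + COMBINING DIAERESIS
print(len(s))                          # 6 code points ...
print(len(list(graphemes(s))))         # ... but 5 clusters: L, ö, w, i, s
```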
On 6/6/07, Collin Winter <[EMAIL PROTECTED]> wrote:
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> A few PEPs with numbers < 400 are now targeting Python 3000, e.g. PEP
> 367 (new super) and PEP 344 (exception chaining). Are there any
> others? I propose that we renumber these to numbers in the 3100+ range.
--- Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
> > (Conformance)
> >
> > C9 A process shall not assume that the interpretations of two
> > canonical-equivalent character sequences are distinct.
>
> That is surely contained inside all sort