On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> except that people will sneak in some UTF-16 behavior where it seems useful.
How about sneaking these in py3k-struni:
- chr(i) returns a len-1 or len-2 string for all i in range(0, 0x11) and
ord(chr(i)) == i for all i in range(0,
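A minimal sketch of what that len-1/len-2 behaviour would amount to on a UTF-16 representation; the helper names utf16_chr and utf16_ord are made up here for illustration, not part of the proposal:

    def utf16_chr(i):
        # BMP code points map to a single code unit; supplementary code
        # points map to a high/low surrogate pair (a length-2 string).
        if i < 0x10000:
            return chr(i)
        i -= 0x10000
        return chr(0xD800 + (i >> 10)) + chr(0xDC00 + (i & 0x3FF))

    def utf16_ord(s):
        # Inverse of utf16_chr: accepts a length-1 or length-2 string.
        if len(s) == 2:
            hi, lo = map(ord, s)
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
        return ord(s)

    assert all(utf16_ord(utf16_chr(i)) == i
               for i in (0x41, 0xFFFF, 0x10000, 0x10FFFF))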
On 6/14/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > A code point is something that has a 1:1 relationship with a logical
> > character (in particular, a Unicode character).
As the word "character" is ambiguous, I'd put it this way:
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> What you are saying is that if you write a 10-line script that claims
> Unicode conformance, you are responsible for the Unicode-correctness of
> all modules you call implicitly as well as that of the Python interpreter.
If text files ar
On 6/12/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> Another example would be unichr(), which gives you TypeError if you
> pass it a surrogate pair (oddly enough, as strings of different length
> are of the same type).
Sorry, I meant ord(), not unichr(). Anyway, ord(unichr(i)) ==
On 6/12/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 6/12/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> > Practically speaking, there's little need to interpret surrogate pairs
> > as two code points instead of as one non-BMP code point.
>
> Depe
On 6/10/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> I think you misunderstand. Anything in Unicode that is normative is
> about interchange. Strings are also a means of interchange---between
> modules (separate Unicode processes) in a program (single OS process).
Like Martin said, "what
On 6/12/07, Baptiste Carvello <[EMAIL PROTECTED]> wrote:
> This is where we strongly disagree. If an identifier is written in
> transliterated Chinese, I cannot understand what it means, but I can
> recognise it when it is used in the code. I will then find out the
> meaning from the context. By co
On 6/11/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> "In fact, it might even use something downright misleading, and
> you won't have any warning, because we thought that maybe someone,
> somewhere, might have wanted that character in a different context."
>
> And no, I don't think I'm exagerati
On 6/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > To truly enable Python in a non-English teaching
> > environment, I think you'd actually want to go a step
> > further and just internationalize the whole program.
>
> I don't know why that theory keeps popping up when people
> have repea
On 6/9/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Rauli Ruohonen writes:
> > The ones it absolutely prohibits in interchange are surrogates.
>
> Excuse me? Surrogates are code points with a specific interpretation
> if it is "purported that the stream is in
On 6/8/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> AFAIK, the only strings the Unicode standard absolutely prohibits
> emitting are those containing code points guaranteed not to be
> characters by the standard.
The ones it absolutely prohibits in interchange are surrogates. They
are also
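For what it's worth, today's Python 3 codecs already treat lone surrogates as non-interchangeable; a quick illustration (current behaviour, not anything proposed in the thread):

    lone = '\ud800'              # an unpaired high surrogate
    try:
        lone.encode('utf-8')     # the codec refuses to emit it
    except UnicodeEncodeError as exc:
        print(exc)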
On 6/8/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> In principle, yes. What's the cost of the additional field in terms of
> a size increase? If you just need another bit, could that fit into
> _PyUnicode_TypeRecord.flags instead?
The additional field is 8 bits, two bits for each normalizati
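A rough sketch of the two-bits-per-form idea; the packing scheme and value encoding here are illustrative only, not CPython's actual layout:

    # One byte holds a 2-bit quick-check state for each of the four forms.
    # Illustrative encoding: 0 = unknown, 1 = yes, 2 = no, 3 = maybe.
    NFC, NFD, NFKC, NFKD = range(4)

    def pack_quickcheck(values):
        byte = 0
        for form, value in enumerate(values):
            byte |= (value & 0b11) << (2 * form)
        return byte

    def unpack_quickcheck(byte, form):
        return (byte >> (2 * form)) & 0b11

    flags = pack_quickcheck([1, 3, 2, 0])   # NFC=yes, NFD=maybe, NFKC=no, NFKD=unknown
    assert unpack_quickcheck(flags, NFD) == 3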
On 6/6/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > FWIW, I don't buy that normalization is expensive, as most strings are
> > in NFC form anyway, and there are fast checks for that (see UAX#15,
> > "Detecting Normalization Forms"). Python does not currently have
> > a fast path for this, b
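Python did eventually grow such a fast path: since 3.8, unicodedata.is_normalized() performs a UAX #15-style quick check. For example:

    import unicodedata

    s = 'cafe\u0301'                              # 'café' spelled in decomposed form
    print(unicodedata.is_normalized('NFC', s))    # False: this spelling is not NFC
    print(unicodedata.normalize('NFC', s))        # recomposed to 'café' with U+00E9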
On 6/8/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> How would you expect them to work on arrays of code points?
Just like they do with Python 2.5 unicode objects, as long as the
"array of code points" is str, not e.g. a numpy array or tuple of ints,
which I don't expect to grow string methods :-)
On 6/7/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> I apologize for mistyping the example. *I* *was* talking about a
> string literal containing Unicode characters.
Then I misunderstood you too. To avoid such problems, I will use XML
character references to denote code points here. Wherev
On 6/7/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> I meant to say that *strings* are explicitly sequences of characters,
> not codepoints.
This is false. When you access the contents of a string using the
*sequence* protocol, what you get is code points, not characters
(grapheme clusters). To ge
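The distinction is easy to see in plain Python 3 (nothing specific to this thread):

    s = 'e\u0301'      # LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT
    print(len(s))      # 2: two code points, though it renders as a single 'é'
    print(s[0], s[1])  # indexing hands back code points, not the grapheme cluster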
On 6/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Why should the lexer apply normalization to literals behind my back?
The lexer shouldn't, but NFC normalizing the source before the lexer
sees it would be slightly more robust and standards-compliant. This is
because technically an editor or
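A minimal sketch of that idea; read_source is a hypothetical helper, not an existing hook in the tokenizer:

    import unicodedata

    def read_source(path, encoding='utf-8'):
        # Decode the file and NFC-normalize the whole text before the lexer
        # sees it, so differently composed spellings compare equal downstream.
        with open(path, encoding=encoding) as f:
            return unicodedata.normalize('NFC', f.read())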
(Martin's right, it's not good to discuss this in the huge PEP 3131
thread, so I'm changing the subject line)
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> In the language of these standards, I would expect that string
> comparison is exactly the kind of higher-level process they hav
On 6/6/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> No. The point is that people want to use their current tools; they
> may not be able to easily specify normalization.
> Please look through the list (I've already done so; I'm speaking from
> detailed examination of the data) and state w
On 6/5/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> I'd love to get rid of full-width ASCII and halfwidth kana (via
> compatibility decomposition).
If you do forbid compatibility characters in identifiers, then they
should be flagged as an error, not converted silently. NFC, on the
other h
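The difference is easy to demonstrate with unicodedata: NFC leaves compatibility characters alone, while NFKC (compatibility decomposition followed by composition) folds them:

    import unicodedata

    fullwidth_a = '\uff21'       # FULLWIDTH LATIN CAPITAL LETTER A
    halfwidth_ka = '\uff76'      # HALFWIDTH KATAKANA LETTER KA
    print(unicodedata.normalize('NFC', fullwidth_a))    # unchanged: 'Ａ'
    print(unicodedata.normalize('NFKC', fullwidth_a))   # folded to plain 'A'
    print(unicodedata.normalize('NFKC', halfwidth_ka))  # folded to 'カ' (U+30AB)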
On 6/5/07, Talin <[EMAIL PROTECTED]> wrote:
> Thanks so much for this excellent roundup from the RoundUp Master :)
> Seriously, I've been staying well away from the PEP 3131 threads, and I
> was hoping that someone would post a summary of the issues so I could
> catch up.
I agree that the roundup
On 6/4/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 6/4/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > However, what would that mean wrt. non-Unicode source encodings?
>
> > Say you have a Latin-1-encoded source code. Is that in NFC or not?
The path of least surprise for legacy encodings m
On 6/4/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> No, it can't. One might want to write Python code that implements
> normalization algorithms, for example, and there will be "binary
> strings". Only in the context of Unicode text are you allowed to
> do those things.
But Python files
On 6/3/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Sure - but how can Python tell whether a non-normalized string was
> intentionally put into the source, or as a side effect of the editor
> modifying it?
It can't, but does it really need to? It could always assume the latter.
> In most ca
On 6/3/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Merely to define those is non-trivial, and it is absolutely out
> of the question to expect that the average Python user will know
> what the character set "strictly-conforms-to-UTR39-restrictions-
> allows-confusables" is.
This is a bit
(sorry about replying to such an old mail, but I didn't find a better place
to put this)
On 5/1/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> All identifiers are converted into the normal form NFC while parsing;
Actually, shouldn't the whole file be converted to NFC, instead of
only identifiers?
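(For reference, PEP 3131 as accepted normalizes identifiers to NFKC rather than NFC.) A small illustration of why identifier-only normalization leaves visually identical string literals distinct:

    import unicodedata

    composed = '\u00e9tendue'        # 'étendue' with a precomposed é
    decomposed = 'e\u0301tendue'     # the same text in decomposed form
    print(composed == decomposed)    # False when compared as raw code points
    print(unicodedata.normalize('NFC', composed) ==
          unicodedata.normalize('NFC', decomposed))     # True after NFC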
On 6/3/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 6/2/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> > # identifier_charset: 0-7f
>
> Why not ASCII?
> Why not be more specific, with 0x30-0x39, 0x41-0x5a, 0x5f, 0x61-0x7a
>
> When adding characters, this isn
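The explicit ranges quoted above (0x30-0x39, 0x41-0x5a, 0x5f, 0x61-0x7a) are just the classic ASCII identifier set; as a regular expression, roughly:

    import re

    ascii_identifier = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')
    print(bool(ascii_identifier.match('spam_42')))    # True
    print(bool(ascii_identifier.match('caf\u00e9')))  # False: é falls outside 0x00-0x7f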
On 6/2/07, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Whether or not there exists a tool to convert from Python 2.6 to
> Python 3.0 (2to3), every tool that currently handles Python source
> code encodings via the method specified in the documentation
> (just about every Python-centric editor I kno
On 6/2/07, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> """
> If a comment in the first or second line of the Python script matches
> the regular expression coding[=:]\s*([-\w.]+), this comment is processed
> as an encoding declaration; the first group of this expression names the
> encoding of the
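The documented pattern is easy to try out directly; the sample line below is just an example of a PEP 263 coding cookie:

    import re

    coding_re = re.compile(r'coding[=:]\s*([-\w.]+)')
    line = '# -*- coding: iso-8859-15 -*-'
    match = coding_re.search(line)
    print(match.group(1))       # 'iso-8859-15'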
On 5/27/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> James Y Knight writes:
> > a 'pyidchar.txt' file with a list of character ranges, and now that
> > pyidchar.txt file is going to have separate sections based on module
> > name? Sorry, but are you [EMAIL PROTECTED] kidding me?!?
>
> The scalab