Jim Jewett writes:
> > Apart from the surrogates, are there code points that aren't
> > characters?
> Yes. The BOM mark, for one.
Nitpick: The BOM *is* a character (FEFF, aka ZERO WIDTH NO-BREAK
SPACE). Its byte-swapped counterpart FFFE is guaranteed *not* to be a
character. (Martin wrote
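
The distinction between U+FEFF and U+FFFE can be checked against Python's
bundled Unicode database. This is only a rough sketch: the variable names
are made up for illustration, and the result depends on the Unicode
version shipped with the interpreter (the general categories of these two
code points are stable, though).

```python
import unicodedata

BOM = "\ufeff"       # the byte order mark, an assigned character
SWAPPED = "\ufffe"   # its byte-swapped counterpart, a noncharacter

# U+FEFF has a formal name and a general category (Cf, "format").
bom_name = unicodedata.name(BOM)
bom_category = unicodedata.category(BOM)

# U+FFFE has no name and reports Cn ("not assigned").
swapped_name = unicodedata.name(SWAPPED, None)
swapped_category = unicodedata.category(SWAPPED)

# Reading big-endian UTF-16 as little-endian turns the BOM into U+FFFE,
# which is why seeing U+FFFE signals the wrong byte order.
misread = BOM.encode("utf-16-be").decode("utf-16-le")

print(bom_name, bom_category, swapped_name, swapped_category, hex(ord(misread)))
```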
On 6/14/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > A code point is something that has a 1:1 relationship with a logical
> > character (in particular, a Unicode character).
As the word "character" is ambiguous, I'd put it this way:
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> except that people will sneak in some UTF-16 behavior where it seems useful.
How about sneaking these in py3k-struni:
- chr(i) returns a len-1 or len-2 string for all i in range(0, 0x11) and
ord(chr(i)) == i for all i in range(0,
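
A "len-1 or len-2 string" is what a UTF-16 representation gives: one code
unit for code points up to U+FFFF, a surrogate pair above that. The sketch
below illustrates that round trip on plain integers. The function names
are hypothetical, and the upper bound 0x110000 (the size of the Unicode
code space) is an assumption on my part rather than the figure quoted
above.

```python
def to_utf16_units(cp):
    """Return the UTF-16 code units (one or two ints) for code point cp."""
    if not 0 <= cp < 0x110000:
        raise ValueError("code point out of range: %r" % (cp,))
    if cp < 0x10000:
        # BMP code points (lone surrogates included) are one unit.
        return [cp]
    v = cp - 0x10000                      # 20 bits to split across two units
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def from_utf16_units(units):
    """Inverse of to_utf16_units: recover the code point."""
    if len(units) == 1:
        return units[0]
    high, low = units
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The symmetry asked for: decoding the encoding gives back i for every i.
round_trips = all(
    from_utf16_units(to_utf16_units(i)) == i for i in range(0x110000)
)
print(round_trips)
```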
On 6/14/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > There are also some that are explicitly not characters.
> > (U+FD00..U+FDEF)
> ??? U+FD00 is ARABIC LIGATURE HAH WITH YEH ISOLATED FORM,
> U+FDEF is unassigned.
Sorry; typo on my part. The start of the range is U+FDD0, not U+FD00.
I suspe
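
For reference, the noncharacters are easy to enumerate: the block
U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
A minimal sketch, with a helper name of my own choosing:

```python
def is_noncharacter(cp):
    """True if cp is one of the Unicode noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (cp,))
    # U+FDD0..U+FDEF, or U+xxFFFE / U+xxFFFF at the end of any plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]

# 32 in the FDD0 block plus 2 per plane over 17 planes.
print(len(NONCHARACTERS))
```

Note that U+FD00 and U+FEFF are not in this set, while U+FFFE is.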
On 6/14/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > There are also plenty of things that a native speaker may view as a
> > single character, but which unicode treats as (at most) a Named
> > Sequence.
> Eg, the New Line Function (Unicode's name for "universal newline"),
> which can
Jim Jewett writes:
> I suspect there may be others that are guaranteed never to get an
> assignment, because of their location. (Example: The character would
> have to have certain properties or be part of a specific script, but
> adding more such characters would violate some other stabilit
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Well I can see where a str8() type with an __incoded_with__ attribute could
> be useful. It would use a bit more memory, but it won't be the
> default/primary string type anymore so maybe it's ok.
>
> Then bytes can be bytes, and unicode can be uni
Guido van Rossum wrote:
> On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>> Well I can see where a str8() type with an __incoded_with__ attribute
>> could
>> be useful. It would use a bit more memory, but it won't be the
>> default/primary string type anymore so maybe it's ok.
>>
>> Then bytes
>>> Then bytes can be bytes, and unicode can be unicode, and str8 can be
>>> encoded strings for interfacing with the outside non-unicode world. Or
>>> something like that.
>>
>> Hm... Requiring each str8 instance to have an encoding might be a
>> problem -- it means you can't just create one fro
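
To make the idea under discussion concrete, here is one way a bytes type
carrying its encoding might look. It is purely illustrative: the class
name Str8Sketch, the plain `encoding` attribute (standing in for the
attribute named above), and both helper methods are hypothetical, not
anything proposed in the thread.

```python
import codecs


class Str8Sketch(bytes):
    """Bytes that remember which encoding they are supposed to be in."""

    def __new__(cls, data, encoding):
        codecs.lookup(encoding)            # raises LookupError if unknown
        self = super().__new__(cls, data)
        self.encoding = encoding           # required on every instance
        return self

    @classmethod
    def from_text(cls, text, encoding):
        """Encode a unicode string for the non-unicode outside world."""
        return cls(text.encode(encoding), encoding)

    def to_text(self):
        """Decode back to unicode using the remembered encoding."""
        return self.decode(self.encoding)


latin = Str8Sketch.from_text("Löwis", "latin-1")
utf8 = Str8Sketch.from_text("Löwis", "utf-8")
print(len(latin), len(utf8), latin.to_text() == utf8.to_text())
```

The mandatory `encoding` argument is exactly the cost raised above: an
instance cannot be created from raw bytes without naming an encoding.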
> - chr(i) returns a len-1 or len-2 string for all i in range(0, 0x11) and
> ord(chr(i)) == i for all i in range(0, 0x11)
This would contradict an explicit decision in PEP 261. I don't quite
remember the rationale for that; however, the PEP mentions that ord()
should be symmetric with