Re: UCS-2 and UTF-16 [was Re: Encode, take five]

2000-09-14 Thread Mark Leisher
Philip> Yes, but if you just have a high surrogate, you can't do much with
Philip> it -- it doesn't represent a Unicode character but only half of
Philip> one. So you need a high surrogate plus a low surrogate to display
Philip> a character beyond U+, leading to a 32-bit repre

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-14 Thread Dominic Dunlop
At 18:00 +0200 2000-09-13, Philip Newton wrote:
>What's Perl's take on characters where ord($c) > 0x, anyway?

It seems to Just Work, as this one-ish-liner shows:

% perl -we '$s.=chr(16**$_-1) for(1..9); \
  printf "%#10x\n", ord($t) while $t=substr($s,0,1,"")'
0xf
0xff
0xf

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-14 Thread Philip Newton
On 14 Sep 2000, at 12:35, Dominic Dunlop wrote:
> At 18:00 +0200 2000-09-13, Philip Newton wrote:
> >What's Perl's take on characters where ord($c) > 0x, anyway?
>
> It seems to Just Work, as this one-ish-liner shows:

[snip]

In that case, if we want to go switch internal encoding UTF-8, we

Re: UCS-2 and UTF-16 [was Re: Encode, take five]

2000-09-14 Thread Philip Newton
On 13 Sep 2000, at 11:57, Mark Leisher wrote:
> True, UTF-16 is not known as UCS-2. However, UTF-16 still consists
> of 2-byte chunks. It is essentially UCS-2 plus high and low
> surrogates (see the Unicode Standard 3.0 page 19).

Yes, but if you just have a high surrogate, you can't do much w
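
For readers following the surrogate discussion in this thread, here is a minimal Perl sketch of the pairing arithmetic, assuming the standard UTF-16 rule that a code point in the range U+10000..U+10FFFF is split into a high surrogate (0xD800..0xDBFF) and a low surrogate (0xDC00..0xDFFF). The helper names to_surrogates and from_surrogates are hypothetical and are not part of Encode or of any message quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a supplementary code point into a
# UTF-16 surrogate pair (high, low).
sub to_surrogates {
    my ($cp) = @_;
    die "code point outside U+10000..U+10FFFF\n"
        if $cp < 0x10000 || $cp > 0x10FFFF;
    my $offset = $cp - 0x10000;            # 20-bit value
    my $hi = 0xD800 + ($offset >> 10);     # top 10 bits
    my $lo = 0xDC00 + ($offset & 0x3FF);   # bottom 10 bits
    return ($hi, $lo);
}

# Hypothetical helper: recombine a surrogate pair into one code point.
# A lone high or low surrogate is rejected, since on its own it is
# only half of a character.
sub from_surrogates {
    my ($hi, $lo) = @_;
    die "not a high surrogate\n" unless $hi >= 0xD800 && $hi <= 0xDBFF;
    die "not a low surrogate\n"  unless $lo >= 0xDC00 && $lo <= 0xDFFF;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

my ($hi, $lo) = to_surrogates(0x10400);
printf "U+%04X -> %04X %04X\n", 0x10400, $hi, $lo;
printf "%04X %04X -> U+%04X\n", $hi, $lo, from_surrogates($hi, $lo);
```

The pair occupies two 16-bit units, which is why a character beyond the 16-bit range ends up taking 32 bits in UTF-16.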