Adam,

It is probably best to speak of UTF-32, which has replaced UCS-4 just as
UTF-16 has replaced UCS-2.  The only Unicode encoding that uses surrogates
is UTF-16.  UTF-32 uses scalar values, not surrogates, so a character above
U+FFFF is stored directly as its scalar value.  The surrogate code points
U+D800..U+DFFF are not valid UTF-32 code units: no well-formed UTF-16 or
UTF-8 sequence converts to a UTF-32 value in this range, just as values
above 0x0010FFFF are not valid Unicode code points.  Likewise, the UTF-8
byte sequences from EDA080 (U+D800) to EDBFBF (U+DFFF), and those above
F48FBFBF (U+10FFFF), are not valid.
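A quick way to check these ranges is with Python 3's built-in codecs. This is only a sketch, assuming Python 3.4 or later (whose UTF-32 decoder rejects surrogate code points); the helpers `is_scalar_value` and `decodes` are hypothetical names for illustration. It shows U+10000 stored as its scalar value in UTF-32 but as a surrogate pair in UTF-16, and which of the UTF-8 boundary sequences above are accepted.

```python
# Sketch: one supplementary character in UTF-32 and UTF-16, plus the
# boundary sequences that are not valid. Uses only built-in codecs
# (assumes Python 3.4+, where the UTF-32 decoder rejects surrogates).

def is_scalar_value(cp: int) -> bool:
    """A UTF-32 code unit is valid only if it is a Unicode scalar value."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

def decodes(data: bytes, codec: str) -> bool:
    """Hypothetical helper: True if `data` is well-formed in `codec`."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

ch = "\U00010000"  # first code point outside the BMP

# UTF-32 stores the scalar value itself; UTF-16 needs a surrogate pair.
utf32 = ch.encode("utf-32-be").hex()
utf16 = ch.encode("utf-16-be").hex()

# UTF-8 boundary sequences.
lo_sur = bytes.fromhex("eda080")     # would be U+D800
hi_sur = bytes.fromhex("edbfbf")     # would be U+DFFF
max_cp = bytes.fromhex("f48fbfbf")   # U+10FFFF, the largest code point
too_big = bytes.fromhex("f4908080")  # would be U+110000

print(utf32, utf16)                                          # 00010000 d800dc00
print(decodes(lo_sur, "utf-8"), decodes(hi_sur, "utf-8"))    # False False
print(decodes(max_cp, "utf-8"), decodes(too_big, "utf-8"))   # True False
print(decodes(bytes.fromhex("0000d800"), "utf-32-be"))       # False
print(is_scalar_value(0x10000), is_scalar_value(0xD800))     # True False
```

The UTF-32 result is the scalar value 0x00010000 unchanged, while UTF-16 splits it into D800 DC00; a UTF-32 unit of 0x0000D800 on its own is rejected.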

Carl


> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Adam Twardoch
> Sent: Saturday, August 25, 2001 11:34 PM
> To: Marcin 'Qrczak' Kowalczyk; [EMAIL PROTECTED]
> Subject: Re: Nonsense in
> http://www.unicode.org/Public/PROGRAMS/CVTUTF/CVTUTF.C?
>
>
> ----- Original Message -----
> From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> > I don't understand. I'm talking about characters above U+FFFF, not
> > about characters from the range U+D800..DFFF. They are represented
> > as themselves in UCS-4. But the said routine represents them as pairs
> > of surrogates.
>
> So my question for clarification:
>
> Does UCS-4 use scalar values or surrogate pairs to represent codes from
> outside the BMP?
>
> Adam
>
>

