RE: UTF8 vs. Unicode (UTF16) in code

Peter_Constable Fri, 09 Mar 2001 11:54:38 -0800


On 03/09/2001 12:53:57 PM "Ayers, Mike" wrote:

>    Um... no.  The UTF-32 CES can handle much more than the current
>space of the Unicode CCS.  As far as I can tell, it's good to go until we
>need more than 32 bits to represent the ACR.  I'm actually surprised that
>this comment was so misunderstood.  Ah, well...

Strictly speaking, I'm afraid you're wrong. The UTF-32 encoding form is
defined in UTR #19, which clearly states:

<quote>
     UTF-32 is restricted in values to the range 0..10FFFF (base 16)
</quote>

Unsigned 32-bit integers can directly represent 4G (2^32) distinct values;
UTF-32 can accommodate far fewer: 0x110000, or 1,114,112, code points.
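
To put numbers on the difference, here is a minimal sketch in Python. The
names UTF32_MAX, UINT32_MAX and is_valid_utf32_value are made up for
illustration; the only thing taken from UTR #19 is the upper limit quoted
above, so the check covers that range restriction and nothing else.

```python
# Hypothetical names for illustration; only the 0x10FFFF limit comes from
# the UTR #19 text quoted above.
UTF32_MAX = 0x10FFFF       # highest value UTF-32 permits
UINT32_MAX = 0xFFFFFFFF    # highest value an unsigned 32-bit integer holds


def is_valid_utf32_value(value: int) -> bool:
    """Return True if value lies in the range 0..10FFFF (hex)."""
    return 0 <= value <= UTF32_MAX


# How many distinct values each space offers.
utf32_count = UTF32_MAX + 1      # 0x110000 values
uint32_count = UINT32_MAX + 1    # 2**32 values

if __name__ == "__main__":
    print(f"UTF-32 values:          {utf32_count:,}")
    print(f"unsigned 32-bit values: {uint32_count:,}")
    print(f"0x110000 valid in UTF-32? {is_valid_utf32_value(0x110000)}")
```

So a 32-bit code unit has room for 0x110000 and beyond, but the encoding
form as defined does not allow it.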



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
