Carl W. Brown wrote:
Some Unix systems adapted faster because the later Unicode adopters used 32-bit Unicode characters, making the job 100 times easier. Other companies like Microsoft took a very big gamble and implemented the code for surrogate support in Windows 2000 based on early drafts of the Unicode standard. If they had not done it this way, or had guessed wrong, they might not even have surrogate support in Windows XP.
Hi Carl,

I am not going to argue with what you say about ICU :-), but I am not sure about your Unix comments.

First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW locale, as far as I know. (AIX 5 zh_TW uses a different wchar_t encoding.)

Again, as far as I know, Unix/Linux systems chose a 32-bit wchar_t not because of grand strategic plans or compelling performance analysis, but because the existing C stdlib functions for wchar_t string handling assume that the single-code-point type is the same as the string base unit. This one design point requires a 32-bit wchar_t not just for Unicode but also for the character sets of EUC-TW and GB18030.
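
To illustrate the design point, here is a minimal C sketch, assuming a platform where wchar_t is 32 bits (as on AIX 5 or glibc-based Linux): wcschr() takes the code point to search for as a single wchar_t argument, so the interface itself bakes in the assumption that one string unit holds one whole code point.

    #include <stdio.h>
    #include <wchar.h>

    int main(void) {
        /* "a", U+10384 (a supplementary code point), "b";
         * assumes a 32-bit wchar_t, so each element is one code point */
        wchar_t text[] = { 0x61, 0x10384, 0x62, 0 };

        /* wcschr() takes the code point to find as ONE wchar_t.
         * With a 16-bit wchar_t, this call could not even express a
         * supplementary character, which needs two string units. */
        wchar_t *p = wcschr(text, 0x10384);
        if (p != NULL) {
            printf("found at index %td\n", p - text);  /* prints 1 */
        }
        return 0;
    }

With a 16-bit wchar_t, no single wchar_t value could name U+10384 at all, which is exactly why this interface forces a 32-bit wchar_t for full Unicode.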

You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt UCS-2-designed functions for UTF-16, but it is not "rocket science", and the result works very well thanks to the Unicode allocation practice of keeping common characters in the BMP. Making UTF-8 and UTF-32 functions work with supplementary code points, when they had assumed BMP-only operation, probably took some work too.
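
For what that adaptation typically looks like, here is a hypothetical C sketch (the function name is mine, not from any particular library): compared with a UCS-2 loop, the only new logic is recognizing a lead surrogate (0xD800..0xDBFF) followed by a trail surrogate (0xDC00..0xDFFF) and counting the pair as one code point. BMP characters keep the old one-unit fast path.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Count code points in a UTF-16 string of 'len' 16-bit units. */
    static size_t utf16_count_code_points(const uint16_t *s, size_t len) {
        size_t count = 0;
        for (size_t i = 0; i < len; ++i) {
            if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&        /* lead surrogate */
                i + 1 < len &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) { /* trail surrogate */
                ++i;  /* the pair encodes one supplementary code point */
            }
            ++count;
        }
        return count;
    }

    int main(void) {
        /* "a" + U+10384 (surrogate pair D800 DF84) + "b":
         * 4 units, 3 code points */
        const uint16_t text[] = { 0x0061, 0xD800, 0xDF84, 0x0062 };
        printf("%zu code points in 4 units\n",
               utf16_count_code_points(text, 4));
        return 0;
    }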

In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t, but also UTF-8 (low-level tools and GNOME) and UTF-16 (ICU, KDE/Qt, and many applications like Mozilla and OpenOffice).

Best regards,
markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.
