RE: UTF-8 on NT

Carl W. Brown Tue, 04 Sep 2001 16:03:37 -0700

Changjian Sun,

If you have code that is currently setlocale-based, then there is an easy conversion to Unicode. With xIUA (http://www.xnetinc.com/xiua/) you have a straight migration path to the wonderful power of ICU.

You start by replacing your current i18n functions: setlocale() becomes xiua_OpenLocale, strcoll() becomes xiua_strcoll, and so on. xiua_OpenLocale takes POSIX-style locales, for example xiua_OpenLocale("fr_CA.cp1252",XDFUTF8); (French Canada with an associated character set of windows-1252, with the data represented in UTF-8, so that a xiua_ChartoNative function will convert 1252 to UTF-8). Unlike setlocale, it is thread safe. The one logic change is that before you terminate a thread you must close its locales and then terminate the xIUA thread support to free the resources. You can start out in code page mode so that you can test the code as you migrate. Then you can switch to UTF-8 with no other code changes by opening the locale in UTF-8 (change XDFCODEPAGE to XDFUTF8).

For example:

/* Open the Japanese locale once per data format and keep the handles in a list. */
LocJP_u2 = xiua_OpenLocale(LocJPWin, XDFUTF16);
LocJP_list[0] = &LocJP_u2;
LocJP_u4 = xiua_OpenLocale(LocJPWin, XDFUTF32);
LocJP_list[1] = &LocJP_u4;
LocJP_u8 = xiua_OpenLocale(LocJPWin, XDFUTF8);
LocJP_list[2] = &LocJP_u8;
LocJP_win = xiua_OpenLocale(LocJPWin, XDFCPWIN);
LocJP_list[3] = &LocJP_win;
LocJP_unix = xiua_OpenLocale(LocJPUnix, XDFCPUNIX);
LocJP_list[4] = &LocJP_unix;
LocJP_cp = xiua_OpenLocale(LocJPWin, XDFCODEPAGE);

/* Run the same string test under each of the first four locales in the list. */
for (j = 0; j < 4; j++)
{
  xiua_SetLocaleHdl(*LocJP_list[j]);          /* make this locale the current one */
  strcpy(test_buff, "String Test ");
  strcat(test_buff, Loc_DataFormat[j].dfmt);  /* append this entry's dfmt string */
  runTestb(test_buff, &line);
}

This test uses the same string handling routines to process UTF-32, UTF-16, UTF-8 and code page data.

It works by converting arguments and results for routines like xiua_strcoll. The underlying code will invoke the ICU collation code. If you want more flexibility you can invoke xiua_strcollEx to provide different strength, case, and normalization values. For the full power of ICU you can invoke any ICU API directly.
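
For comparison, here is a rough sketch of the kind of setlocale-based code you would be migrating from, written in plain standard C. It is not xIUA code. The names sort_in_locale and collate_cmp are made up for illustration; setlocale() and strcoll() are the two calls that, as described above, would be replaced by xiua_OpenLocale and xiua_strcoll.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two C strings using the current LC_COLLATE rules. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of strings in the collation order of
 * the named locale. Returns 0 on success, -1 if the locale is unavailable.
 * setlocale() changes process-wide state, which is why this pattern is
 * not thread safe. */
static int sort_in_locale(const char *locale_name, const char **items, size_t count)
{
    assert(locale_name != NULL && items != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(items, count, sizeof items[0], collate_cmp);
    return 0;
}
```

A call such as sort_in_locale("fr_CA.UTF-8", words, n) works only where that locale name is installed, which is exactly the portability problem discussed in this thread.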

For functions like strtok you can invoke xiua_strtok. This is implemented differently in that there are separate UTF-32, UTF-16, UTF-8 and code page implementations. Also, unlike strtok, it is thread safe. There is also a xiua_strtok_r implementation, which gives you strtok_r-style tokenizing even on Windows platforms that do not provide strtok_r.
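
To show what makes a tokenizer thread safe, here is a minimal byte-oriented sketch in the style of POSIX strtok_r. It is not the xIUA implementation, and the name tok_r is hypothetical. All scan state is kept in a caller-supplied pointer instead of a hidden static variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reentrant tokenizer in the style of POSIX strtok_r: all state lives in
 * *save, so concurrent threads and nested loops do not interfere. */
static char *tok_r(char *str, const char *delims, char **save)
{
    assert(delims != NULL && save != NULL);
    if (str == NULL)
        str = *save;                /* continue the previous scan */
    if (str == NULL)
        return NULL;                /* nothing left to scan */

    str += strspn(str, delims);     /* skip leading delimiters */
    if (*str == '\0') {
        *save = NULL;
        return NULL;
    }

    char *end = str + strcspn(str, delims);  /* first delimiter after the token */
    if (*end != '\0') {
        *end = '\0';                /* terminate the token in place */
        *save = end + 1;            /* resume after the delimiter next time */
    } else {
        *save = NULL;               /* token ran to the end of the string */
    }
    return str;
}
```

Because ASCII bytes never occur inside a multi-byte UTF-8 sequence, a byte-oriented tokenizer like this is safe on UTF-8 data as long as the delimiters are ASCII. UTF-16 and UTF-32 data need versions that work on wider code units.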

This code is designed to be modified. As shipped, it implements xiua_strcmp using Unicode code point order for UTF-32, UTF-16 and UTF-8, so that all three compare the same way. If you don't like that, just change the code. However, if you are using a database that compares in Unicode code point order, then you might want all forms to compare the same.
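
The reason UTF-16 needs special care: a byte-wise compare of UTF-8 and a value compare of UTF-32 already give code point order, but a plain code unit compare of UTF-16 does not, because surrogates (0xD800..0xDFFF) sort below U+E000..U+FFFF even though they encode code points above U+FFFF. The sketch below shows one common fix-up technique. I am not claiming this is how xiua_strcmp does it; the names fixup and utf16_cmp_codepoint are made up, and the input is assumed to be well formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remap a UTF-16 code unit so that surrogates sort after U+E000..U+FFFF
 * instead of before them. Units below 0xD800 are left unchanged. */
static uint16_t fixup(uint16_t u)
{
    if (u >= 0xE000)
        return (uint16_t)(u - 0x0800);  /* U+E000..U+FFFF -> 0xD800..0xF7FF */
    if (u >= 0xD800)
        return (uint16_t)(u + 0x2000);  /* surrogates     -> 0xF800..0xFFFF */
    return u;
}

/* Compare two NUL-terminated UTF-16 strings in code point order.
 * Returns -1, 0 or 1. Assumes there are no unpaired surrogates. */
static int utf16_cmp_codepoint(const uint16_t *a, const uint16_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0 && *a == *b) {       /* skip the common prefix */
        a++;
        b++;
    }
    uint16_t ua = fixup(*a);
    uint16_t ub = fixup(*b);
    return (ua > ub) - (ua < ub);
}
```

With this, U+FF5E sorts before U+10000 (encoded as 0xD800 0xDC00), matching what a UTF-8 or UTF-32 compare of the same text would give.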

It even supports multiple open locales per thread, so that you can have HTML/XML files in EUC-JP, a database using UTF-8 SQL to retrieve UTF-16 data, and communications to a browser running Shift_JIS. Some calls like xiua_LocaletoLocale use two open locales and will convert the data from the format of one locale to the format of the other, or just copy it if the formats are the same.
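
As an illustration of one such format conversion, here is a small hand-written UTF-16 to UTF-8 converter. It is only a sketch of the idea and not xIUA or ICU code; the function name utf16_to_utf8 and its error convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert NUL-terminated UTF-16 to NUL-terminated UTF-8.
 * Returns the number of bytes written (not counting the NUL), or
 * (size_t)-1 on an unpaired surrogate or if dst is too small. */
static size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    while (*src != 0) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* lead surrogate */
            if (*src < 0xDC00 || *src > 0xDFFF)
                return (size_t)-1;                     /* missing trail surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                         /* stray trail surrogate */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= dst_size)
            return (size_t)-1;                         /* keep room for the NUL */

        unsigned char *out = (unsigned char *)dst + n;
        switch (len) {
        case 1:
            out[0] = (unsigned char)cp;
            break;
        case 2:
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
        n += len;
    }
    if (n >= dst_size)
        return (size_t)-1;                             /* no room for the NUL */
    dst[n] = '\0';
    return n;
}
```

Conversions to and from code pages such as EUC-JP or Shift_JIS need mapping tables, which is the part a library like ICU supplies.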

This is all open source code so you are not locked in. xIUA code is really a starter application that is designed to be integrated into your own application code and changed to suit your needs.

Carl

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, September 04, 2001 10:54 AM
To: Vaintroub, Wladislav
Cc: [EMAIL PROTECTED]
Subject: RE: UTF-8 on NT

Do you think that UTF-8 is the wrong way to internationalize cross-platform software?

Our application supports Solaris, HP-UX, AIX and NT. To internationalize it, we are
thinking of UTF-8 (setlocale(), strcoll() for sorting, mbstowcs() for length...) so that
we don't have to add the wide character (wchar_t) data type everywhere in the source code.
I did some tests on Unix; setlocale(), strcoll() and mbstowcs() look OK for UTF-8,
but I am stuck with NT.

Do you mean I have to use a different approach for NT internationalization?

I'm also thinking of third-party UTF-8 support such as libutf8 or IBM ICU.
They do not seem to have good support on NT; what do you think?

Thanks.
-Changjian Sun

"Vaintroub, Wladislav" <[EMAIL PROTECTED]>
09/04/01 01:40 PM

To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
cc:
Subject: RE: UTF-8 on NT

I'm afraid that there is no way to set a UTF-8 locale on Windows via setlocale. Even if you try to do this with setlocale(LC_ALL, "French_Canada.65001"), it won't work correctly.
It's a pity, because porting Unix programs that rely on a UTF-8 locale becomes a very challenging task on Windows.

Wladislav Vaintroub.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 04, 2001 6:36 PM
To: [EMAIL PROTECTED]
Subject: UTF-8 on NT

On Unix we can set a French UTF-8 locale by calling
setlocale(LC_ALL, "fr_CA.UTF-8").
On NT, I don't know how to set a French UTF-8 locale;
setlocale(LC_ALL, "French_Canada.1252") does not seem to be for UTF-8.

My questions:
1. Is UTF-8 supported on NT?
2. If yes, how do I use setlocale() to set it up?
Thanks.

-Changjian Sun
