Changjian Sun,
If you have code
that is currently setlocale-based, there is an easy conversion to
Unicode. With xIUA (http://www.xnetinc.com/xiua/) you have a
straightforward migration path to the wonderful power of ICU.
You start by
replacing your current i18n functions, such as setlocale() with
xiua_OpenLocale and strcoll() with xiua_strcoll, etc.
xiua_OpenLocale takes POSIX-style locales, for example
xiua_OpenLocale("fr_CA.cp1252", XDFUTF8); opens French (Canada) with an associated
character set of windows-1252, with the data represented in UTF-8, so that
the xiua_ChartoNative function will convert windows-1252 data to UTF-8. Unlike setlocale,
it is thread safe. The one logic change is that before you terminate
a thread you must close its locales and then terminate the xIUA thread support, so that
its resources are released. You can start out in code page mode so that you can test
the code as you migrate. Then you can switch to UTF-8 without other code changes
by opening the locale in UTF-8 (change XDFCODEPAGE to
XDFUTF8).
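To illustrate what the windows-1252 to UTF-8 conversion mentioned above involves, here is a minimal standalone sketch. It is not xIUA's xiua_ChartoNative implementation: the function name cp1252_to_utf8 is hypothetical, and mapping the five bytes that windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumed policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code points for windows-1252 bytes 0x80-0x9F. 0xFFFD marks the five
   bytes windows-1252 leaves undefined (an assumed policy). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* Hypothetical helper: converts a NUL-terminated windows-1252 string to
   UTF-8 in dst (cap bytes). Returns the number of bytes written, not
   counting the NUL, or (size_t)-1 if dst is too small. */
size_t cp1252_to_utf8(const char *src, char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        uint32_t cp = *p;
        /* 0x00-0x7F is ASCII and 0xA0-0xFF equals U+00A0-U+00FF;
           only 0x80-0x9F differs from Latin-1. */
        if (cp >= 0x80 && cp <= 0x9F)
            cp = cp1252_high[cp - 0x80];
        unsigned char buf[3];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else {                 /* every mapped value is below 0x10000 */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        }
        if (out + n + 1 > cap)   /* keep room for the terminating NUL */
            return (size_t)-1;
        for (size_t i = 0; i < n; i++)
            dst[out++] = (char)buf[i];
    }
    if (cap == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

For example, "caf\xE9" (café in windows-1252) becomes the five UTF-8 bytes "caf\xC3\xA9", and the euro sign 0x80 becomes E2 82 AC.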
For example:

    /* Open the same Japanese locale once per data format. */
    LocJP_u2 = xiua_OpenLocale(LocJPWin, XDFUTF16);     /* UTF-16 */
    LocJP_list[0] = &LocJP_u2;
    LocJP_u4 = xiua_OpenLocale(LocJPWin, XDFUTF32);     /* UTF-32 */
    LocJP_list[1] = &LocJP_u4;
    LocJP_u8 = xiua_OpenLocale(LocJPWin, XDFUTF8);      /* UTF-8 */
    LocJP_list[2] = &LocJP_u8;
    LocJP_win = xiua_OpenLocale(LocJPWin, XDFCPWIN);    /* Windows code page */
    LocJP_list[3] = &LocJP_win;
    LocJP_unix = xiua_OpenLocale(LocJPUnix, XDFCPUNIX); /* Unix code page */
    LocJP_list[4] = &LocJP_unix;
    LocJP_cp = xiua_OpenLocale(LocJPWin, XDFCODEPAGE);  /* code page mode */
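    /* Run the same string tests against the first four handles:
       UTF-16, UTF-32, UTF-8 and Windows code page. */
    for (j = 0; j < 4; j++)
    {
        xiua_SetLocaleHdl(*LocJP_list[j]);          /* make this locale current */
        strcpy(test_buff, "String Test ");
        strcat(test_buff, Loc_DataFormat[j].dfmt);  /* label the run with its data format */
        runTestb(test_buff, &line);
    }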
This test uses the
same string handling routines to process UTF-32, UTF-16, UTF-8 and code page
data.
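The idea of one routine serving several data formats can be sketched in plain C as a function that takes a format tag and dispatches to a per-format implementation. The enum and the function df_char_count below are hypothetical and are not xIUA's API; they only count code points and assume well-formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data-format tags, loosely mirroring XDFUTF8/16/32. */
typedef enum { DF_UTF8, DF_UTF16, DF_UTF32 } data_format;

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx). */
static size_t count_utf8(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* UTF-16: count every unit except trail surrogates (0xDC00-0xDFFF). */
static size_t count_utf16(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] < 0xDC00 || s[i] > 0xDFFF)
            count++;
    return count;
}

/* Single entry point; `units` is the length in the format's code units
   (bytes for UTF-8, 16-bit units for UTF-16, 32-bit units for UTF-32). */
size_t df_char_count(data_format fmt, const void *buf, size_t units)
{
    assert(buf != NULL || units == 0);
    switch (fmt) {
    case DF_UTF8:  return count_utf8((const unsigned char *)buf, units);
    case DF_UTF16: return count_utf16((const uint16_t *)buf, units);
    case DF_UTF32: return units;          /* one unit per code point */
    }
    return 0;
}
```

The caller's logic stays the same whichever format the data is in; only the tag changes. A code page path would additionally need a per-code-page table, which is the kind of work ICU's converters take on.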
It works by
converting the arguments and results for routines like xiua_strcoll; the
underlying code invokes the ICU collation code. If you want more
flexibility, you can call xiua_strcollEx to specify different strength, case,
and normalization settings. For the full power of ICU, you can call
any ICU API directly.
For functions like
strtok you can call xiua_strtok. This is implemented differently, in that
there are separate UTF-32, UTF-16, UTF-8 and code page implementations.
Also, unlike strtok, it is thread safe. There is also an xiua_strtok_r
implementation, which gives you this capability even on Windows platforms that do
not provide strtok_r.
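On platforms without strtok_r, a reentrant tokenizer only needs to keep its position in a caller-supplied pointer instead of a static variable. The sketch below, portable_strtok_r, is hypothetical and is not xIUA's xiua_strtok_r; it handles single-byte or UTF-8 data with ASCII delimiters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant tokenizer with strtok_r-style semantics: all
   state lives in *saveptr, so separate threads (or nested loops) can
   tokenize independently. Safe for UTF-8 input as long as the delimiters
   are ASCII, because UTF-8 multibyte sequences contain no ASCII bytes. */
char *portable_strtok_r(char *str, const char *delim, char **saveptr)
{
    assert(delim != NULL && saveptr != NULL);
    char *s = (str != NULL) ? str : *saveptr;
    if (s == NULL)
        return NULL;
    s += strspn(s, delim);               /* skip leading delimiters */
    if (*s == '\0') {
        *saveptr = s;
        return NULL;                     /* no more tokens */
    }
    char *end = s + strcspn(s, delim);   /* find the end of this token */
    if (*end != '\0')
        *end++ = '\0';                   /* terminate token, step past it */
    *saveptr = end;
    return s;
}
```

Because the position is held in the caller's `save` variable rather than in static storage, two threads can each tokenize their own buffer at the same time.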
This code is
designed to be modified. It implements xiua_strcmp using Unicode code point
order for UTF-32, UTF-16 and UTF-8, so that all three forms compare the same way. If
you don't like that, just change the code. However, if you are using a
database that does use Unicode code point order compares, then you might want all
forms to compare the same.
It even supports
multiple open locales per thread, so that you can have HTML/XML files in EUC-JP,
a database using UTF-8 SQL to retrieve UTF-16 data, and communications with a
browser running Shift_JIS. Some calls, like xiua_LocaletoLocale, use two
open locales and convert the data from the format of one locale to the
format of the other, or just copy it if the formats are the
same.
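The convert-or-copy behaviour described for xiua_LocaletoLocale can be sketched as follows. Everything here is hypothetical and is not xIUA's API: each open locale is reduced to a format tag, only the UTF-8 to UTF-32 direction and the same-format copy are shown, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FMT_UTF8, FMT_UTF32 } text_format;  /* hypothetical */
typedef struct { text_format fmt; } open_locale;    /* hypothetical */

/* Decodes one well-formed UTF-8 sequence at s; stores its length in *len. */
static uint32_t decode_utf8(const unsigned char *s, size_t *len)
{
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if (s[0] < 0xE0) {
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if (s[0] < 0xF0) {
        *len = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Converts n bytes of src from `from`'s format into `to`'s format in dst
   (cap bytes). Returns the number of bytes written, or (size_t)-1 on
   overflow or on a format pair this sketch does not handle. */
size_t locale_to_locale(const open_locale *to, const open_locale *from,
                        void *dst, size_t cap, const void *src, size_t n)
{
    assert(to && from && dst && src);
    if (to->fmt == from->fmt) {          /* same format: plain copy */
        if (n > cap)
            return (size_t)-1;
        memcpy(dst, src, n);
        return n;
    }
    if (from->fmt == FMT_UTF8 && to->fmt == FMT_UTF32) {
        const unsigned char *s = src;
        uint32_t *d = dst;               /* caller supplies a uint32_t buffer */
        size_t count = 0, i = 0;
        while (i < n) {
            size_t len;
            if ((count + 1) * sizeof *d > cap)
                return (size_t)-1;
            d[count++] = decode_utf8(s + i, &len);
            i += len;
        }
        return count * sizeof *d;
    }
    return (size_t)-1;                   /* other directions omitted */
}
```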
This is all open
source code, so you are not locked in. xIUA code is really a starter
application that is designed to be integrated into your own application code and
changed to suit your needs.
Carl
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, September 04, 2001 10:54 AM
To: Vaintroub, Wladislav
Cc: [EMAIL PROTECTED]
Subject: RE: UTF-8 on NT
Do you think that UTF-8 is the wrong way to internationalize cross-platform software?
Our application supports Solaris, HP-UX, AIX and NT. To internationalize it, we are
thinking of UTF-8 (setlocale(), strcoll() for sorting, mbstowcs() for length...) so that
we don't have to add the wide character (wchar_t) data type everywhere in the source code.
I did some tests on Unix; setlocale(), strcoll() and mbstowcs() look OK for UTF-8,
but I am stuck on NT.
Do you mean I have to use a different approach for NT internationalization?
I'm also considering 3rd-party UTF-8 support such as libutf8 and IBM ICU.
They don't seem to have good support for NT; what do you think?
Thanks.
-Changjian Sun
"Vaintroub, Wladislav" <[EMAIL PROTECTED]> 09/04/01 01:40 PM
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
cc:
Subject: RE: UTF-8 on NT
I'm afraid that there is no way to set a UTF-8 locale on Windows via setlocale. Even if you try to do this with setlocale(LC_ALL, "French_Canada.65001"), it won't work correctly.
It's a pity, because porting Unix programs that rely on a UTF-8 locale becomes a very challenging task on Windows.
Wladislav Vaintroub.
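On any platform, setlocale() returns NULL when it rejects a locale name, so a program can at least detect that a requested UTF-8 locale is unavailable and fall back. A minimal sketch in standard C; the helper set_first_locale and the candidate names are assumptions. As noted above, a non-NULL result on Windows does not guarantee correct UTF-8 behaviour, so this only catches outright rejection.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: tries each candidate locale name in order and
   returns setlocale()'s description of the first one accepted, or NULL
   if all were rejected (the previous locale then stays in effect). */
const char *set_first_locale(int category, const char *const names[],
                             size_t count)
{
    assert(names != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *result = setlocale(category, names[i]);
        if (result != NULL)
            return result;
    }
    return NULL;
}
```

A Unix program might pass something like {"fr_CA.UTF-8", "fr_CA.utf8", "C"}; which spellings are accepted depends on the locales installed on the system.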
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 04, 2001 6:36 PM
To: [EMAIL PROTECTED]
Subject: UTF-8 on NT
On Unix, we can set a French UTF-8 locale by calling
setlocale(LC_ALL, "fr_CA.UTF-8").
On NT, I don't know how to set a French UTF-8 locale;
setlocale(LC_ALL, "French_Canada.1252") does not seem to be UTF-8.
My questions:
1. Is UTF-8 supported on NT?
2. If yes, how do I use setlocale() to set it up?
Thanks.
-Changjian Sun