On Monday 2003.10.20 13:31:49 -0700, Shao, Yiying wrote:
> Thanks for your info.
>
> >>Just wondering if anybody knows how unicode is on Linux?
> >>
> >Very good support. Default charset for recent versions of some popular
> >distributions.
>
> What are those popular distributions and which version?
>
> >>On Red Hat Linux, if UTF-8 is not made as the default encoding for
> >>Chinese/Japanese/Korean, what it is using for those double byte languages?
>
> >The old multi-byte character sets.
>
> So, how should I implement my code? Do I have to say if this is Japanese (for
> example), convert the unicode (UTF-8) to multi-byte character? That seems very
> painful.
>

No. Forget about old multi-byte encodings. Just set your locale to a UTF-8
locale and use UTF-8 for all languages. In my experience (on SuSE 7.3, 8.1,
8.2, and the 9.0 betas) all of the "important" applications handle CJK
languages perfectly well under a UTF-8 locale. The "important" applications
for me are things like Open Office 1.1, Konsole, vim, MySQL, and Mozilla.

For CJK input, use SCIM (http://ns.turbolinux.com.cn/~suzhe/scim/index.html).

For many other details about Unicode on Linux, see my page at
http://eyegene.ophthy.med.umich.edu/unicode/index.html.
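To illustrate what "use UTF-8 for all languages" means in code, here is a rough sketch in Python (my choice for illustration only; the same idea applies in C or C++). The helper names locale_uses_utf8 and round_trip and the SAMPLES table are hypothetical, not part of any library. The point is that Chinese, Japanese, and Korean text all go through one encode/decode path, with no per-language branch.

```python
import codecs

# Illustrative sample strings (assumed for this sketch, not from any real data set).
SAMPLES = {
    "Chinese": "中文",
    "Japanese": "日本語",
    "Korean": "한국어",
}


def locale_uses_utf8(locale_name: str) -> bool:
    """Return True if a POSIX-style locale name selects a UTF-8 codeset.

    Assumes the usual form language[_territory][.codeset][@modifier],
    e.g. "ja_JP.UTF-8". A name with no codeset part counts as non-UTF-8.
    """
    _, dot, rest = locale_name.partition(".")
    if not dot:
        return False
    codeset = rest.split("@", 1)[0]
    try:
        # codecs.lookup normalizes spellings such as "UTF-8" and "utf8".
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False


def round_trip(text: str) -> bytes:
    """Encode text as UTF-8 and check that it decodes back unchanged."""
    data = text.encode("utf-8")
    if data.decode("utf-8") != text:
        raise ValueError("UTF-8 round trip failed")
    return data


if __name__ == "__main__":
    # The same code path serves every language; only the text differs.
    for language, text in SAMPLES.items():
        print(language, len(round_trip(text)), "bytes")
    print(locale_uses_utf8("ja_JP.UTF-8"), locale_uses_utf8("ja_JP.eucJP"))
```

A check like locale_uses_utf8 is only useful for warning a user whose environment still selects a legacy codeset; the text handling itself never needs to know which language it is processing.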

> >>Does later Red Hat Linux make UTF-8 the default encoding for them?
>
> >AFAIK only if you manually set it to a UTF-8 locale, e.g.
> >LANG=zh_CN.UTF-8. Notice, though, that some older software will not be
> >aware of this change, so many characters will not be displayed properly.
>
> So, is this setting available from Red Hat 8.0 or later? Also, you mean some old
> version of Linux may not be aware of this setting?
>
> Besides, do you happen to know ICU from IBM? Does it take care of the unicode
> problems with double byte language for Linux?

Most likely. But I think your life will be easier if you just use UTF-8 for
all languages and forget about legacy encodings. I'm sure ICU must have very
robust UTF-8 support.

> Thanks,
> Yiying