You should use Unicode internally - UTF-16 when you use ICU, as do most other libraries and software.

Externally, that is, for protocols, files, and other data exchange, you need to identify the encoding of the data (determine it on input; label it on output) and convert between it and Unicode. If you can choose the output encoding, then stay with one of the Unicode charsets (UTF-8, SCSU, etc.), or else - only if you are absolutely certain that they suffice - use US-ASCII or ISO 8859-1.

The system default encoding or the current process codepage may or may not be a good guess for the encoding in your input/output. Include a user override of the charset in your design.

markus

Shao, Yiying wrote:
*Using ICU, which uses UTF-16, to handle all strings for cross-platform localization.

*Since UTF-8 is the default locale encoding for Red Hat Linux, I need to convert the strings from UTF-16 to UTF-8. But UTF-8 is not the default locale for CJK. So, on CJK systems, I need to set UTF-8 as the default locale so that the converted UTF-8 can still work with CJK.

*Or is there a better way to do this? If it is possible for an app to find out the current default locale encoding (such as UTF-16, UTF-8, a multi-byte encoding, etc.) at run time, then it could do the correct conversions according to the current locale. ICU provides rich conversion utilities. This way, I can guarantee that my app will work properly and will not break other apps on the same system.

