Re: UTF-16 inside UTF-8

Markus Scherer Thu, 06 Nov 2003 14:01:53 -0800

I would like to comment on several statements that I have seen in this thread:

- Migrating from UCS-2 to UTF-16:
  Doable, and has been done for many applications and libraries.
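
  To make the migration concrete, here is a minimal Python sketch of the
  surrogate arithmetic that UCS-2 code has to learn. The function names are
  my own for illustration and are not taken from ICU or any other library.

```python
# Sketch only: helper names are hypothetical, not from any library.

def to_surrogates(cp):
    """Split a supplementary code point into (lead, trail) UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogates(lead, trail):
    """Combine a lead/trail surrogate pair back into one code point."""
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)

def code_points(units):
    """Walk 16-bit code units and yield code points.

    UCS-2 code treats every unit as a character; UTF-16-aware code
    must pair a lead surrogate with the trail surrogate that follows it.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(from_surrogates(u, units[i + 1]))
            i += 2
        else:
            out.append(u)                 # BMP unit (or unpaired surrogate)
            i += 1
    return out

lead, trail = to_surrogates(0x10400)
print(hex(lead), hex(trail))
print([hex(c) for c in code_points([0x41, lead, trail, 0x42])])
```

  Most of the migration work is finding the places that index or iterate
  by code unit and deciding whether they need to step by code point instead.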

- Difficult to handle UTF-16?
  Use ICU - it handles all of Unicode for collation,
  regular expressions, string casing, codepage conversion,
  and many other things.

- Support for supplementary characters only for Chinese?
  No. Japan has defined JIS X 0213, which has characters that map to
  + supplementary characters
  as well as
  + sequences of multiple BMP characters
  (ICU 2.8 will support codepage conversion involving
   multiple characters on either side)

  CJKV ideographs, used in several languages, are driving support
  for supplementary characters.
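
  The two JIS X 0213 cases can be tried out with a small Python sketch.
  This uses Python's built-in euc_jisx0213 codec, not ICU, and the two
  sample characters are my own choice; I am assuming the codec's tables
  cover both of them.

```python
# Sketch only: uses Python's stdlib codec as a stand-in for an ICU converter.
CODEC = "euc_jisx0213"

# Hiragana KA followed by combining handakuten: two BMP code points that
# JIS X 0213 is expected to treat as a single character.
pair = "\u304B\u309A"

# A CJK ideograph outside the BMP (U+2000B).
supp = "\U0002000B"

pair_bytes = pair.encode(CODEC)
supp_bytes = supp.encode(CODEC)

# One legacy character on one side, two Unicode code points on the other.
print(len(pair), "code points <->", len(pair_bytes), "bytes")

# One legacy character on one side, a surrogate pair in UTF-16 on the other.
print(len(supp_bytes), "legacy bytes <->",
      len(supp.encode("utf-16-be")) // 2, "UTF-16 code units")

# Both conversions should round-trip.
print(pair_bytes.decode(CODEC) == pair, supp_bytes.decode(CODEC) == supp)
```

  A converter that assumes one input character always produces one 16-bit
  output unit cannot handle either case.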

- Case mappings can be modified to return a 32-bit Unicode
  code point instead of a 16-bit BMP code unit?
  This works, but only for "simple" case mappings.
  Full Unicode case mappings are defined on strings, so
  single-character APIs cannot implement them at all:
  full string mappings map 1:n and are context- and language-sensitive.
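
  A short Python sketch shows why a one-code-point-in, one-code-point-out
  function is not enough. It relies on Python's built-in str.upper() and
  str.lower(), which apply the full mappings but are not language-sensitive,
  so the Turkish dotted/dotless i case is not covered here; ICU is the
  place to look for that.

```python
# Sketch only: Python's str methods stand in for a string-based casing API.

# 1:n mapping: one code point uppercases to two.
sharp_s = "\u00DF"                    # LATIN SMALL LETTER SHARP S
upper_sharp_s = sharp_s.upper()
print(len(sharp_s), "->", len(upper_sharp_s))

# Context-sensitive mapping: capital sigma lowercases differently
# in the middle of a word and at the end of a word.
word = "\u038C\u03A3\u039F\u03A3"     # Greek capitals: O-tonos, SIGMA, O, SIGMA
lowered = word.lower()
print([hex(ord(c)) for c in lowered])

# A "simple" mapping on a supplementary character: this one does fit a
# single-code-point API, provided the API takes and returns 32-bit values.
deseret_capital = "\U00010400"        # DESERET CAPITAL LETTER LONG I
deseret_small = deseret_capital.lower()
print(hex(ord(deseret_small)))
```

  The first two cases only come out right because the function sees the
  whole string; the third is the only one a widened per-character API fixes.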

markus

http://oss.software.ibm.com/icu/

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.
