Re: [sqlite] Some clarification needed about Unicode

Igor Tandetnik Thu, 29 Oct 2009 14:08:20 -0700

John Crenshaw <johncrens...@priacta.com> wrote:
> 2. MultiByteToWideChar supports a "MB_COMPOSITE" flag, which appears
> to give UTF-16 output.


MB_COMPOSITE has nothing to do with surrogate pairs, and everything to do with 
whether, say, Latin-1 character Á (A with acute) is converted to a single 
character U+00C1, or two characters U+0041 U+0301 (capital A + combining acute 
accent). The latter is "composite", the former is "precomposed".
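
To make the distinction concrete, here is a minimal sketch in Python. It does 
not call MultiByteToWideChar; it uses the standard unicodedata module's NFD 
normalization as a stand-in to obtain the composite form, and the helper name 
code_points is made up for this illustration.

```python
import unicodedata

# Precomposed form: A with acute as a single code point.
precomposed = "\u00C1"

# Composite form: capital A followed by a combining acute accent.
# NFD (canonical decomposition) is used here only to illustrate the idea;
# it is an assumption of this sketch, not what MB_COMPOSITE itself calls.
composite = unicodedata.normalize("NFD", precomposed)


def code_points(s):
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points(precomposed))
print(code_points(composite))
```

Both strings render as the same letter, and neither contains a code point 
above U+FFFF, so surrogate pairs never enter the picture.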

Do you believe _that's_ what differentiates UTF-16 and UCS-2? If so, you are 
mistaken. The difference between the two is in how Unicode characters U+10000 
and up are represented: as surrogate pairs in UTF-16, and not at all in UCS-2, 
which cannot represent them. U+0041 U+0301 is a valid UCS-2 sequence and a 
valid UTF-16 sequence.
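
A small Python sketch of that difference, for illustration only. The helper 
names utf16_units and fits_ucs2 are hypothetical; the encoding itself is done 
by Python's built-in "utf-16-be" codec.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as 4-digit hex strings."""
    data = s.encode("utf-16-be")
    count = len(data) // 2
    return [f"{unit:04X}" for unit in struct.unpack(f">{count}H", data)]


def fits_ucs2(s):
    """True if every character is below U+10000, so no surrogates are needed."""
    return all(ord(ch) < 0x10000 for ch in s)


composite_a = "A\u0301"        # U+0041 U+0301, inside the BMP
supplementary = "\U00010000"   # first code point outside the BMP

print(utf16_units(composite_a), fits_ucs2(composite_a))
print(utf16_units(supplementary), fits_ucs2(supplementary))
```

The composite sequence needs one 16-bit unit per character and is identical 
in UCS-2 and UTF-16. Only the character at U+10000 needs a surrogate pair, 
which is the part UCS-2 has no way to express.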

> Microsoft never seems to clearly identify whether the wide APIs should
> be given UTF-16 or UCS-2.

You mean which Unicode normalization form they expect (see 
http://en.wikipedia.org/wiki/Unicode_equivalence ), which, again, has 
absolutely nothing to do with UTF-16 vs UCS-2. The answer is that the Win32 API 
can handle any normalization form, as well as denormalized strings. The 
FoldString API is provided to normalize strings to various normalization forms 
if desired.
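
As a rough illustration of why an application might want to normalize, the 
Python sketch below compares two canonically equivalent strings before and 
after bringing them to the same form. It uses unicodedata.normalize rather 
than FoldString, and the function name canonically_equal is an assumption made 
for this example.

```python
import unicodedata


def canonically_equal(x, y, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


one_code_point = "\u00C1"     # precomposed
two_code_points = "A\u0301"   # base letter + combining accent

# A plain comparison looks at code points, so the two forms differ.
print(one_code_point == two_code_points)

# After normalization to a common form they compare equal.
print(canonically_equal(one_code_point, two_code_points))
```

Which form to pick (NFC, NFD, and so on) is up to the application; the point 
is only that both sides of a comparison use the same one.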

Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
