Re: Nicest UTF - string case mapping vs. UTF-8/32

Markus Scherer Fri, 03 Dec 2004 12:26:54 -0800

I feel the need to correct one misperception:

Lars Kristan wrote:

4.1 - UTF-32 is probably very useful for certain string operations. Changing case for example. You can do it in-place, like you could with ASCII. Perhaps it can even be done in UTF-8, I am not sure. But even if it is possible today, it is definitely not guaranteed that it will always remain so, so one shouldn't rely on it.

Wrong even for UTF-32. Sharp s (U+00DF) uppercases to two characters, "SS". Other examples of case mapping expansion and contraction are in SpecialCasing.txt (one of the UCD files).
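
A minimal sketch of the sharp s case, assuming Python (the message itself names no language). Python's str.upper() applies the full Unicode case mappings, and the helper name utf32_units is made up for illustration:

```python
# Count UTF-32 code units before and after uppercasing sharp s.
def utf32_units(s: str) -> int:
    # "utf-32-le" avoids the byte order mark that plain "utf-32" prepends
    return len(s.encode("utf-32-le")) // 4

sharp_s = "\u00df"        # LATIN SMALL LETTER SHARP S
upper = sharp_s.upper()   # full case mapping, not the simple 1:1 one

print(upper, utf32_units(sharp_s), utf32_units(upper))  # SS 1 2
```

One code unit becomes two, so the result does not fit in the original buffer slot.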

For UTF-8, there are also _simple_ (1:1) case mappings that change the length (e.g., long s (U+017F), two bytes, to S, one byte), while sharp s to SS happens not to change the UTF-8 string length (two bytes either way)...
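
The UTF-8 byte counts can be checked the same way. This is only a sketch, again assuming Python; utf8_len is a hypothetical helper, not part of any library:

```python
# Compare UTF-8 byte lengths before and after uppercasing.
def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))

long_s = "\u017f"    # LATIN SMALL LETTER LONG S, simple mapping to "S"
sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S, full mapping to "SS"

for ch in (long_s, sharp_s):
    up = ch.upper()
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} -> {utf8_len(up)} bytes ({up})")
# U+017F: 2 -> 1 bytes (S)
# U+00DF: 2 -> 2 bytes (SS)
```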

markus

PS: I wrote UTN #12 :-)

--
Opinions expressed here may not reflect my company's positions unless otherwise 
noted.
