I feel the need to correct one misperception:

Lars Kristan wrote:
4.1 - UTF-32 is probably very useful for certain string operations. Changing case for example. You can do it in-place, like you could with ASCII. Perhaps it can even be done in UTF-8, I am not sure. But even if it is possible today, it is definitely not guaranteed that it will always remain so, so one shouldn't rely on it.

Wrong even for UTF-32. Sharp s (U+00DF) uppercases to two characters, "SS". Other examples of case mapping expansion and contraction are in SpecialCasing.txt (one of the UCD files).
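
To make the expansion concrete, here is a minimal sketch in Python. The language is my choice for illustration, and it assumes Python 3, whose str.upper() applies the full (not just simple) Unicode case mappings. The helper name utf32_units is made up for this example.

```python
def utf32_units(s: str) -> int:
    """Count UTF-32 code units; hypothetical helper, one unit per code point."""
    return len(s.encode("utf-32-le")) // 4


sharp_s = "\u00DF"             # LATIN SMALL LETTER SHARP S
sharp_s_upper = sharp_s.upper()  # full case mapping, per SpecialCasing.txt

before = utf32_units(sharp_s)
after = utf32_units(sharp_s_upper)

# The uppercased string needs more UTF-32 code units than the original,
# so it cannot be written back into the same buffer in place.
print(repr(sharp_s_upper), before, after)
```

So a fixed-width encoding does not by itself make in-place case conversion safe; the buffer may still have to grow.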


For UTF-8, there are also _simple_ (1:1) case mappings that change the byte length (e.g., long s [U+017F], two bytes in UTF-8, uppercases to S, one byte), while sharp s to SS happens not to change the UTF-8 string length (two bytes before and after)...
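
The UTF-8 byte counts can be checked with a short Python sketch. Again, Python 3 is assumed and the helper name utf8_len is only illustrative.

```python
def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s (hypothetical helper)."""
    return len(s.encode("utf-8"))


long_s = "\u017F"               # LATIN SMALL LETTER LONG S
long_s_upper = long_s.upper()   # simple 1:1 mapping to "S"

sharp_s = "\u00DF"              # LATIN SMALL LETTER SHARP S
sharp_s_upper = sharp_s.upper()  # full mapping to "SS"

# One code point in, one code point out, yet the UTF-8 length shrinks.
long_s_lengths = (utf8_len(long_s), utf8_len(long_s_upper))

# One code point in, two code points out, yet the UTF-8 length is unchanged.
sharp_s_lengths = (utf8_len(sharp_s), utf8_len(sharp_s_upper))

print(long_s_lengths, sharp_s_lengths)
```

Neither the code point count nor the byte count can be relied on to stay the same across a case conversion.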

markus

PS: I wrote UTN #12 :-)

--
Opinions expressed here may not reflect my company's positions unless otherwise 
noted.


