Lars Kristan <[EMAIL PROTECTED]> writes:

> Quite close. Except for the fact that:
> * U+EE93 is represented in UTF-32 as 0x0000EE93
> * U+EE93 is represented in UTF-16 as 0xEE93
> * U+EE93 is represented in UTF-8 as 0x93 (_NOT_ 0xEE 0xBA 0x93)

Then it would be impossible to represent sequences like U+EEEE U+EEBA U+EE93 in UTF-8, and conversion UTF-32 -> UTF-8 -> UTF-32 would not round-trip. Concatenation of UTF-8-encoded strings would not be equivalent to UTF-8-encoding of the concatenation of code points. This is broken.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
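
The objection can be checked with a small sketch. It assumes the quoted scheme generalizes to the whole range U+EE80..U+EEFF (each written as the single byte 0x80..0xFF) and that a decoder reads valid UTF-8 normally and maps any stray byte back to U+EE00 + byte. Neither the range nor the decoder is spelled out in the quoted text, and the names encode_proposed and decode_proposed are made up for illustration.

```python
import codecs

ESCAPE_BASE = 0xEE00  # assumption: U+EExx <-> byte 0xxx, as in the quoted U+EE93 example


def encode_proposed(code_points):
    """Encode code points; U+EE80..U+EEFF become one raw byte (hypothetical scheme)."""
    out = bytearray()
    for cp in code_points:
        if 0xEE80 <= cp <= 0xEEFF:
            out.append(cp - ESCAPE_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)


def _escape_invalid(err):
    # Map every byte the UTF-8 decoder rejected to U+EE00 + byte, then resume.
    bad = err.object[err.start:err.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), err.end


codecs.register_error("ee_escape", _escape_invalid)


def decode_proposed(data):
    """Decode valid UTF-8 normally; stray bytes come back as U+EExx (assumed decoder)."""
    return [ord(ch) for ch in data.decode("utf-8", "ee_escape")]


original = [0xEEEE, 0xEEBA, 0xEE93]
encoded = encode_proposed(original)      # b'\xee\xba\x93'
round_trip = decode_proposed(encoded)    # [0xEE93], not the original three

# Decoding the three one-byte strings separately does recover the original,
# so decoding does not commute with concatenation.
pieces = [cp for b in (b"\xee", b"\xba", b"\x93") for cp in decode_proposed(b)]

print(encoded.hex(" "), [hex(cp) for cp in round_trip], [hex(cp) for cp in pieces])
```

The three single bytes 0xEE 0xBA 0x93 happen to form the standard UTF-8 encoding of U+EE93, so under these assumptions the decoder returns one code point where three went in.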