RE: UTF-16 inside UTF-8

jarkko.hietaniemi Wed, 03 Dec 2003 03:29:00 -0800

        > We're not speaking about the same thing: I was not discussing the 
        > representation of individual characters (yes it's simple to make 
        > wchar_t 32-bit with UCS4), but the encoding of large amounts of
        > strings for general text processing. That's where UTF-16 is better.


        For some values of "better", and for some values of "text processing".
        Because UTF-16 is variable width (code points above U+FFFF take a
        surrogate pair of two 16-bit units), it can be slow for certain string
        operations: basically anything that requires "random access" to the
        string, like "give me the substring from code point position 1000 to
        position 1999". Unless you have some sort of caching, or something
        else clever, you'll be O(position) instead of O(1).
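
To illustrate the O(position) point, here is a minimal Python sketch of
code point indexing over raw UTF-16 code units. It is only an illustration,
not how any particular implementation does it: the helper names
(utf16_units, unit_offset, substring) are made up for this example, and it
assumes well-formed input with no unpaired surrogates.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def unit_offset(units, cp_index):
    """Find the code-unit offset of code point number cp_index.

    There is no way to jump straight there: every earlier code point
    must be inspected to see whether it occupies one unit or two.
    """
    offset = 0
    for _ in range(cp_index):
        if offset >= len(units):
            raise IndexError("code point index out of range")
        # A high surrogate (0xD800..0xDBFF) starts a two-unit pair.
        offset += 2 if 0xD800 <= units[offset] <= 0xDBFF else 1
    return offset

def substring(units, start, end):
    """Return code points [start, end) as a str; cost grows with end."""
    lo = unit_offset(units, start)
    hi = unit_offset(units, end)
    raw = struct.pack("<%dH" % (hi - lo), *units[lo:hi])
    return raw.decode("utf-16-le")
```

With a fixed-width encoding such as UCS-4 the offset would simply be the
index times the unit size, with no scan. The "caching" mentioned above
could, for instance, remember the unit offset of the last code point
looked up so that a nearby request does not restart from zero.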
