'Twas said:
>The thing that most scared me about the higher-order Unicode encodings
>(16-bit, 32-bit, etc.) was this "Encoding" list in Windows XP Notepad:
>ANSI
>Unicode
>Unicode Big Endian
>UTF-8
>
>All I can say is "Eeek! Higher-order Unicode encodings are
>endian-specific!!!". In my mind this puts their applicability to on-disk
>file formats at risk. UTF-8, on the other hand, can be read regardless of
>architecture. While I know there are technical solutions to
>endian-specific encodings, so long as you know which endianness the data
>was saved in, I personally find the endian-free nature of UTF-8 comforting.
You just have to look at the Unicode Consortium website to see that. I did so a couple of months back, and the following paragraphs are from memory.

One of the big pluses of UTF-8, in regard to compatibility and other things, is that it is a byte-oriented encoding: whether one byte or several bytes are used for a character, their order is always the same. By contrast, UTF-16 and UTF-32 each come in two variants, big-endian and little-endian.

When reading such Unicode strings, the string may begin with a byte order mark (BOM), the character U+FEFF. In UTF-16, if the first two bytes are FE FF (hex), the string is big-endian; if they are FF FE, it is little-endian. Of course, some Unicode strings have no byte order mark, in which case you have to assume a default; the Unicode Standard says to treat BOM-less UTF-16 as big-endian unless a higher-level protocol says otherwise.

Regardless of which UTF you have, they are all designed to avoid framing errors, so you can find the character boundaries even if you start in the middle of a string (for UTF-16 and UTF-32, provided you start on a code-unit boundary).

Now, if you want something more reliable than my memory, I have just returned to the Unicode website and found the page with the juicy details. Observe:

http://www.unicode.org/faq/utf_bom.html

So, does that document answer people's questions?

-- Darren Duncan
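To make the byte-order point concrete, here is a minimal Python sketch (not from the original post; the names `decode_with_bom` and `_BOMS` are hypothetical) that reads a leading BOM to pick the encoding and otherwise falls back to big-endian UTF-16, as the standard suggests. It uses the BOM constants from Python's standard `codecs` module. One assumption to note: a UTF-16LE string whose first character after the BOM is U+0000 is indistinguishable from a UTF-32LE BOM, and this sketch would read it as UTF-32LE.

```python
import codecs

# Byte order marks to try, longest first: the UTF-32LE mark (FF FE 00 00)
# starts with the UTF-16LE mark (FF FE), so it must be checked before it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def decode_with_bom(data: bytes, default: str = "utf-16-be") -> str:
    """Decode data using its leading BOM; fall back to `default` if none is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM itself so it does not show up as U+FEFF in the result.
            return data[len(bom):].decode(encoding)
    # No BOM: assume big-endian UTF-16 unless the caller says otherwise.
    return data.decode(default)


if __name__ == "__main__":
    text = "A\u00e9"  # "Aé": U+0041, U+00E9
    # The same two characters serialized three ways.
    print(text.encode("utf-16-be").hex(" "))  # 00 41 00 e9
    print(text.encode("utf-16-le").hex(" "))  # 41 00 e9 00
    print(text.encode("utf-8").hex(" "))      # 41 c3 a9
    # Once the BOM is read, the little-endian bytes decode to the same text.
    print(decode_with_bom(codecs.BOM_UTF16_LE + text.encode("utf-16-le")) == text)  # True
```

The output shows the two UTF-16 byte orders side by side with UTF-8, which has only one serialization of the same characters.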
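The point about avoiding framing errors can be illustrated for UTF-8 with a short Python sketch (again not from the original post; `char_start` is a hypothetical helper name). Every UTF-8 continuation byte has the bit pattern 10xxxxxx and no lead byte does, so a reader dropped at an arbitrary byte offset only has to step backwards past continuation bytes to find where the character begins. UTF-16 works similarly, since a low surrogate (DC00 to DFFF) always marks the second half of a pair; that case is not shown here.

```python
def char_start(data: bytes, i: int) -> int:
    """Back up from index i to the first byte of the UTF-8 character containing data[i]."""
    # Continuation bytes match 10xxxxxx (top two bits == 0b10); lead bytes never do,
    # so skip backwards over continuation bytes until a lead byte is reached.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


if __name__ == "__main__":
    data = "a\u00e9\u20ac".encode("utf-8")  # "aé€" -> 61 c3 a9 e2 82 ac
    start = char_start(data, 5)             # byte 5 is the last byte of "€"
    print(start)                            # 3
    print(data[start:].decode("utf-8"))     # €
```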

