Rick Cameron wrote:
Microsoft Windows uses little-endian byte order on all platforms. Thus, on
Windows UTF-16 code units are stored in little-endian byte order in memory.

I believe that some Linux systems are big-endian and some little-endian. I
think Linux follows the native byte order of the CPU. Presumably UTF-16
would be big-endian or little-endian accordingly.

This is somewhat misleading. For internal processing we are dealing with the UTF-16 encoding _form_ (quite different from the external encoding _scheme_ of the same name): we don't have strings of bytes but strings of 16-bit code units (WCHAR in Windows). Program code operating on such strings could not care less what endianness the CPU uses. Endianness only becomes an issue when the text is byte-serialized, as is done for the external encoding schemes (and usually by a conversion service).
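To make the distinction concrete, here is a minimal C sketch (mine, not from the thread; the function names and sample data are made up). The processing code compares 16-bit code units and never looks at bytes; an explicit byte order appears only in the serialization routine that produces the UTF-16LE encoding scheme.

    #include <stdint.h>
    #include <stdio.h>

    /* Internal processing (encoding form): compare 16-bit code units
     * directly.  This behaves identically on big- and little-endian CPUs. */
    static int starts_with(const uint16_t *s, uint16_t unit)
    {
        return s[0] == unit;
    }

    /* Byte serialization (encoding scheme): only here does byte order
     * appear.  Emitting the low byte first gives UTF-16LE; emitting the
     * high byte first would give UTF-16BE instead. */
    static void to_utf16le_bytes(const uint16_t *s, size_t n, uint8_t *out)
    {
        for (size_t i = 0; i < n; i++) {
            out[2 * i]     = (uint8_t)(s[i] & 0xFF);  /* low byte  */
            out[2 * i + 1] = (uint8_t)(s[i] >> 8);    /* high byte */
        }
    }

    int main(void)
    {
        /* U+0041, U+00E9, U+4E2D as 16-bit code units in memory */
        const uint16_t text[] = { 0x0041, 0x00E9, 0x4E2D };
        uint8_t bytes[6];

        printf("starts with U+0041: %d\n", starts_with(text, 0x0041));

        to_utf16le_bytes(text, 3, bytes);
        for (size_t i = 0; i < 6; i++)
            printf("%02X ", bytes[i]);   /* 41 00 E9 00 2D 4E on any CPU */
        printf("\n");
        return 0;
    }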


markus


