> Tim Greenwood wrote:
> > In my interpretation of the C standard (which I am reading from
> > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a
> > valid wchar_t encoding if your execution character set contains
> > characters outside the C0 controls and Basic Latin range, and
> > UTF-16 is not a valid wchar_t encoding if your execution character
> > set has characters outside the BMP. In other words whatever you
> > consider to be a character (which may be a combining character)
> > must be encoded in one wchar_t code unit.
True. But there are well-known implementations that break that rule and use UTF-16 code units as wchar_t instead (something that upsets the C standardisation committee a bit). There have been **suggestions** to add utf16_t and utf32_t types to standard C (for the respective code units; "char" is judged good enough for UTF-8 code units), together with syntaxes for character (really code unit) and string literals. But don't hold your breath...

/kent k