RE: [OT?] The C standard library and UTF's (was RE: Text Editors and Canonical Equivalence (was Coloured diacritics))

Kent Karlsson Fri, 12 Dec 2003 12:41:10 -0800

> Tim Greenwood wrote:
> > In my interpretation of the C standard (which I am reading from 
> > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a 
> > valid wchar_t encoding if your execution character set contains 
> > characters outside the C0 controls and Basic Latin range, and 
> > UTF-16 is not a valid wchar_t encoding if your execution character
> > set has characters outside the BMP. In other words whatever you 
> > consider to be a character (which may be a combining character)
> > must be encoded in one wchar_t code unit.


True. But there are well-known implementations that break that
rule and use UTF-16 code units as wchar_t instead (something that
upsets the C standardisation committee a bit).
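
To make the constraint concrete, here is a minimal sketch in C. The
helper names utf16_units and utf16_encode are made up for this
illustration and come from no standard library. It shows that a
character outside the BMP needs two UTF-16 code units, so it cannot
be held in a single 16-bit wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-16 code units needed for the
 * code point cp. Surrogates and values above U+10FFFF are rejected. */
static size_t utf16_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    return cp < 0x10000 ? 1 : 2;
}

/* Hypothetical helper: encode cp as UTF-16 into out[0..1] and
 * return the number of code units written (1 or 2). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    size_t n = utf16_units(cp);
    if (n == 1) {
        out[0] = (uint16_t)cp;                      /* BMP: one unit */
    } else {
        uint32_t v = cp - 0x10000;                  /* 20-bit offset */
        out[0] = (uint16_t)(0xD800 | (v >> 10));    /* lead surrogate */
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* trail surrogate */
    }
    return n;
}
```

For U+1D11E (MUSICAL SYMBOL G CLEF) utf16_encode writes the pair
D834 DD1E. With a 16-bit wchar_t that is two wchar_t values for one
character, which is exactly what the standard's model does not allow.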

There have been *suggestions* to have utf16_t and utf32_t
(for the respective code units, "char" is judged good enough for
UTF-8 code units), together with character (code unit really)
and string literal syntaxes put into standard C. But don't hold
your breath...
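
As a rough sketch of how such code unit types could be used, assuming
plain typedefs over the <stdint.h> least-width types: the names
utf16_t and utf32_t are the suggested ones, while the typedefs and
the helpers utf16_next and utf16_count are only assumptions made for
this illustration. No literal syntax is shown, since none is
specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definitions: one code unit each, not one character each. */
typedef uint_least16_t utf16_t;
typedef uint_least32_t utf32_t;

/* Hypothetical helper: decode one code point from a well-formed
 * UTF-16 string and advance *s past the code units consumed. */
static utf32_t utf16_next(const utf16_t **s)
{
    utf32_t hi = *(*s)++;
    if (hi >= 0xD800 && hi <= 0xDBFF) {
        utf32_t lo = *(*s)++;
        assert(lo >= 0xDC00 && lo <= 0xDFFF);   /* must be a trail surrogate */
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
    return hi;
}

/* Hypothetical helper: count code points (not code units) in a
 * zero-terminated, well-formed UTF-16 string. */
static size_t utf16_count(const utf16_t *s)
{
    size_t n = 0;
    while (*s != 0) {
        (void)utf16_next(&s);
        ++n;
    }
    return n;
}
```

The point of separate types is that code which walks utf16_t data
knows it is handling code units and has to decode pairs, instead of
pretending that each wchar_t is a whole character.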

                /kent k
