> Of course, indeed I just said that! If it were true then that would imply
> that '\xNNNN' == '\uNNNN' making the \u and \U escapes rather pointless.
That's not pointless:

- '\xNNNN' is interpreted by C compilers as '\xNN' followed by two literal uppercase letters N, where '\xNN' is compiled according to the source code character set. This generates 3 characters in the source code character set, which will then be converted to the destination charset used at run-time (this charset may or may not be Unicode, depending on how the "char" C datatype is set, but most probably it won't be converted to any of its UTFs and will stay in the source charset). The hex sequence after '\x' is almost always limited to 2 digits, as this corresponds to the size of char (most often a single byte); the only exception is 9-bit systems, where 1 byte will still contain only 1 char, but with 512 combinations (so there may exist 512 characters in the source charset)...

- '\uNNNN' is to be interpreted in the Unicode encoding and charset only, whatever the source or destination charset. It should compile correctly only to create wchar_t instances, provided that the target charset contains this Unicode character. But some compilers may be able to convert the Unicode codepoint into a target charset/encoding, using some UTF scheme (only available for string and wchar_t constants, not for char constants). There's no support here for Unicode characters outside the BMP, except if you specify a pair of surrogates in string constants only, like "\uD800\uDC00".

- '\UNNNNNNNN' is similar, but for codepoints in UTF-32 form. It may be available on C compilers that support wchar_t with more than 16 bits (most probably then 32-bit or 24-bit). The C compiler should forbid any invalid codepoint such as surrogates, as well as non-characters like U+FFFE.
In practice, for now, most C/C++ compilers support wchar_t as a 16-bit unsigned short, and have no support for '\U' in character constants, but they may provide this support for string constants if the target charset is Unicode (in that case the compiler may first convert it to a UTF-16 sequence), or if the target charset contains the corresponding character. (For example, it can be used in some Chinese source code encoded with GB18030, as a way to allow the source code to be remapped to ASCII or UTF-* for transmission, even if the target system will use GB2312 or Unicode at run-time.)