-----BEGIN PGP SIGNED MESSAGE-----

Rick Cameron wrote:
> From: Asmus Freytag [mailto:[EMAIL PROTECTED]]
> >Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit
> >encoding - but that's not stopping uniphiles from storing Unicode data
> >in their wchar_t's!
>
> The only way such use is conformant is if it follows UTF-32. The latter is
> clearly specified in http://www.unicode.org/unicode/reports/tr19/ as:
>
> "The following lists the important features of this encoding form:
>
> UTF-32 is restricted in values to the range 0..10FFFF, which precisely
> matches the range of characters defined in the Unicode Standard (and other
> standards such as XML), and those representable by UTF-8 and UTF-16. "
Well, that's not quite true: D800..DFFF are not representable in UTF-16.
I was under the impression that as of Unicode 3.2 they would not be legally
representable in UTF-8 or UTF-32 either (i.e. that all mappings between UTFs
would be bijections between the sets of legal strings, which is a good thing).

Does the official definition of "character" include non-characters?

Also, I don't think the comment about XML is correct, taking into account
the word "precisely"; XML allows the following subset of code points
(from http://www.w3.org/TR/REC-xml#charsets):

  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

So, I think the above statement should be:

  UTF-32 is restricted in code point values to the ranges 0..D7FF and
  E000..10FFFF, which precisely matches the set of code points designated
  by the Unicode Standard (excluding surrogate code points), and those
  representable by UTF-8 and UTF-16. This set also matches the set of
  characters used in other standards such as XML and HTML 4.01, with the
  exception of some control codes and non-character codes.

Note "designated" instead of "defined" - is that the right term?

- -- 
David Hopwood <[EMAIL PROTECTED]>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding.
If I revoke a public key but refuse to specify why, it is because the private
key has been seized under the Regulation of Investigatory Powers Act; see
www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPCAs3TkCAxeYt5gVAQHt9AgA0SAyzfJqWD/bEiOT6YXKHoRhj8f88eGu
2jWFubNiYXAj3RR3NZruIR61WUk0DVtIBXCCmhxBh0ZLIAzZguR2mlO7k6T0OpJk
h8qEBEMOeaCNLwrFGq7WKZRanznB9nuoG+OikO7FAQ0/VjeGk+9joJJLZDN8BxRO
8DXuvwgjOUynumlAp71fvQzgj20bTXT1y3ckh37ZKAH+3KWBB8Yrdw2n75n+05uq
AkL94iKSO+CWzNUUCKwXPEehI3/mV7y2mbjzCVquOQ+KF1/QIqMcLp5JdaP9OyEI
b1MBL2ezmAOVustJyh/ofWeM8Ykke0jvELrsjHRKvp2cpZ1PSSITpg==
=sa0a
-----END PGP SIGNATURE-----
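The three code point sets compared in this message (the 0..10FFFF range quoted from TR19, the surrogate-excluding ranges 0..D7FF and E000..10FFFF proposed in the reply, and the XML 1.0 `Char` production) can be checked mechanically. The following is a minimal Python sketch, not anything defined by the Unicode or XML specifications: the function names are hypothetical, and `encodable_in_all_utfs` assumes a Python 3 release whose strict UTF-8/16/32 codecs reject lone surrogates (current CPython does).

```python
# Hypothetical helpers contrasting the code point sets discussed above.

MAX_CODE_POINT = 0x10FFFF


def in_tr19_range(cp: int) -> bool:
    """Range as worded in the quoted TR19 text: 0..10FFFF."""
    return 0 <= cp <= MAX_CODE_POINT


def is_scalar_value(cp: int) -> bool:
    """Proposed wording: 0..D7FF and E000..10FFFF (surrogates excluded)."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= MAX_CODE_POINT


def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production, transcribed from REC-xml#charsets."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= MAX_CODE_POINT)


def encodable_in_all_utfs(cp: int) -> bool:
    """Ask Python's strict codecs whether cp survives UTF-8, UTF-16 and UTF-32."""
    if not in_tr19_range(cp):
        return False  # chr() itself rejects values outside 0..10FFFF
    s = chr(cp)
    try:
        for codec in ("utf-8", "utf-16-le", "utf-32-le"):
            s.encode(codec)  # default 'strict' handler: lone surrogates raise
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    all_cps = range(MAX_CODE_POINT + 1)
    # In the TR19 range but not representable: exactly the surrogate block.
    gap = [cp for cp in all_cps if in_tr19_range(cp) and not is_scalar_value(cp)]
    print(hex(gap[0]), hex(gap[-1]), len(gap))  # 0xd800 0xdfff 2048
    # Scalar values that XML still excludes: C0 controls plus U+FFFE, U+FFFF.
    non_xml = [cp for cp in all_cps if is_scalar_value(cp) and not is_xml_char(cp)]
    print(len(non_xml), [hex(cp) for cp in non_xml[-2:]])  # 31 ['0xfffe', '0xffff']
```

The first count supports the point that "0..10FFFF" overstates what UTF-16 can carry; the second supports the point that XML's set is a strict subset of the scalar values, so "precisely matches" does not hold for XML either.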