Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:
>> Try by yourself, you can perfectly send JSON text containing '\uFFFF'
>> (non-character) or '\uF800' (unpaired surrogate) and I've not seen
>> any JSON implementation complaining about one or the other, when
>> receiving the JSON stream and using it in Javascript, you'll see no
>> missing code unit or replaced code units and no exception as well.
>
> Unicode Consortium standards and recommendations allow non-characters
> to be sent; as far as I can make out, they are just not to be thought
> of as unstandardised graphic characters.

As I understand it, from a purely Unicode standpoint, there are
differences here between noncharacters and unpaired surrogates.

Noncharacters are Unicode scalar values, while unpaired surrogates are
not. This means noncharacters may appear in a well-formed UTF-8, -16,
or -32 string, while unpaired surrogates may not. They may both be part
of a "Unicode string" which does not claim to be in any given encoding
form.

Authoritative corrections are welcome to help solidify my
understanding.

I don't wish to get involved in debates over JSON. I've read RFC 7159
and I know what it says.

-- 
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
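The distinction between noncharacters and unpaired surrogates can be checked mechanically. The following is a minimal sketch in Python, chosen only for illustration; the helper names (`is_scalar_value`, `is_noncharacter`, `encodes_in_all_forms`) are hypothetical. It assumes the definitions of a scalar value (U+0000..U+10FFFF minus U+D800..U+DFFF) and of the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each plane). It also relies on CPython's strict codecs rejecting lone surrogates, while its `str` type, much like a "Unicode string" with no claimed encoding form, can hold either kind of code point.

```python
def is_scalar_value(cp: int) -> bool:
    # Scalar values: any code point except the surrogate range D800..DFFF.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)


def encodes_in_all_forms(cp: int) -> bool:
    # A Python str may contain a lone surrogate, but the strict UTF-8,
    # UTF-16 and UTF-32 encoders refuse to serialize it.
    s = chr(cp)
    try:
        for enc in ("utf-8", "utf-16-le", "utf-32-le"):
            s.encode(enc)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for cp in (0xFFFF, 0xFDD0, 0xD800):
        print(f"U+{cp:04X}", is_scalar_value(cp), is_noncharacter(cp),
              encodes_in_all_forms(cp))
```

Running it should report U+FFFF and U+FDD0 as scalar values that encode in every form, and U+D800 as neither.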

