2015-05-09 14:51 GMT+02:00 Philippe Verdy <verd...@wanadoo.fr>:
> You say a lot of things about what JSON is supposed to be/has been
>> designed for. It would be nice to substantiate your claims by pointing at
>> relevant standards. If JSON as in RFC 4627 really wanted to transmit
>> sequences of bytes I think it would have been *much more* explicit.
>>
>
> No instead it speaks (incorrectly) about code points and mixes the concept
> with code units.
>
In fact it mixes up three separate concepts, i.e. three distinct layers (which the Unicode standard distinguishes clearly):

1. the internal dataset (the values of "strings" as expected by programmers and transmitted via the CODEC of the JSON parser/encoder), using code units of a fixed size (16-bit);
2. the plain-text syntax of JSON (which is independent of the actual character encoding but can be formalized as a stream of Unicode code points);
3. the serialization of this plain text as a stream of bytes (using some UTF encoding scheme, or other legacy 8-bit charsets).

The original implementation of JSON, in JavaScript and still in use today, merely adapts the internal dataset (16-bit streams) to plain text (layers 1 and 2 above). JavaScript itself specifies no serialization of its source: that is left to the MIME standard for transport (via the "charset" parameter of the media type) when using protocols like HTTP or HTTPS, or to some external metadata, or to a static, system-dependent definition. For example, local file systems often do not store this metadata as a file attribute; that is the case for which the "BOM" and similar signatures were created, for which some languages such as XML or HTML have a specific syntax to declare the charset at the beginning of the file, or for which a "charset guesser" is used.

Here too, JavaScript programmers do not have to worry about layers 2 and 3 above; they just handle 16-bit streams (same remark for PHP, Java and many other programming languages). They work at layer 1, where there is a single encoding, a single code-unit size for everything, and no restriction on code-unit values. The same holds when working with the DOM API in XML, HTML, SVG...
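To make the three layers concrete, here is a minimal Python sketch (an illustration only, not taken from any JSON specification). Python's str holds code points rather than 16-bit units, so layer 1 is only simulated here by re-encoding to UTF-16 with the "surrogatepass" error handler; the helper names utf16_code_units, code_points and serialize are hypothetical. It also shows how JSON's \uXXXX escapes expose layer-1 code units (a surrogate pair) inside the layer-2 syntax, and how a lone surrogate, which layer 1 tolerates, is rejected by a strict UTF-8 serializer at layer 3.

```python
import json


def utf16_code_units(s: str) -> list[int]:
    """Layer 1 (simulated): the 16-bit code units a JavaScript string would hold."""
    data = s.encode("utf-16-be", "surrogatepass")  # allow lone surrogates through
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_points(s: str) -> list[int]:
    """Layer 2: the plain text viewed as a sequence of Unicode code points."""
    return [ord(c) for c in s]


def serialize(s: str, encoding: str = "utf-8") -> bytes:
    """Layer 3: one possible byte serialization of the plain text (strict)."""
    return s.encode(encoding)


text = "A\U0001F600"  # 'A' followed by U+1F600, outside the BMP

print([hex(u) for u in utf16_code_units(text)])  # ['0x41', '0xd83d', '0xde00']
print([hex(cp) for cp in code_points(text)])     # ['0x41', '0x1f600']
print(serialize(text))                           # b'A\xf0\x9f\x98\x80'

# With ensure_ascii (the default), json.dumps escapes the non-BMP character
# as a UTF-16 surrogate pair: layer-1 units leaking into layer-2 syntax.
print(json.dumps(text))                          # "A\ud83d\ude00"

# A JSON text may also escape a lone surrogate; Python's json module, like
# JavaScript, decodes it into a string without complaint (layer 1).
lone = json.loads('"\\ud800"')
print(len(lone), hex(ord(lone)))                 # 1 0xd800

try:
    serialize(lone)  # strict UTF-8 cannot serialize a lone surrogate (layer 3)
except UnicodeEncodeError:
    print("layer 3 rejects the lone surrogate")
```

The point of the split is that each step can fail or succeed independently: the same string is one code point longer at layer 1 than at layer 2, and only the layer-3 encoder enforces well-formedness.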