2015-05-09 3:27 GMT+02:00 Daniel Bünzli <daniel.buen...@erratique.ch>:
> On Saturday, 9 May 2015 at 02:33, Philippe Verdy wrote:
> > 2015-05-08 14:32 GMT+02:00 Daniel Bünzli <daniel.buen...@erratique.ch>:
> > > Well did you test them all ? There's quite a big list here
> > > http://www.json.org. Taking a random one mentioned on that page leads
> > > me to http://golang.org/pkg/encoding/json/ in which they say that they
> > > replace invalid UTF-16 surrogate pairs by U+FFFD. This is really not
> > > very surprising since apparently Go's strings as text are UTF-8
> > > encoded so when you need to produce your results as UTF-8 then you
> > > don't have a lot of solutions... error and/or U+FFFD.
> >
> > I've already said that JSON is UTF-8 encoded by default, but this does
> > not mean that JSON invalidates the escape sequence '\uD800' isolated in
> > a string.
>
> You didn't get what I said. When a parser returns a JSON string it just
> parsed and that it wants to give it back to the programmer using the
> native string of the language and that these strings happen to be UTF-8
> encoded in this language, then in presence of such lone surrogates you are
> stuck and need to do something as you cannot encode them in the UTF-8
> string.

You are not stuck! You can still regenerate valid JSON output encoded in UTF-8: it will once again use escape sequences (which are also needed if your text contains the quotation marks that delimit JSON strings in its syntax).

Unlike UTF-8, JSON was never designed to restrict the values its strings represent to plain text only; it is only a serialization of "strings" to valid plain text using a custom syntax. There is absolutely no need to restrict string values to the same validation rules and the same subset as acceptable plain text. These are not the same layer: one is the string level (in fact not bound to any character encoding and not restricted to text), the other is the plain text, and JSON is the adapter/converter between these two representations.

Do not mix these two distinct layers. (The same confusion arises when someone mistakes an XML document for its DOM: they are not the same layer.)
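
To make the two layers concrete, here is a minimal sketch in Python. It only shows how Python's standard `json` module happens to behave, not what every parser does (the Go package quoted above substitutes U+FFFD instead). The variable names are purely illustrative.

```python
import json

# JSON text (pure ASCII, hence valid UTF-8) whose string value contains
# the isolated escape sequence \uD800.
json_text = '"abc\\uD800def"'

# String layer: Python's json module accepts the lone surrogate and
# returns a str that holds the code point U+D800.
value = json.loads(json_text)

# Plain-text layer: that value cannot be encoded directly as UTF-8.
try:
    value.encode("utf-8")
    directly_encodable = True
except UnicodeEncodeError:
    directly_encodable = False

# Serializing back to JSON re-escapes the surrogate (ensure_ascii=True is
# the default), so the serialized form is once again encodable as UTF-8.
round_tripped = json.dumps(value)
utf8_bytes = round_tripped.encode("utf-8")

print(directly_encodable)
print(round_tripped)
```

The string value itself is not valid text, yet its JSON serialization is: the escape sequence carries the lone surrogate through the UTF-8 layer untouched.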