Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Daniel Bünzli
On Friday, 8 May 2015 at 05:08, Philippe Verdy wrote: The RFC is just informative, not normative, RFC 7159 is not informational, it is a proposed standard. Try by yourself, you can perfectly send JSON text containing '\u' (non-character) or '\uF800' (unpaired surrogate) and I've

RE: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Costello, Roger L.
Philippe Verdy wrote: implementations just support JSON as plain 16-bit streams. Try by yourself, you can perfectly send JSON text containing '\u' (non-character) or '\uD800' (unpaired surrogate) and I've not seen any JSON implementation complaining about one or the other

RE: Script / font support in Windows 10

2015-05-08 Thread Peter Constable
I think this is the right public link: https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx From: Peter Constable Sent: Thursday, May 7, 2015 10:29 PM To: Peter Constable; unicode@unicode.org Subject: RE: Script / font support in Windows 10 Oops... my bad: maybe it isn't on live servers

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Doug Ewell
I interpreted Roger Costello's original question literally, that he wanted to find instances of '\u' that do not represent an ASSIGNED Unicode character. Apologies if this discussion is really about something else. -- Doug Ewell | http://ewellic.org | Thornton, CO 
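
A minimal sketch of the literal reading above, assuming Python and its bundled `unicodedata` tables (so "assigned" means assigned in whatever Unicode version that Python build ships). The names `classify` and `find_escapes` are hypothetical, not taken from the thread. Each \uXXXX escape is classified on its own, so both halves of a valid surrogate pair are still reported as surrogates.

```python
import re
import unicodedata

# Match any backslash escape so that an escaped backslash followed by
# "u0041" is not mistaken for a \u escape. Group 1 is set only for \uXXXX.
ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|.)', re.DOTALL)

def classify(cp):
    """Classify a code point as surrogate, noncharacter, unassigned or assigned."""
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # General category Cn marks code points with no assigned character.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"

def find_escapes(json_text):
    """Return (hex digits, classification) for every \\uXXXX escape in raw JSON text."""
    results = []
    for match in ESCAPE.finditer(json_text):
        digits = match.group(1)
        if digits is not None:
            results.append((digits, classify(int(digits, 16))))
    return results

if __name__ == "__main__":
    sample = r'{"a": "\u0041 \uFFFF \uD800"}'
    for digits, kind in find_escapes(sample):
        print(digits, kind)
```

The scan works on the raw JSON text before any parser has decoded the escapes, because after decoding the distinction between an escape and a literal character is lost.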

Re: Script / font support in Windows 10

2015-05-08 Thread Mark Davis ☕️
Thanks! Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Fri, May 8, 2015 at 7:15 AM, Peter Constable peter...@microsoft.com wrote: I think this is the right public link: https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx *From:* Peter Constable

Re: Script / font support in Windows 10

2015-05-08 Thread Richard Wordingham
On Fri, 8 May 2015 14:15:55 + Peter Constable peter...@microsoft.com wrote: I think this is the right public link: https://msdn.microsoft.com/en-us/goglobal/bb688099.aspx Does this confirm the intention of Microsoft that at some stage the Universal Shaping Engine (USE) in Windows 10 will

RE: Script / font support in Windows 10

2015-05-08 Thread Andrew Glass (WINDOWS)
Hi Richard, I agree that there is some work to be done to ensure correct display of Tai Tham. That work may involve changes to USE in a future update. We will have a panel on Universal Shaping at the upcoming IUC conference. That will be a good opportunity for a discussion between implementers

Re: Script / font support in Windows 10

2015-05-08 Thread Richard Wordingham
On Fri, 8 May 2015 17:16:01 + Andrew Glass (WINDOWS) andrew.gl...@microsoft.com wrote: I agree that there is some work to be done to ensure correct display of Tai Tham. That work may involve changes to USE in a future update. That's as I understood it, which is why I was surprised by the

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Richard Wordingham
On Fri, 8 May 2015 05:08:21 +0200 Philippe Verdy verd...@wanadoo.fr wrote: Try by yourself, you can perfectly send JSON text containing '\u' (non-character) or '\uF800' (unpaired surrogate) and I've not seen any JSON implementation complaining about one or the other, when receiving the

Surrogates and noncharacters (was: Re: Ways to detect that XXXX...)

2015-05-08 Thread Doug Ewell
Richard Wordingham richard dot wordingham at ntlworld dot com wrote: Try by yourself, you can perfectly send JSON text containing '\u' (non-character) or '\uF800' (unpaired surrogate) and I've not seen any JSON implementation complaining about one or the other, when receiving the JSON

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Daniel Bünzli
On Friday, 8 May 2015 at 13:48, Philippe Verdy wrote: JSON came initially from Javascript, and it is used extensively with Javascript. But not *only* for a long time now. The RFC is deviating from the currently running implementations. Well, did you test them all? There's quite a

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Philippe Verdy
2015-05-08 11:27 GMT+02:00 Costello, Roger L. coste...@mitre.org: Okay, I gave it a try. I created this string which contains binary data (sequence of arbitrary unsigned integers): -- æä}gõ› I did not say that these data had not to be properly escaped.

Re: Surrogates and noncharacters (was: Re: Ways to detect that XXXX...)

2015-05-08 Thread Richard Wordingham
On Sat, 9 May 2015 02:26:59 +0200 Daniel Bünzli daniel.buen...@erratique.ch wrote: On Saturday, 9 May 2015 at 00:37, Doug Ewell wrote: Noncharacters are Unicode scalar values, (However, noncharacters are not designed to be openly interchanged; see Restricted interchange on p. 31 of 7.0.0)

Re: Surrogates and noncharacters

2015-05-08 Thread Richard Wordingham
On Sat, 9 May 2015 02:26:59 +0200 Daniel Bünzli daniel.buen...@erratique.ch wrote: On Saturday, 9 May 2015 at 00:37, Doug Ewell wrote: This means noncharacters may appear in a well-formed UTF-8, -16, or -32 string, I take "appear" to mean "be encoded". Yes, any Unicode encoding form allows

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Philippe Verdy
2015-05-09 3:27 GMT+02:00 Daniel Bünzli daniel.buen...@erratique.ch: On Saturday, 9 May 2015 at 02:33, Philippe Verdy wrote: 2015-05-08 14:32 GMT+02:00 Daniel Bünzli daniel.buen...@erratique.ch: Well, did you test them all? There's quite a big list here

Re: Surrogates and noncharacters (was: Re: Ways to detect that XXXX...)

2015-05-08 Thread Markus Scherer
On Fri, May 8, 2015 at 9:13 PM, Philippe Verdy verd...@wanadoo.fr wrote: 2015-05-09 5:13 GMT+02:00 Richard Wordingham richard.wording...@ntlworld.com: I can't think of a practical use for the specific concepts of Unicode 8-bit, 16-bit and 32-bit strings. Unicode 16-bit strings are

Re: Surrogates and noncharacters (was: Re: Ways to detect that XXXX...)

2015-05-08 Thread Philippe Verdy
2015-05-09 5:13 GMT+02:00 Richard Wordingham richard.wording...@ntlworld.com: I can't think of a practical use for the specific concepts of Unicode 8-bit, 16-bit and 32-bit strings. Unicode 16-bit strings are essentially the same as 16-bit strings, and Unicode 32-bit strings are UTF-32

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Philippe Verdy
JSON came initially from Javascript, and it is used extensively with Javascript. My tests with their JSON parser show that any string that is valid for Javascript is also valid in JSON (no exception raised, no replaced characters, no deleted characters even if there are unpaired surrogates or

Re: Surrogates and noncharacters (was: Re: Ways to detect that XXXX...)

2015-05-08 Thread Daniel Bünzli
On Saturday, 9 May 2015 at 00:37, Doug Ewell wrote: Noncharacters are Unicode scalar values, Noncharacters are Unicode scalar values by definitions D14 and D76. while unpaired surrogates are not. No surrogate code points are Unicode scalar values, by D71, D73 and D76. This means

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Philippe Verdy
2015-05-08 14:32 GMT+02:00 Daniel Bünzli daniel.buen...@erratique.ch: On Friday, 8 May 2015 at 13:48, Philippe Verdy wrote: JSON came initially from Javascript, and it is used extensively with Javascript. But not *only* for a long time now. The RFC is deviating from the currently

Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

2015-05-08 Thread Daniel Bünzli
On Saturday, 9 May 2015 at 02:33, Philippe Verdy wrote: 2015-05-08 14:32 GMT+02:00 Daniel Bünzli daniel.buen...@erratique.ch: Well, did you test them all? There's quite a big list here http://www.json.org. Taking a random one mentioned on that page