Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-12 Thread Andrew Dunstan
On 06/12/2013 12:13 AM, Craig Ringer wrote: On 06/12/2013 08:42 AM, Andrew Dunstan wrote: If we work by analogy to Postgres' own handling of Unicode escapes, we'll raise an error on any Unicode escape beyond ASCII (not on input for legacy reasons, but on trying to process such datums). I gather

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Craig Ringer
On 06/12/2013 08:42 AM, Andrew Dunstan wrote: > > If we work by analogy to Postgres' own handling of Unicode escapes, > we'll raise an error on any Unicode escape beyond ASCII (not on input > for legacy reasons, but on trying to process such datums). I gather that > would meet your objection. I c

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Noah Misch
On Tue, Jun 11, 2013 at 08:42:26PM -0400, Andrew Dunstan wrote: > If we work by analogy to Postgres' own handling of Unicode escapes, > we'll raise an error on any Unicode escape beyond ASCII (not on input > for legacy reasons, but on trying to process such datums). I gather that > would meet

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/11/2013 08:18 PM, Noah Misch wrote: On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only ac

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Noah Misch
On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote: > > On 06/11/2013 06:26 PM, Noah Misch wrote: >> >>> As a final counter example, let me note that Postgres itself handles >>> Unicode escapes differently in UTF8 databases - in other databases it >>> only accepts Unicode escapes up to

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/11/2013 06:26 PM, Noah Misch wrote: As a final counter example, let me note that Postgres itself handles Unicode escapes differently in UTF8 databases - in other databases it only accepts Unicode escapes up to U+007f, i.e. ASCII characters. I don't see a counterexample there; every data

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Noah Misch
On Tue, Jun 11, 2013 at 02:10:45PM -0400, Andrew Dunstan wrote: > > On 06/10/2013 11:22 PM, Noah Misch wrote: >> On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: >>> On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan writes: > After thinking about this some more I have

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Tom Lane
Andrew Dunstan writes: > As a final counter example, let me note that Postgres itself handles > Unicode escapes differently in UTF8 databases - in other databases it > only accepts Unicode escapes up to U+007f, i.e. ASCII characters. Good point. What if we adopt that same definition for JSON,

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/10/2013 11:22 PM, Noah Misch wrote: On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequence

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 04:04 PM, Stefan Drees wrote: > On 2013-06-11 15:23 CEST, Hannu Krosing wrote: >> On 06/11/2013 03:08 PM, Stefan Drees wrote: >>> ... >>> >>> What about this: >>> =# SELECT '{"measure":"seconds", "measure":42}'::json; >>> json >>>

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 03:54 PM, Andrew Dunstan wrote: > > On 06/11/2013 09:23 AM, Hannu Krosing wrote: > >> >> I can see no possible JavaScript structure which could produce duplicate >> key when serialised. >> >> And I don't think that any standard JSON reader supports this either. > > You are quite wrong

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Stefan Drees
On 2013-06-11 15:23 CEST, Hannu Krosing wrote: On 06/11/2013 03:08 PM, Stefan Drees wrote: ... What about this: =# SELECT '{"measure":"seconds", "measure":42}'::json; json -- {"measure":42} I presume people being used to store metadata in

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 03:42 PM, Andrew Dunstan wrote: > > On 06/11/2013 09:16 AM, Hannu Krosing wrote: > > >>> >>> It's a pity that we don't have a non-error producing conversion >>> function >>> (or if we do that I haven't found it). Then we might adopt a rule for >>> processing >>> unicode escapes that s

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/11/2013 09:23 AM, Hannu Krosing wrote: I can see no possible JavaScript structure which could produce duplicate key when serialised. And I don't think that any standard JSON reader supports this either. You are quite wrong. This was discussed quite recently on -hackers, too. V8 will

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/11/2013 09:16 AM, Hannu Krosing wrote: It's a pity that we don't have a non-error producing conversion function (or if we do that I haven't found it). Then we might adopt a rule for processing unicode escapes that said "convert unicode escapes to the database encoding only when extract

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 03:08 PM, Stefan Drees wrote: > quiring preserving "original text" in json data field is Not Good! >> >> I fully expect '{"a":1, "a":none, "a":true, "a":"b"}'::json to come out >> as '{"a":b"}' > > ahem, do you mean instead to give (none -> null and missing '"' > inserted in "answer")

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 02:41 PM, Andrew Dunstan wrote: > > On 06/11/2013 06:53 AM, Hannu Krosing wrote: >> On 06/11/2013 10:47 AM, Andres Freund wrote: >>> On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: > It's legal, is it not, to just write the equivalent Unicode > character in > the JSON

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Stefan Drees
On 2013-06-11 12:53 CEST, Hannu Krosing wrote: On 06/11/2013 10:47 AM, Andres Freund wrote: On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: It's legal, is it not, to just write the equivalent Unicode character in the JSON string and not use the escapes? If so I would think that that would

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andrew Dunstan
On 06/11/2013 06:53 AM, Hannu Krosing wrote: On 06/11/2013 10:47 AM, Andres Freund wrote: On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: It's legal, is it not, to just write the equivalent Unicode character in the JSON string and not use the escapes? If so I would think that that would b

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Hannu Krosing
On 06/11/2013 10:47 AM, Andres Freund wrote: > On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: >>> It's legal, is it not, to just write the equivalent Unicode character in >>> the JSON string and not use the escapes? If so I would think that that >>> would be the most common usage. If someone

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-11 Thread Andres Freund
On 2013-06-10 13:01:29 -0400, Andrew Dunstan wrote: > >It's legal, is it not, to just write the equivalent Unicode character in > >the JSON string and not use the escapes? If so I would think that that > >would be the most common usage. If someone's writing an escape, they > >probably had a reaso

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Noah Misch
On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote: > > On 06/10/2013 10:18 AM, Tom Lane wrote: >> Andrew Dunstan writes: >>> After thinking about this some more I have come to the conclusion that >>> we should only do any de-escaping of \u sequences, whether or not >>> they are fo

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Andrew Dunstan
On 06/10/2013 06:07 PM, Robert Haas wrote: On Mon, Jun 10, 2013 at 10:18 AM, Tom Lane wrote: Well, if we have to break backwards compatibility when we try to do binary storage, we're not going to be happy either. So I think we'd better have a plan in mind for what will happen then. Who says

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Hannu Krosing
On 06/11/2013 12:07 AM, Robert Haas wrote: > On Mon, Jun 10, 2013 at 10:18 AM, Tom Lane wrote: >> Well, if we have to break backwards compatibility when we try to do >> binary storage, we're not going to be happy either. So I think we'd >> better have a plan in mind for what will happen then. > W

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Robert Haas
On Mon, Jun 10, 2013 at 10:18 AM, Tom Lane wrote: > Well, if we have to break backwards compatibility when we try to do > binary storage, we're not going to be happy either. So I think we'd > better have a plan in mind for what will happen then. Who says we're ever going to do any such thing? T

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Andrew Dunstan
On 06/10/2013 11:43 AM, Tom Lane wrote: Andrew Dunstan writes: Or we could abandon the conversion altogether, but that doesn't seem very friendly either. I suspect the biggest case for people to use these sequences is where the database is UTF8 but the client encoding is not. Well, if that's

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Tom Lane
Andrew Dunstan writes: > Or we could abandon the conversion altogether, but that doesn't seem > very friendly either. I suspect the biggest case for people to use these > sequences is where the database is UTF8 but the client encoding is not. Well, if that's actually the biggest use-case, then

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Andrew Dunstan
On 06/10/2013 10:18 AM, Tom Lane wrote: Andrew Dunstan writes: After thinking about this some more I have come to the conclusion that we should only do any de-escaping of \u sequences, whether or not they are for BMP characters, when the server encoding is utf8. For any other encoding, whi

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-10 Thread Tom Lane
Andrew Dunstan writes: > After thinking about this some more I have come to the conclusion that > we should only do any de-escaping of \u sequences, whether or not > they are for BMP characters, when the server encoding is utf8. For any > other encoding, which is already a violation of the

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-09 Thread Andrew Dunstan
On 06/09/2013 07:47 PM, Tom Lane wrote: Andrew Dunstan writes: I did that, but it's evident from the buildfarm that there's more work to do. The problem is that we do the de-escaping as we lex the json to construct the look ahead token, and at that stage we don't know whether or not it's reall

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-09 Thread Tom Lane
Andrew Dunstan writes: > I did that, but it's evident from the buildfarm that there's more work > to do. The problem is that we do the de-escaping as we lex the json to > construct the look ahead token, and at that stage we don't know whether > or not it's really going to be needed. That means

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-09 Thread Andrew Dunstan
On 06/06/2013 12:53 PM, Robert Haas wrote: On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan wrote: In 9.2, the JSON parser didn't check the validity of the use of unicode escapes other than that it required 4 hex digits to follow '\u'. In 9.3, that is still the case. However, the JSON accessor

Re: [HACKERS] JSON and unicode surrogate pairs

2013-06-06 Thread Robert Haas
On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan wrote: > In 9.2, the JSON parser didn't check the validity of the use of unicode > escapes other than that it required 4 hex digits to follow '\u'. In 9.3, > that is still the case. However, the JSON accessor functions and operators > also try to tur

[HACKERS] JSON and unicode surrogate pairs

2013-06-05 Thread Andrew Dunstan
In 9.2, the JSON parser didn't check the validity of the use of unicode escapes other than that it required 4 hex digits to follow '\u'. In 9.3, that is still the case. However, the JSON accessor functions and operators also try to turn JSON strings into text in the server encoding, and this
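
For readers unfamiliar with the issue under discussion, here is a minimal sketch in Python of what stricter checking of '\u' escapes could look like: each escape must have exactly 4 hex digits, and a high surrogate (U+D800..U+DBFF) must be immediately followed by a low surrogate (U+DC00..U+DFFF), with the pair combined into one code point. This is an illustration only, not the PostgreSQL implementation; the function names are hypothetical, and it deliberately ignores server-encoding conversion and leaves other escapes (such as \\ or \n) untouched.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    # Standard UTF-16 pairing: 10 bits from each half, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_unicode_escapes(text: str) -> str:
    """Replace \\uXXXX escapes in a JSON string body with characters.

    Hypothetical helper: raises ValueError on malformed escapes or
    unpaired surrogates; other backslash escapes are copied unchanged.
    """
    out = []
    i = 0
    n = len(text)
    pending_high = None  # high surrogate waiting for its low half

    while i < n:
        if text.startswith("\\u", i):
            digits = text[i + 2:i + 6]
            if len(digits) != 4 or any(c not in HEX_DIGITS for c in digits):
                raise ValueError("\\u must be followed by 4 hex digits")
            unit = int(digits, 16)
            i += 6
            if pending_high is not None:
                if not is_low_surrogate(unit):
                    raise ValueError("high surrogate not followed by low surrogate")
                out.append(chr(combine_surrogates(pending_high, unit)))
                pending_high = None
            elif is_high_surrogate(unit):
                pending_high = unit
            elif is_low_surrogate(unit):
                raise ValueError("low surrogate without preceding high surrogate")
            else:
                out.append(chr(unit))
            continue

        if pending_high is not None:
            raise ValueError("high surrogate not followed by low surrogate")

        if text[i] == "\\" and i + 1 < n:
            # Some other escape (e.g. \\ or \n): copy both characters as-is
            # so that an escaped backslash before 'u' is not misread.
            out.append(text[i:i + 2])
            i += 2
            continue

        out.append(text[i])
        i += 1

    if pending_high is not None:
        raise ValueError("high surrogate not followed by low surrogate")
    return "".join(out)
```

Under these assumptions, the escaped pair \ud83d\ude00 decodes to the single code point U+1F600, while a lone \ud83d or \ude00 is rejected rather than passed through.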