Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
David G Johnston writes:
> Tom Lane-2 wrote
>> While I'd be willing to ignore that risk so far as code points above
>> 10ffff go, if we want pg_utf8_islegal to be happy then we will also
>> have to reject surrogate-pair code points. It's not beyond the realm
>> of possibility that somebody is int

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread David G Johnston
Tom Lane-2 wrote
> Noah Misch <noah@> writes:
>> On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
>>> I think this probably means we need to change chr() to reject code points
>>> above 10ffff. Should we back-patch that, or just do it in HEAD?
>> The compatibility risks res

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
Noah Misch writes:
> On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above 10ffff. Should we back-patch that, or just do it in HEAD?
> The compatibility risks resemble those associated with the fixes for bu

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Noah Misch
On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
> I think this probably means we need to change chr() to reject code points
> above 10ffff. Should we back-patch that, or just do it in HEAD?
The compatibility risks resemble those associated with the fixes for bug #9210, so I recommend HE

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Tom Lane
Heikki Linnakangas writes:
> On 05/16/2014 06:05 PM, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above 10ffff. Should we back-patch that, or just do it in HEAD?
> +1 for back-patching. A value that cannot be restored is bad, and I
> can't imag

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Andrew Dunstan
On 05/16/2014 12:43 PM, Heikki Linnakangas wrote:
> On 05/16/2014 06:05 PM, Tom Lane wrote:
>> Quite some time ago, we made the chr() function accept Unicode code
>> points up to U+1FFFFF, which is the largest value that will fit in a
>> 4-byte UTF8 string. It was pointed out to me though that RFC3629

Re: [HACKERS] chr() is still too loose about UTF8 code points

2014-05-16 Thread Heikki Linnakangas
On 05/16/2014 06:05 PM, Tom Lane wrote:
> Quite some time ago, we made the chr() function accept Unicode code
> points up to U+1FFFFF, which is the largest value that will fit in a
> 4-byte UTF8 string. It was pointed out to me though that RFC3629
> restricted the original definition of UTF8 to only all
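
For readers skimming the thread, here is a rough sketch of the check being discussed: reject code points above 10ffff (the RFC 3629 limit) and reject the surrogate range. It is written in Python for brevity and is not PostgreSQL's C code; the function name `is_legal_code_point` and the constants are hypothetical, chosen only for this illustration.

```python
# Illustrative only: names below are assumptions, not PostgreSQL identifiers.
MAX_UNICODE = 0x10FFFF      # highest code point permitted by RFC 3629
MAX_4BYTE_UTF8 = 0x1FFFFF   # highest value that fits in a 4-byte UTF-8 sequence
SURROGATE_LO = 0xD800       # start of the UTF-16 surrogate range
SURROGATE_HI = 0xDFFF       # end of the UTF-16 surrogate range


def is_legal_code_point(cp: int) -> bool:
    """Return True if cp may be encoded as UTF-8 under RFC 3629."""
    if cp < 0 or cp > MAX_UNICODE:
        # Covers 0x110000..0x1FFFFF, which fit in four bytes but are not allowed.
        return False
    if SURROGATE_LO <= cp <= SURROGATE_HI:
        # Surrogate code points are reserved for UTF-16 and are not valid in UTF-8.
        return False
    return True


if __name__ == "__main__":
    for cp in (0x41, 0xD800, MAX_UNICODE, MAX_UNICODE + 1, MAX_4BYTE_UTF8):
        print(f"U+{cp:04X}: {is_legal_code_point(cp)}")
```

Whether code point 0 or other special values should also be rejected is outside what this sketch covers.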