Re: Update list of combining characters

2019-06-24 Thread Peter Eisentraut
On 2019-06-19 21:55, Tom Lane wrote:
> Peter Eisentraut writes:
>>> Indeed. Here is an updated script and patch.
>
>> committed (to master)
>
> Cool, but should we also put your recalculation script into git, to help
> the next time we decide that we need to update this list? It's
> demonstrat

Re: Update list of combining characters

2019-06-19 Thread Tom Lane
Peter Eisentraut writes:
>> Indeed. Here is an updated script and patch.

> committed (to master)

Cool, but should we also put your recalculation script into git, to help
the next time we decide that we need to update this list? It's
demonstrated to be nontrivial to get it right ;-)

Re: Update list of combining characters

2019-06-19 Thread Peter Eisentraut
On 2019-06-14 11:36, Peter Eisentraut wrote:
> On 2019-06-13 15:52, Alvaro Herrera wrote:
>> I think there's an off-by-one bug in your script.
>
> Indeed. Here is an updated script and patch.

committed (to master)

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Developm

Re: Update list of combining characters

2019-06-14 Thread Peter Eisentraut
%04X, 0x%04X},", $range_start, $prev_codepoint;
			$range_start = undef;
		}
	}
}
continue
{
	$prev_codepoint = $codepoint;
}

print "\n\t};\n";

From da90031113908ee9869ae87a5edbf52992d16a96 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 14 Jun 2019 11:30:44

Re: Update list of combining characters

2019-06-13 Thread Alvaro Herrera
I think there's an off-by-one bug in your script. I picked one value at
random to verify -- 0x0BC0.

Old:
> - {0x0BC0, 0x0BC0}, {0x0BCD, 0x0BCD}, {0x0C3E, 0x0C40},

New:
> + {0x0BC0, 0x0BC1}, {0x0BCD, 0x0BD0}, {0x0C00, 0x0C01},

the UCD file has:

0BC0;TAMIL VOWEL SIGN
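
The off-by-one described above can be reproduced in a small sketch. This is not the script from the thread (that one is Perl and only partly visible here); it is a Python approximation built on one assumption: the first version closed each range with the current, non-combining codepoint, while the fixed version closes it with the previous codepoint, as the `$codepoint` versus `$prev_codepoint` fragments in the two script versions suggest. The names `collapse_ranges`, `format_ranges`, and the `buggy` flag are hypothetical, and the combining flags in `sample` are illustrative values chosen to match the old table entry, not parsed from the UCD file.

```python
def collapse_ranges(entries, buggy=False):
    """Collapse (codepoint, is_combining) pairs into inclusive ranges.

    `entries` must be sorted by codepoint. Simplifying assumption: the
    input has no gaps, so two combining codepoints separated only by
    missing entries would be merged into one range.
    """
    ranges = []
    range_start = None
    prev_codepoint = None
    for codepoint, is_combining in entries:
        if is_combining:
            if range_start is None:
                range_start = codepoint  # open a new range
        elif range_start is not None:
            # The range ended at the previous codepoint. The buggy
            # variant records the current one, which is one too far.
            end = codepoint if buggy else prev_codepoint
            ranges.append((range_start, end))
            range_start = None
        prev_codepoint = codepoint
    if range_start is not None:
        # Close a range that runs to the end of the input.
        ranges.append((range_start, prev_codepoint))
    return ranges


def format_ranges(ranges):
    """Render ranges in the same C initializer style as the thread."""
    return " ".join("{0x%04X, 0x%04X}," % r for r in ranges)


# Illustrative flags only: 0x0BC0 is treated as the single combining
# codepoint in this window, matching the old entry {0x0BC0, 0x0BC0}.
sample = [(0x0BBF, False), (0x0BC0, True), (0x0BC1, False), (0x0BC2, False)]

print(format_ranges(collapse_ranges(sample)))              # fixed variant
print(format_ranges(collapse_ranges(sample, buggy=True)))  # off-by-one variant
```

With this sample, the fixed variant yields the single-codepoint range from the old list, and the buggy variant stretches it to 0x0BC1, the same shape as the discrepancy spotted in the patch.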

Re: Update list of combining characters

2019-06-13 Thread Tom Lane
Peter Eisentraut writes:
> Any thoughts about applying this as
> a) a bug fix with backpatching
> b) just to master
> c) wait for PG13
> d) it's all wrong?

Well, it's a behavioral change, and we've not gotten field complaints,
so I'm about -0.1 on back-patching. No objection to apply to master

Re: Update list of combining characters

2019-06-13 Thread Peter Eisentraut
On 2019-06-04 22:58, Peter Eisentraut wrote:
> AFAICT, these Unicode definitions haven't changed since that list was
> put in originally around 2006, so I wonder what's going on there.
>
> I have written a script that recomputes that list from the current
> Unicode data. Patch and script are atta

Update list of combining characters

2019-06-04 Thread Peter Eisentraut
;\n\t\t";
			}
			else
			{
				print " ";
			}
			printf "{0x%04X, 0x%04X},", $range_start, $codepoint;
			$range_start = undef;
		}
	}
}

print "\n\t};\n";

From a83a7e1bcc3cfee5efa24b4