Re: Order changes in PG16 since ICU introduction

2023-06-16 Thread Jeff Davis
On Fri, 2023-06-16 at 16:50 +0200, Peter Eisentraut wrote: > This looks good to me. > > Attached is small fixup patch with some documentation tweaks and > simplifying some test code (also includes pgperltidy). Thank you. Committed with your fixups. Regards, Jeff Davis

Re: Order changes in PG16 since ICU introduction

2023-06-16 Thread Peter Eisentraut
On 14.06.23 23:24, Jeff Davis wrote: On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote: Patch 0003: Makes LOCALE apply to all providers. The overall feel after this patch is that "locale" now means the collation locale, and LC_COLLATE/LC_CTYPE are for the server environment. When using

Re: Order changes in PG16 since ICU introduction

2023-06-14 Thread Jeff Davis
On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote: > I object to adding a new provider for PG16 (patch 0001). Added to July CF for 17. > > 2. Patch 0004 is possibly out of scope for 16 > Also clearly a new feature. Added to July CF for 17. Regards, Jeff Davis

Re: Order changes in PG16 since ICU introduction

2023-06-12 Thread Peter Eisentraut
On 09.06.23 02:36, Jeff Davis wrote: Patches 0001, 0002: These patches implement the built-in provider and automatically change provider=icu to provider=builtin when the locale is C. I object to adding a new provider for PG16 (patch 0001). This is clearly a new feature, which wasn't even con

Re: Order changes in PG16 since ICU introduction

2023-06-12 Thread Daniel Verite
Jeff Davis wrote: > I guess where I'm confused is: why would a user actually want their > database collation to be C.UTF-8? It's slower than C, our > implementation doesn't properly version it (as you pointed out), and > the semantics don't seem great ('Z' < 'a'). Because when LC_CTYPE=C,
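
Editorial sketch of the 'Z' < 'a' semantics mentioned in the quote above: under the C collation, strings compare by raw byte value (memcmp-style), so every ASCII uppercase letter sorts before every lowercase one. The Python below only imitates that ordering; it is not PostgreSQL code, and the helper name `c_collation_sort` and the sample words are assumptions made for illustration.

```python
def c_collation_sort(strings):
    """Sort by raw UTF-8 bytes, imitating a memcmp()-style comparison.

    Illustrative helper only; not part of PostgreSQL or any library.
    """
    return sorted(strings, key=lambda s: s.encode("utf-8"))


words = ["apple", "Zebra", "banana", "Apple"]

# Byte order: 'A' (0x41) and 'Z' (0x5A) both precede 'a' (0x61).
byte_order = c_collation_sort(words)

# A simple case-insensitive ordering for contrast (not a real ICU collation).
case_insensitive = sorted(words, key=str.casefold)

print(byte_order)
print(case_insensitive)
```

The contrast ordering uses `str.casefold` only as a stand-in for a linguistic collation; real ICU or libc locales apply more elaborate rules.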

Re: Order changes in PG16 since ICU introduction

2023-06-09 Thread Jeff Davis
On Fri, 2023-06-09 at 14:12 +0200, Daniel Verite wrote: > >  I implemented a compromise where initdb will > >  change C.UTF-8 to the built-in provider > > $ initdb --locale=C.UTF-8 ... > This setup is not what the user has asked for and leads to that kind > of > wrong results: > > $ psql -c "s

Re: Order changes in PG16 since ICU introduction

2023-06-09 Thread Daniel Verite
Jeff Davis wrote: > I implemented a compromise where initdb will > change C.UTF-8 to the built-in provider This handling of C.UTF-8 would be felt by users as simply broken. With the v10 patches: $ initdb --locale=C.UTF-8 initdb: using locale provider "builtin" for ICU locale "C.UTF-

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Joe Conway
On 6/8/23 17:15, Jeff Davis wrote: On Wed, 2023-06-07 at 20:52 -0400, Joe Conway wrote: If the provider has no such thing, throw an error. Just to be clear, that implies that users (and buildfarm members) with LANG=C.UTF-8 in their environment would not be able to run a plain "initdb -D data";

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Jeff Davis
On Wed, 2023-06-07 at 20:52 -0400, Joe Conway wrote: > If the provider has no such thing, throw an error. Just to be clear, that implies that users (and buildfarm members) with LANG=C.UTF-8 in their environment would not be able to run a plain "initdb -D data"; they'd get an error. It's hard for m

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Daniel Verite
Jeff Davis wrote: > As I replied in that subthread, that creates a worse problem: if you > only change the provider when the locale is C, then what about when the > locale is *not* C? > > export LANG=en_US.UTF-8 > initdb -D data --locale=fr_FR.UTF-8 > ... >provider:icu >ICU

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Daniel Verite
Tatsuo Ishii wrote: > >> Yes it's a special case but when doing initdb --locale=C, a user does > >> not need or want an ICU locale. They want the same thing as what v15 > >> does with the same arguments: a template0 database with > >> datlocprovider='c', datcollate='C', datctype='C', dat

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Tatsuo Ishii
>> As I replied in that subthread, that creates a worse problem: if you >> only change the provider when the locale is C, then what about when the >> locale is *not* C? >> >> export LANG=en_US.UTF-8 >> initdb -D data --locale=fr_FR.UTF-8 >> ... >> provider:icu >> ICU locale: en-

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Joe Conway
On 6/7/23 19:26, Jeff Davis wrote: * What do we do in the case where the environment has LANG=C.UTF-8 (as some buildfarm members do)? Is an error acceptable in that case? If I understand the discussion so far correctly, I think that case should fall to the provider. If it supports "C.UTF-8"

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Tatsuo Ishii
Hi, > On Wed, 2023-06-07 at 23:50 +0200, Daniel Verite wrote: >> The simplest way to obtain that in v16 is to teach initdb that >> --locale=C without the --locale-provider option implies that >> --locale-provider=libc ([1]) > > As I replied in that subthread, that creates a worse problem: if you

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Jeff Davis
On Thu, 2023-06-08 at 00:11 +0200, Peter Eisentraut wrote: > On 05.06.23 19:54, Jeff Davis wrote: > > New patch series attached. > > Could you clarify what here is intended for 16 and what is for later? I apologize about the patch churn here. I implemented several approaches to see what feedback

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Jeff Davis
On Wed, 2023-06-07 at 23:50 +0200, Daniel Verite wrote: > The simplest way to obtain that in v16 is to teach initdb that > --locale=C without the --locale-provider option implies that > --locale-provider=libc ([1]) As I replied in that subthread, that creates a worse problem: if you only change th

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Peter Eisentraut
On 05.06.23 19:54, Jeff Davis wrote: New patch series attached. Could you clarify what here is intended for 16 and what is for later? This patch set keeps expanding and changing in each iteration. There is a PG16 open item linked to this thread * The rules for choosing default ICU locale se

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Peter Eisentraut
On 05.06.23 19:54, Jeff Davis wrote: New patch series attached. I plan to commit 0001 and 0002 soon, unless there are objections. 0001 causes the "C" and "POSIX" locales to be treated with memcmp/pg_ascii semantics in ICU, just like in libc. We also considered a new "none" provider, but it's mo

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Daniel Verite
Jeff Davis wrote: > The locale "C" is a special case, documented as a non-locale. So, if > LOCALE/--locale apply to ICU, then either ICU needs to handle locale > "C" in the expected way (v8 patch series); or when we see locale "C" we > need to somehow change the provider into something tha

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Peter Eisentraut
On 22.05.23 19:35, Jeff Davis wrote: On Thu, 2023-05-11 at 13:07 +0200, Peter Eisentraut wrote: Here is my proposed patch for this. The commit message makes it sound like lc_collate/ctype are completely obsolete, and I don't think that's quite right: they still represent the server environment

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Andrew Gierth
> "Joe" == Joe Conway writes: > On 6/6/23 15:55, Tom Lane wrote: >> Robert Haas writes: >>> On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: Also +1, except that I find "none" a rather confusing choice of name. There *is* a provider, it's just PG itself not either libc or ICU.

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Tom Lane
"Jonathan S. Katz" writes: > Since we're bikeshedding, "postgresql" or "builtin" could make it seem > to a (app) developer that these may be recommended options, as we're > trusting PostgreSQL to make the best choices for us. Granted, v16 is > (theoretically) defaulting to ICU, so that choice i

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Jonathan S. Katz
On 6/6/23 3:56 PM, Joe Conway wrote: On 6/6/23 15:55, Tom Lane wrote: Robert Haas writes: On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: Also +1, except that I find "none" a rather confusing choice of name. There *is* a provider, it's just PG itself not either libc or ICU. I thought Joe's su

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Joe Conway
On 6/6/23 15:55, Tom Lane wrote: Robert Haas writes: On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: Also +1, except that I find "none" a rather confusing choice of name. There *is* a provider, it's just PG itself not either libc or ICU. I thought Joe's suggestion of "internal" made more sense

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Tom Lane
Robert Haas writes: > On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: >> Also +1, except that I find "none" a rather confusing choice of name. >> There *is* a provider, it's just PG itself not either libc or ICU. >> I thought Joe's suggestion of "internal" made more sense. > Or perhaps "builtin"

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Robert Haas
On Tue, Jun 6, 2023 at 3:25 PM Tom Lane wrote: > Joe Conway writes: > > On 6/6/23 15:18, Jeff Davis wrote: > >> The locale "C" is a special case, documented as a non-locale. So, if > >> LOCALE/--locale apply to ICU, then either ICU needs to handle locale > >> "C" in the expected way (v8 patch ser

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Tom Lane
Joe Conway writes: > On 6/6/23 15:18, Jeff Davis wrote: >> The locale "C" is a special case, documented as a non-locale. So, if >> LOCALE/--locale apply to ICU, then either ICU needs to handle locale >> "C" in the expected way (v8 patch series); or when we see locale "C" we >> need to somehow chan

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Joe Conway
On 6/6/23 15:18, Jeff Davis wrote: On Tue, 2023-06-06 at 15:09 +0200, Daniel Verite wrote: FWIW I don't quite see how 0001 improve things or what problem it's trying to solve. The word "locale" is generic, so we need to make LOCALE/--locale apply to whatever provider is being used. If "locale"

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Joe Conway
On 6/6/23 15:15, Jeff Davis wrote: On Tue, 2023-06-06 at 14:11 -0400, Joe Conway wrote: This discussion makes me wonder (though probably too late for the v16 cycle) if we shouldn't treat "C" and "POSIX" locales to be a third provider, something like "internal". That's exactly what I did in v6

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Jeff Davis
On Tue, 2023-06-06 at 14:11 -0400, Joe Conway wrote: > This discussion makes me wonder (though probably too late for the v16 > cycle) if we shouldn't treat "C" and "POSIX" locales to be a third > provider, something like "internal". That's exactly what I did in v6 of this series: I created a "non

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Joe Conway
On 6/6/23 09:09, Daniel Verite wrote: Jeff Davis wrote: New patch series attached. I plan to commit 0001 and 0002 soon, unless there are objections. 0001 causes the "C" and "POSIX" locales to be treated with memcmp/pg_ascii semantics in ICU, just like in libc. We also considered a new "

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Daniel Verite
Jeff Davis wrote: > New patch series attached. I plan to commit 0001 and 0002 soon, unless > there are objections. > > 0001 causes the "C" and "POSIX" locales to be treated with > memcmp/pg_ascii semantics in ICU, just like in libc. We also > considered a new "none" provider, but it's mor

Re: Order changes in PG16 since ICU introduction

2023-05-26 Thread Daniel Verite
Jeff Davis wrote: > > #1 > > > > postgres=# create database test1 locale='fr_FR.UTF-8'; > > NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8" > > ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of > > I don't see a problem here. If you specify LOCALE to

Re: Order changes in PG16 since ICU introduction

2023-05-25 Thread Jeff Davis
On Mon, 2023-05-22 at 14:34 +0200, Peter Eisentraut wrote: > Please put blank lines between > > > > > etc., matching existing style. > > We usually don't capitalize the collation parameters like > > CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP'); > > elsewhere in the documen

Re: Order changes in PG16 since ICU introduction

2023-05-24 Thread Joe Conway
On 5/24/23 11:39, Jeff Davis wrote: On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote: In practice we're probably getting the "und" ICU locale whereas "fr" would be appropriate. This is a good point and illustrates that ICU is not a drop-in replacement for libc in all cases. I don't see

Re: Order changes in PG16 since ICU introduction

2023-05-24 Thread Jeff Davis
On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote: > While I agree that the LOCALE option in CREATE DATABASE is > counter-intuitive, I think it's more than that. As Andrew Gierth pointed out: $ initdb --locale=fr_FR ... ICU locale: en-US ... Is more than just counter-intuiti

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Daniel Verite
Jeff Davis wrote: > If we special case locale=C, but do nothing for locale=fr_FR, then I'm > not sure we've solved the problem. Andrew Gierth raised the issue here, > which he called "maximally confusing": > > https://postgr.es/m/874jp9f5jo@news-spur.riddles.org.uk > > That's why I f

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Jeff Davis
On Mon, 2023-05-22 at 14:27 +0200, Peter Eisentraut wrote: > The rules are for setting whatever sort order you like.  Maybe you > want > to sort + before - or whatever.  It's like, if you don't like it, > build > your own. A build-your-own feature is fine, but it's not completely zero cost. The

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Jeff Davis
On Thu, 2023-05-11 at 13:07 +0200, Peter Eisentraut wrote: > Here is my proposed patch for this. The commit message makes it sound like lc_collate/ctype are completely obsolete, and I don't think that's quite right: they still represent the server environment, which does still matter in some cases

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Jeff Davis
On Thu, 2023-05-11 at 13:09 +0200, Peter Eisentraut wrote: > There is also the deterministic flag and the icurules setting. > Depending on what level of detail you imagine the user needs, you > really > do need to look at the whole picture, not some subset of it. (Nit: all database default colla

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Peter Eisentraut
On 18.05.23 00:59, Jeff Davis wrote: On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote: Other than that, and I took your suggestions almost verbatim. Patch attached. Thank you! Attached new patch with a typo fix and a few other edits. I plan to commit soon. Some small follow-up on this pat

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Peter Eisentraut
On 18.05.23 19:55, Jeff Davis wrote: On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote: I did a quicker read through this time. LGTM overall. I like what you did with the explanations around sensitivity (now it makes sense). Committed, thank you. There are a few things I don't underst

Re: Order changes in PG16 since ICU introduction

2023-05-20 Thread Jeff Davis
On Fri, 2023-05-19 at 21:13 +0200, Daniel Verite wrote: > ISTM that if we want to go that route, we need the make the minimum > changes at the user interface level and not any deeper, so that when > (locale="C" OR locale="POSIX") AND the provider has not been > specified, > then the command (initdb

Re: Order changes in PG16 since ICU introduction

2023-05-19 Thread Tom Lane
Jeff Davis writes: > Committed, thank you. This commit has given the PDF docs build some indigestion: Making portrait pages on A4 paper (210mmx297mm) /home/postgres/bin/fop -fo postgres-A4.fo -pdf postgres-A4.pdf [WARN] FOUserAgent - Font "Symbol,normal,700" not found. Substituting with "Symbol

Re: Order changes in PG16 since ICU introduction

2023-05-19 Thread Daniel Verite
Jeff Davis wrote: > 2) Automatically change the provider to libc when locale=C. > > Almost works, but it's not clear how we handle the case "provider=icu > lc_collate='fr_FR.utf8' locale=C". > > If we change it to "provider=libc lc_collate=C", we've overridden the > specified lc_collate.

Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Thu, 2023-05-18 at 20:11 +0200, Matthias van de Meent wrote: > As I complain about in [0], since 5cd1a5af --no-locale has been > broken > / behaving outside its description: Instead of being equivalent to > `--locale=C` it now also overrides `--locale-provider=libc`, > resulting > in undocument

Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Thu, 2023-05-18 at 13:58 -0400, Jonathan S. Katz wrote: >  From my read of them, as an app developer I'd be very unlikely to > use > this. Maybe there is something with building out some collation rules > vis-a-vis an extension, but I have trouble imagining the use-case. I > may > also not be

Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Matthias van de Meent
On Fri, 21 Apr 2023 at 22:46, Jeff Davis wrote: > > On Fri, 2023-04-21 at 19:00 +0100, Andrew Gierth wrote: > > Also, somewhere along the line someone broke initdb --no-locale, > > which > > should result in C locale being the default everywhere, but when I > > just > > tested it it pi

Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jonathan S. Katz
On 5/18/23 1:55 PM, Jeff Davis wrote: On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote: I did a quicker read through this time. LGTM overall. I like what you did with the explanations around sensitivity (now it makes sense). Committed, thank you. \o/ There are a few things I don't

Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote: > I did a quicker read through this time. LGTM overall. I like what you > did with the explanations around sensitivity (now it makes sense). Committed, thank you. There are a few things I don't understand that would be good to document be

Re: Order changes in PG16 since ICU introduction

2023-05-17 Thread Jonathan S. Katz
On 5/17/23 6:59 PM, Jeff Davis wrote: On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote: Other than that, and I took your suggestions almost verbatim. Patch attached. Thank you! Attached new patch with a typo fix and a few other edits. I plan to commit soon. I did a quicker read through th

Re: Order changes in PG16 since ICU introduction

2023-05-17 Thread Jeff Davis
On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote: > Other than that, and I took your suggestions almost verbatim. Patch > attached. Thank you! Attached new patch with a typo fix and a few other edits. I plan to commit soon. Regards, Jeff Davis From d0d2375fa55618b60f361f6bb64b2c494901

Re: Order changes in PG16 since ICU introduction

2023-05-16 Thread Jeff Davis
On Tue, 2023-05-16 at 15:35 -0400, Jonathan S. Katz wrote: > +  Sensitivity when determining equality, with > +  level1 the least sensitive and > +  identic the most sensitive. See <xref > +  linkend="icu-collation-levels"/> for details. > > This discusses equality sensiti

Re: Order changes in PG16 since ICU introduction

2023-05-16 Thread Jonathan S. Katz
On 5/5/23 8:25 PM, Jeff Davis wrote: On Fri, 2023-04-21 at 20:12 -0400, Robert Haas wrote: On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis wrote: Most of the complaints seem to be complaints about v15 as well, and while those complaints may be a reason to not make ICU the default, they are also an

Re: Order changes in PG16 since ICU introduction

2023-05-16 Thread Jeff Davis
On Tue, 2023-05-16 at 19:00 +0300, Alexander Lakhin wrote: > I'm not sure about the proposed change in icu_from_uchar(). It seems > that > len_result + 1 bytes should always be enough for the result string > terminated > with NUL. If that's not true (we want to protect from some ICU bug > here), >

Re: Order changes in PG16 since ICU introduction

2023-05-16 Thread Alexander Lakhin
Hi Jeff, 16.05.2023 00:03, Jeff Davis wrote: On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote: On the current master (after 455f948b0, and before f7faa9976, of course) I get an ASAN-detected failure with the following query: CREATE COLLATION col (provider = icu, locale = '123456789012'

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Jeff Davis
On Mon, 2023-05-08 at 14:59 -0700, Jeff Davis wrote: > The easiest thing to do is revert it for now, and after we sort out > the > memcmp() path for the ICU provider, then I can commit it again (after > that point it would just be code cleanup and should have no > functional > impact). The convers

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Jeff Davis
On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote: > On the current master (after 455f948b0, and before f7faa9976, of > course) > I get an ASAN-detected failure with the following query: > CREATE COLLATION col (provider = icu, locale = '123456789012'); > Thank you for the report! ICU sou

Re: Order changes in PG16 since ICU introduction

2023-05-15 Thread Peter Eisentraut
On 11.05.23 23:29, Jeff Davis wrote: New patch series attached. === 0001: fix bug that allows creating hidden collations Bug: https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.ca...@j-davis.com This is still being debated in the other thread. Not really related t

Re: Order changes in PG16 since ICU introduction

2023-05-13 Thread Alexander Lakhin
Hello Jeff, 09.05.2023 00:59, Jeff Davis wrote: The easiest thing to do is revert it for now, and after we sort out the memcmp() path for the ICU provider, then I can commit it again (after that point it would just be code cleanup and should have no functional impact). On the current master (a

Re: Order changes in PG16 since ICU introduction

2023-05-11 Thread Jeff Davis
New patch series attached. === 0001: fix bug that allows creating hidden collations Bug: https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.ca...@j-davis.com === 0002: handle some kinds of libc-style locale strings ICU used to handle libc locale strings like 'fr_FR@e

Re: Order changes in PG16 since ICU introduction

2023-05-11 Thread Peter Eisentraut
On 09.05.23 17:09, Jeff Davis wrote: It's awkward for a user to read pg_database.datlocprovider, then depending on that, either look in datcollate or daticulocale. (It's awkward in the code, too.) Maybe some built-in function that returns a tuple of the default provider, the locale, and the vers
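
Editorial sketch of the lookup described above: which `pg_database` column holds the effective collation locale depends on `datlocprovider` (`'i'` for ICU, `'c'` for libc, as in v15/v16). The Python below works on plain dictionaries standing in for catalog rows; the function name `effective_locale` and the sample rows are assumptions for illustration, not an existing PostgreSQL function or real catalog output.

```python
def effective_locale(row):
    """Return (provider_name, locale) for one pg_database-like row.

    `row` is a plain dict keyed by catalog column names; this helper is
    hypothetical and only mirrors the manual lookup a user performs.
    """
    provider = row["datlocprovider"]
    if provider == "i":
        # ICU provider: the collation locale lives in daticulocale.
        return ("icu", row["daticulocale"])
    if provider == "c":
        # libc provider: the collation locale lives in datcollate.
        return ("libc", row["datcollate"])
    raise ValueError(f"unknown locale provider: {provider!r}")


# Made-up sample rows, not output from a real cluster.
icu_db = {"datlocprovider": "i", "datcollate": "en_US.UTF-8",
          "datctype": "en_US.UTF-8", "daticulocale": "en-US"}
libc_db = {"datlocprovider": "c", "datcollate": "C",
           "datctype": "C", "daticulocale": None}

print(effective_locale(icu_db))
print(effective_locale(libc_db))
```

Note that an ICU database still carries `datcollate` and `datctype` values for the libc side, which is part of why reading the columns directly can mislead.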

Re: Order changes in PG16 since ICU introduction

2023-05-11 Thread Peter Eisentraut
On 09.05.23 10:25, Alvaro Herrera wrote: On 2023-Apr-24, Peter Eisentraut wrote: The GUC settings lc_collate and lc_ctype are from a time when those locale settings were cluster-global. When we made those locale settings per-database (PG 8.4), we kept them as read-only. As of PG 15, you can u

Re: Order changes in PG16 since ICU introduction

2023-05-09 Thread Jeff Davis
On Tue, 2023-05-09 at 10:25 +0200, Alvaro Herrera wrote: > I agree with removing these in v16, since they are going to become > more > meaningless and confusing. Agreed, but it would be nice to have an alternative that does the right thing. It's awkward for a user to read pg_database.datlocprovid

Re: Order changes in PG16 since ICU introduction

2023-05-09 Thread Alvaro Herrera
On 2023-Apr-24, Peter Eisentraut wrote: > The GUC settings lc_collate and lc_ctype are from a time when those locale > settings were cluster-global. When we made those locale settings > per-database (PG 8.4), we kept them as read-only. As of PG 15, you can use > ICU as the per-database locale pr

Re: Order changes in PG16 since ICU introduction

2023-05-08 Thread Jeff Davis
On Mon, 2023-05-08 at 17:47 -0400, Tom Lane wrote: > -ERROR:  could not convert locale name "C" to language tag: > U_ILLEGAL_ARGUMENT_ERROR > +NOTICE:  using standard form "en-US-u-va-posix" for locale "C" ... > I suppose this is environment-dependent.  Sadly, the buildfarm > client does not show

Re: Order changes in PG16 since ICU introduction

2023-05-08 Thread Tom Lane
Jeff Davis writes: > === 0001: do not convert C to en-US-u-va-posix > I plan to commit this soon. Several buildfarm animals have failed since this went in. The only one showing enough info to diagnose is siskin [1]: @@ -1043,16 +1043,15 @@ ERROR: ICU locale "nonsense-nowhere" has unknown lan

Re: Order changes in PG16 since ICU introduction

2023-05-05 Thread Jeff Davis
On Fri, 2023-04-21 at 20:12 -0400, Robert Haas wrote: > On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis wrote: > > Most of the complaints seem to be complaints about v15 as well, and > > while those complaints may be a reason to not make ICU the default, > > they are also an argument that we should con

Re: Order changes in PG16 since ICU introduction

2023-05-03 Thread Jeff Davis
On Fri, 2023-04-28 at 14:35 -0700, Jeff Davis wrote: > On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote: > > This should be pg_strcasecmp(...) == 0 > > Good catch, thank you! Fixed in updated patches. Rebased patches. === 0001: do not convert C to en-US-u-va-posix I plan to commit this so

Re: Order changes in PG16 since ICU introduction

2023-04-28 Thread Jeff Davis
On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote: > This should be pg_strcasecmp(...) == 0 Good catch, thank you! Fixed in updated patches. > postgres=# create database lat9 locale 'fr_FR@euro' encoding LATIN9 > template > 'template0'; > ERROR:  could not convert locale name "fr_FR@euro" to

Re: Order changes in PG16 since ICU introduction

2023-04-27 Thread Daniel Verite
Jeff Davis wrote: > Attached are a few small patches: > > 0001: don't convert C to en-US-u-va-posix > 0002: handle locale C the same regardless of the provider, as you > suggest above > 0003: make LOCALE (or --locale) apply to everything including ICU Testing this briefly I noticed

Re: Order changes in PG16 since ICU introduction

2023-04-25 Thread Jeff Davis
On Fri, 2023-04-21 at 22:35 +0100, Andrew Gierth wrote: > Can lc_collate_is_c() be taught to check whether an ICU locale is > using > POSIX collation? Attached are a few small patches: 0001: don't convert C to en-US-u-va-posix 0002: handle locale C the same regardless of the provid

Re: Order changes in PG16 since ICU introduction

2023-04-25 Thread Tom Lane
"Daniel Verite" writes: > FTR the full text search parser still uses the libc functions > is[w]space/alpha/digit... that depend on lc_ctype, whether the db > collation provider is ICU or not. Yeah, those aren't even connected up to the collation-selection mechanisms; lots of work to do there. I

Re: Order changes in PG16 since ICU introduction

2023-04-25 Thread Daniel Verite
Jeff Davis wrote: > > (I'm not sure whether those operations can get redirected to ICU > > today > > or whether they still always go to libc, but we'll surely want to fix > > it eventually if the latter is still true.) > > Those operations do get redirected to ICU today. FTR the full te

Re: Order changes in PG16 since ICU introduction

2023-04-24 Thread Jeff Davis
On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote: > I think I might like this idea, except for one thing: you're > imagining > that the locale doesn't control anything except string comparisons. > What about to_upper/to_lower, character classifications in regexes, > etc? If provider='libc' and LC

Re: Order changes in PG16 since ICU introduction

2023-04-24 Thread Peter Eisentraut
On 22.04.23 01:00, Jeff Davis wrote: On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote: And the fact that "C" or "POSIX" gets transformed into "en-US-u-va-posix" I already expressed, on reflection, that we should probably just not do that. So I think we're in agreement on this point; patch

Re: Order changes in PG16 since ICU introduction

2023-04-24 Thread Peter Eisentraut
On 21.04.23 19:14, Peter Eisentraut wrote: On 21.04.23 19:09, Sandro Santilli wrote: On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote: "Regina Obe" writes: https://trac.osgeo.org/postgis/ticket/5375 If they actually are using locale C, I would say this is a bug. That should designa

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Robert Haas
On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis wrote: > Most of the complaints seem to be complaints about v15 as well, and > while those complaints may be a reason to not make ICU the default, > they are also an argument that we should continue to learn and try to > fix those issues because they exis

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote: > And the fact that "C" or "POSIX" gets transformed into > "en-US-u-va-posix" I already expressed, on reflection, that we should probably just not do that. So I think we're in agreement on this point; patch attached. Regards, Jeff Davi

RE: Order changes in PG16 since ICU introduction

2023-04-21 Thread Regina Obe
> > My opinion is that the switch to using ICU by default is ill-advised > > and should be reverted. > > Most of the complaints seem to be complaints about v15 as well, and while > those complaints may be a reason to not make ICU the default, they are also > an argument that we should continue to

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote: > My opinion is that the switch to using ICU by default is ill-advised > and should be reverted. Most of the complaints seem to be complaints about v15 as well, and while those complaints may be a reason to not make ICU the default, they are a

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Jeff" == Jeff Davis writes: >> Is that the right fix, though? (It forces --locale-provider=libc for >> the cluster default, which might not be desirable?) Jeff> For the "no locale" behavior (memcmp()-based) the provider needs Jeff> to be libc. Do you see an alternative? Can lc_collat

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 22:08 +0100, Andrew Gierth wrote: > Is that the right fix, though? (It forces --locale-provider=libc for > the > cluster default, which might not be desirable?) For the "no locale" behavior (memcmp()-based) the provider needs to be libc. Do you see an alternative?

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Jeff" == Jeff Davis writes: >> Also, somewhere along the line someone broke initdb --no-locale, >> which should result in C locale being the default everywhere, but >> when I just tested it it picked 'en' for an ICU locale, which is not >> the right thing. Jeff> Fixed, thank you. Is

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote: > Maybe this means we are not ready to do ICU-by-default in v16. > It certainly feels like there might be more here than we want to > start designing post-feature-freeze. I don't see how punting to the next release helps. If the CREATE DATABASE sy

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 19:00 +0100, Andrew Gierth wrote: > Also, somewhere along the line someone broke initdb --no-locale, > which > should result in C locale being the default everywhere, but when I > just > tested it it picked 'en' for an ICU locale, which is not the right > thing. Fi

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Robert Haas
On Fri, Apr 21, 2023 at 3:25 PM Jeff Davis wrote: > I am also having second thoughts about accepting "C" or "POSIX" as an > ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not > terribly useful (why not just use memcmp()?), it's not fast in my > measurements (en-US is faster), so

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 13:28 -0400, Tom Lane wrote: > I am wondering however whether this doesn't mean that all our carefully coded fast paths for C locale just went down the drain. The code still exists. You can test it by using the built-in collation "C" which is correctly specified with coll

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 21:14 +0200, Sandro Santilli wrote: > And then runs: > >   createdb --encoding=UTF-8 --template=template0 --lc-collate=C > > Should we tweak anything else to make the results predictable ? You can specify --locale-provider=libc Regards, Jeff Davis

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Tom Lane
Jeff Davis writes: > I have a couple ideas: > 1. Introduce a "none" provider to separate the concept of C/POSIX > locales from the libc provider. It's not really using a provider > anyway, it's just using memcmp(), and I think it causes confusion to > combine them. Saying "LOCALE_PROVIDER=none" i

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Jeff Davis
On Fri, 2023-04-21 at 14:23 -0400, Tom Lane wrote: > postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' > LOCALE = 'C'; ... >  test1 | postgres | UTF8 | icu | C  | > C  | en-US  |   | > (4 rows) > > Looks like the "pick en-US ev

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Sandro Santilli
On Fri, Apr 21, 2023 at 10:27:49AM -0700, Jeff Davis wrote: > On Fri, 2023-04-21 at 19:09 +0200, Sandro Santilli wrote: > >   =# select version(); > >   PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > > 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit > >   =# show lc_collate; > >   C > >

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Tom" == Tom Lane writes: >> Also, somewhere along the line someone broke initdb --no-locale, >> which should result in C locale being the default everywhere, but >> when I just tested it it picked 'en' for an ICU locale, which is not >> the right thing. Tom> Confirmed: Tom> $ LANG=

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Sandro Santilli
On Fri, Apr 21, 2023 at 07:14:13PM +0200, Peter Eisentraut wrote: > On 21.04.23 19:09, Sandro Santilli wrote: > > On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote: > > > "Regina Obe" writes: > > > > > > > https://trac.osgeo.org/postgis/ticket/5375 > > > > > > If they actually are using l

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Tom Lane
"Regina Obe" writes: > CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C'; > Doesn't seem to work at least not under mingw64 anyway. Hmm, doesn't work for me either: $ LANG=en_US.utf8 initdb The files belonging to this database system will be owned by user "postgres". This u

RE: Order changes in PG16 since ICU introduction

2023-04-21 Thread Regina Obe
> Yeah. My recommendation is just LOCALE: > > regression=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = > 'UTF8' LOCALE = 'C'; CREATE DATABASE regression=# CREATE DATABASE test2 > TEMPLATE=template0 ENCODING = 'UTF8' ICU_LOCALE = 'C'; > NOTICE: using standard form "en-US-u-va-posix" for l

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Tom Lane
Andrew Gierth writes: > "Peter" == Peter Eisentraut writes: > Peter> If the database is created with locale provider ICU, then > Peter> lc_collate does not apply here, > Having lc_collate return a value which is silently being ignored seems > to me rather hugely confusing. It's not *completel

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Andrew Gierth
> "Peter" == Peter Eisentraut writes: Peter> If the database is created with locale provider ICU, then Peter> lc_collate does not apply here, Having lc_collate return a value which is silently being ignored seems to me rather hugely confusing. Also, somewhere along the line someone broke

Re: Order changes in PG16 since ICU introduction

2023-04-21 Thread Tom Lane
"Regina Obe" writes: > Okay got it was on IRC with RhodiumToad and he suggested: > CREATE DATABASE test2 TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C' > LC_CTYPE = 'C' ICU_LOCALE='C'; > Which gives expected result: > SELECT '+' < '-' ; -- true > but gives me a notice: > NOTICE: usi
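For reference, the "expected result" quoted above ('+' < '-' being true) is what a plain byte-wise comparison gives, which is how the thread describes the C/POSIX "no locale" behavior (memcmp()-based). A minimal sketch in Python, not PostgreSQL code: the helper name c_locale_less is hypothetical, and it assumes UTF-8 encoded strings.

```python
def c_locale_less(a: str, b: str) -> bool:
    """Return True if a sorts before b under a byte-wise comparison.

    Hypothetical helper: compares the UTF-8 bytes of both strings
    lexicographically, as a memcmp()-style comparison would, with the
    shorter string sorting first when one is a prefix of the other.
    """
    return a.encode("utf-8") < b.encode("utf-8")


# '+' is byte 0x2B and '-' is byte 0x2D, so '+' sorts first.
print(c_locale_less("+", "-"))

# Uppercase ASCII letters come before lowercase ones in byte order.
print(c_locale_less("Z", "a"))

# Sorting by encoded bytes applies the same ordering to a whole list.
print(sorted(["a", "Z", "-", "+"], key=lambda s: s.encode("utf-8")))
```

A linguistic collation such as an ICU "en-US" locale is not bound to byte order, which is consistent with the ordering change reported in this thread.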

RE: Order changes in PG16 since ICU introduction

2023-04-21 Thread Regina Obe
> > CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' > LC_COLLATE = 'C' > > LC_CTYPE = 'C'; > > As has been pointed out already, setting LC_COLLATE/LC_CTYPE is > meaningless when the locale provider is ICU. You need to look at what ICU > locale is being chosen, or force it with LOCALE =
