Re: speed up unicode decomposition and recomposition

2020-11-06 Thread Michael Paquier
On Sat, Nov 07, 2020 at 09:29:30AM +0900, Michael Paquier wrote:
> Thanks John. Both look right to me. I'll apply both in a bit.

Done that now. Just for the note: you forgot to run pgperltidy.
--
Michael

Re: speed up unicode decomposition and recomposition

2020-11-06 Thread Michael Paquier
On Fri, Nov 06, 2020 at 06:20:00PM -0400, John Naylor wrote:
> There is a latent bug in the way code pairs for recomposition are sorted
> due to a copy-pasto on my part. Makes no difference now, but it could in
> the future. While looking, it seems pg_bswap.h should actually be
> backend-only. Bot

Re: speed up unicode decomposition and recomposition

2020-11-06 Thread John Naylor
There is a latent bug in the way code pairs for recomposition are sorted due to a copy-pasto on my part. Makes no difference now, but it could in the future. While looking, it seems pg_bswap.h should actually be backend-only. Both fixed in the attached.
--
John Naylor
EnterpriseDB: http://www.ent
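The message above does not say what the copy-paste error was, so the following is only a minimal sketch of why the sort order of recomposition code pairs matters. It assumes a pair is packed into a 64-bit key and looked up with bsearch(); the names demo_recomp, demo_key, demo_pairs and demo_recompose are hypothetical, and this is not necessarily how the patch itself does it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: two code points and the character they compose to. */
typedef struct
{
	uint32_t	first;
	uint32_t	second;
	uint32_t	composed;
} demo_recomp;

/* Pack a pair into one 64-bit key so pairs order by first, then second. */
static uint64_t
demo_key(uint32_t first, uint32_t second)
{
	return ((uint64_t) first << 32) | second;
}

/* Table sorted by demo_key(); the bsearch() call below relies on that order. */
static const demo_recomp demo_pairs[] = {
	{0x0041, 0x0300, 0x00C0},	/* A + combining grave */
	{0x0041, 0x0301, 0x00C1},	/* A + combining acute */
	{0x0045, 0x0301, 0x00C9},	/* E + combining acute */
	{0x0065, 0x0301, 0x00E9},	/* e + combining acute */
	{0x006E, 0x0303, 0x00F1},	/* n + combining tilde */
};

/* Compare a 64-bit search key against a table entry. */
static int
demo_cmp(const void *key, const void *entry)
{
	uint64_t	k = *(const uint64_t *) key;
	const demo_recomp *e = (const demo_recomp *) entry;
	uint64_t	v = demo_key(e->first, e->second);

	return (k > v) - (k < v);
}

/* Return the composed code point, or 0 if the pair does not compose. */
static uint32_t
demo_recompose(uint32_t first, uint32_t second)
{
	uint64_t	key = demo_key(first, second);
	const demo_recomp *hit = bsearch(&key, demo_pairs,
									 sizeof(demo_pairs) / sizeof(demo_pairs[0]),
									 sizeof(demo_pairs[0]), demo_cmp);

	return hit ? hit->composed : 0;
}
```

If the table were sorted with a comparison that disagrees with demo_cmp(), lookups could still succeed by luck on today's data and fail once new pairs are added, which is the sense in which such a bug stays latent.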

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread Michael Paquier
On Fri, Oct 23, 2020 at 08:24:06PM -0400, Tom Lane wrote:
> I'd advise not putting conv_compare() between get_code_entry() and
> the header comment for get_code_entry(). Looks good otherwise.

Indeed. I have adjusted the position of the comment, and applied the fix. Thanks for the report.
--
Mic

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread Tom Lane
Michael Paquier writes:
> On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote:
>> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote:
>>> genhtml: WARNING: function data mismatch at
>>> /home/postgres/pgsql/src/common/unicode_norm.c:102
> I can see the problem on Debian GID with lcov 1.14-2.

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread Michael Paquier
On Fri, Oct 23, 2020 at 04:18:13PM -0700, Mark Dilger wrote:
> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote:
>> genhtml: WARNING: function data mismatch at
>> /home/postgres/pgsql/src/common/unicode_norm.c:102
>>
>> I've never seen anything like that before. I suppose it means that
>> something

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread Mark Dilger
> On Oct 23, 2020, at 9:07 AM, Tom Lane wrote:
>
> I chanced to do an --enable-coverage test run today, and I got this
> weird message during "make coverage-html":
>
> genhtml: WARNING: function data mismatch at
> /home/postgres/pgsql/src/common/unicode_norm.c:102
>
> I've never seen anythi

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread Tom Lane
I chanced to do an --enable-coverage test run today, and I got this weird message during "make coverage-html":

genhtml: WARNING: function data mismatch at /home/postgres/pgsql/src/common/unicode_norm.c:102

I've never seen anything like that before. I suppose it means that something about 783f0c

Re: speed up unicode decomposition and recomposition

2020-10-23 Thread John Naylor
On Thu, Oct 22, 2020 at 10:11 PM Michael Paquier wrote:
> On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote:
> > Looks good to me.
>
> Thanks. Committed, then. Great work!
>

Thank you!
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: speed up unicode decomposition and recomposition

2020-10-22 Thread Michael Paquier
On Thu, Oct 22, 2020 at 05:50:52AM -0400, John Naylor wrote:
> Looks good to me.

Thanks. Committed, then. Great work!
--
Michael

Re: speed up unicode decomposition and recomposition

2020-10-22 Thread John Naylor
On Thu, Oct 22, 2020 at 12:34 AM Michael Paquier wrote:
> Thanks for the updated version, that was fast. I have found a couple
> of places that needed to be adjusted, like the comment at the top of
> generate-unicode_norm_table.pl or some comments, an incorrect include
> in the new headers and t

Re: speed up unicode decomposition and recomposition

2020-10-20 Thread Michael Paquier
On Tue, Oct 20, 2020 at 08:03:12AM -0400, John Naylor wrote:
> I've confirmed that. How about a new header unicode_norm_hashfunc.h which
> would include unicode_norm_table.h at the top. In unicode.c, we can include
> one of these depending on frontend or backend.

Sounds good to me. Looking at the
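The header split proposed above amounts to choosing a lookup strategy at compile time. As a rough, self-contained sketch (not the patch itself): the shared array below plays the role of the table header, the hashed lookup stands in for what a generated hash-function header would provide, and the choice hinges on PostgreSQL's FRONTEND macro. The names demo_codes, demo_slots and demo_find are hypothetical, and the modulo-11 hash is hand-picked for these four values only.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared data, sorted by code point (stand-in for the table header). */
static const uint32_t demo_codes[] = {0x00C0, 0x00C9, 0x00E9, 0x00F1};
#define DEMO_NCODES ((int) (sizeof(demo_codes) / sizeof(demo_codes[0])))

#ifndef FRONTEND

/*
 * Backend build: hashed lookup. The slot table maps (code % 11) to an
 * index into demo_codes, or -1 for an empty slot. This costs extra
 * space, which is why the frontend build leaves it out.
 */
#define DEMO_NSLOTS 11
static const int demo_slots[DEMO_NSLOTS] = {-1, -1, 2, 1, -1, 0, -1, -1, -1, -1, 3};

static int
demo_find(uint32_t code)
{
	int			idx = demo_slots[code % DEMO_NSLOTS];

	/* A hit must be verified, since unrelated codes can share a slot. */
	if (idx < 0 || demo_codes[idx] != code)
		return -1;
	return idx;
}

#else

/* Frontend build: plain binary search, no extra table. */
static int
demo_find(uint32_t code)
{
	int			lo = 0;
	int			hi = DEMO_NCODES - 1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (demo_codes[mid] == code)
			return mid;
		if (demo_codes[mid] < code)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;
}

#endif
```

Both variants return the same index for the same input, so callers do not need to know which one was compiled in; building with -DFRONTEND selects the smaller binary-search path.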

Re: speed up unicode decomposition and recomposition

2020-10-20 Thread John Naylor
On Tue, Oct 20, 2020 at 3:22 AM Michael Paquier wrote:
> On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> > I don't see any difference on gcc/Linux in those two files, nor in
> > unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> > expected. Could it depend on

Re: speed up unicode decomposition and recomposition

2020-10-20 Thread Michael Paquier
On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> I don't see any difference on gcc/Linux in those two files, nor in
> unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> expected. Could it depend on the compiler?

Hmm. My guess is that you don't have --enable-deb

Re: speed up unicode decomposition and recomposition

2020-10-19 Thread John Naylor
On Fri, Oct 16, 2020 at 2:08 PM Daniel Verite wrote:
> John Naylor wrote:
>
> > I'd be curious how it compares to ICU now
>
> I've made another run of the test in [1] with your v2 patches
> from this thread against icu_ext built with ICU-67.1.
> The results show the times in milliseconds

Re: speed up unicode decomposition and recomposition

2020-10-19 Thread John Naylor
On Thu, Oct 15, 2020 at 11:32 PM Michael Paquier wrote:
>
> The binary sizes of libpgcommon_shlib.a and libpgcommon.a change
> because Decomp_hash_func() gets included, impacting libpq.
>

I don't see any difference on gcc/Linux in those two files, nor in unicode_norm_shlib.o -- I do see a differ

Re: speed up unicode decomposition and recomposition

2020-10-16 Thread Daniel Verite
John Naylor wrote:
> I'd be curious how it compares to ICU now

I've made another run of the test in [1] with your v2 patches from this thread against icu_ext built with ICU-67.1. The results show the times in milliseconds to process about 10 million short strings:

operation | unpatched

Re: speed up unicode decomposition and recomposition

2020-10-15 Thread Michael Paquier
On Thu, Oct 15, 2020 at 01:59:38PM -0400, John Naylor wrote:
> I think I've seen a trie recommended somewhere, maybe the official website.
> That said, I was able to get the hash working for recomposition (split into
> a separate patch, and both of them now leave frontend alone), and I'm
> pleased

Re: speed up unicode decomposition and recomposition

2020-10-14 Thread Kyotaro Horiguchi
At Wed, 14 Oct 2020 23:06:28 -0400, Tom Lane wrote in
> John Naylor writes:
> > With those points in mind and thinking more broadly, I'd like to try harder
> > on recomposition. Even several times faster, recomposition is still orders
> > of magnitude slower than ICU, as measured by Daniel Verit

Re: speed up unicode decomposition and recomposition

2020-10-14 Thread Tom Lane
John Naylor writes:
> With those points in mind and thinking more broadly, I'd like to try harder
> on recomposition. Even several times faster, recomposition is still orders
> of magnitude slower than ICU, as measured by Daniel Verite [1].

Huh. Has anyone looked into how they do it?

Re: speed up unicode decomposition and recomposition

2020-10-14 Thread John Naylor
On Wed, Oct 14, 2020 at 8:25 PM Michael Paquier wrote:
> On Wed, Oct 14, 2020 at 01:06:40PM -0400, Tom Lane wrote:
> > IIUC, the only place libpq uses this is to process a password-sized string
> > or two during connection establishment. It seems quite silly to add
> > 26kB in order to make th

Re: speed up unicode decomposition and recomposition

2020-10-14 Thread Michael Paquier
On Wed, Oct 14, 2020 at 01:06:40PM -0400, Tom Lane wrote:
> John Naylor writes:
>> Some other considerations:
>> - As I alluded above, this adds ~26kB to libpq because of SASLPrep. Since
>> the decomp array was reordered to optimize linear search, it can no longer
>> be used for binary search. It

Re: speed up unicode decomposition and recomposition

2020-10-14 Thread Tom Lane
John Naylor writes:
> Some other considerations:
> - As I alluded above, this adds ~26kB to libpq because of SASLPrep. Since
> the decomp array was reordered to optimize linear search, it can no longer
> be used for binary search. It's possible to build two arrays, one for
> frontend and one for b
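To make the quoted point concrete, here is a small sketch of why an array reordered for linear search can no longer serve binary search. The ordering criterion is an assumption (entries expected to be looked up most often come first); the thread excerpt does not say how the real array was reordered. The names demo_by_freq, demo_linear_find and demo_is_sorted are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table ordered by assumed lookup frequency, not by code point. */
static const uint32_t demo_by_freq[] = {0x00E9, 0x00C0, 0x00F1, 0x00C9};
#define DEMO_N (sizeof(demo_by_freq) / sizeof(demo_by_freq[0]))

/*
 * Linear scan: correct for any ordering, and cheap when the entries
 * that are looked up most often sit at the front.
 */
static int
demo_linear_find(uint32_t code)
{
	for (size_t i = 0; i < DEMO_N; i++)
	{
		if (demo_by_freq[i] == code)
			return (int) i;
	}
	return -1;
}

/*
 * Binary search requires strictly ascending keys. Returns 1 if the
 * table satisfies that precondition and 0 otherwise.
 */
static int
demo_is_sorted(void)
{
	for (size_t i = 1; i < DEMO_N; i++)
	{
		if (demo_by_freq[i - 1] >= demo_by_freq[i])
			return 0;
	}
	return 1;
}
```

Since demo_is_sorted() returns 0 for this table, a binary search over it could miss entries that are present, which is the trade-off behind keeping a second, sorted array for the other build.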