Re: Improve the performance of Unicode Normalization Forms.

2025-09-19 Thread Jeff Davis
On Thu, 2025-09-11 at 20:51 +0300, Alexander Borisov wrote:
> > Hey.
> >
> > I've looked into these patches.
>
> Hi Victor,
>
> Thank you for reviewing the patch and testing it!

Heikki, do you have thoughts on this thread?

Regards,
Jeff Davis

Re: Improve the performance of Unicode Normalization Forms.

2025-09-11 Thread Alexander Borisov
Hey. I've looked into these patches. Hi Victor, Thank you for reviewing the patch and testing it! [..] Description of the Sparse Array approach is done in the newly introduced GenerateSparseArray.pm module.  Perhaps it'd be valuable to add a section into the src/common/unicode/README,

Re: Improve the performance of Unicode Normalization Forms.

2025-09-10 Thread Victor Yegorov
Wed, 3 Sep 2025 at 09:35, Alexander Borisov wrote:
> Hi, Jeff, hackers!
>
> As promised, refactoring the C code for Unicode Normalization Forms.
>
> In general terms, here's what has changed:
> 1. Recursion has been removed; now data is generated using
> a Perl script.
> 2. Memory is no longer

Re: Improve the performance of Unicode Normalization Forms.

2025-08-14 Thread Alexander Borisov
14.08.2025 01:12, Jeff Davis wrote: On Mon, 2025-08-11 at 17:21 +0300, Alexander Borisov wrote: [..] Comments on the patch itself: The 0001 patch generalizes the two-step lookup process: first navigate branches to find the index into a partially-compacted sparse array, and then use that to
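
A minimal sketch of the two-step lookup named in the message above, written in C with toy data. All names (toy_find_index, toy_index, toy_data) and the code point ranges are hypothetical, and using the looked-up index to select a row in a separate data table is an assumption, since the quoted description is cut off at that point. This is not the code generated by the patch.

```c
#include <stdint.h>

/* Hypothetical payload table; row 0 is reserved for "no entry". */
static const uint16_t toy_data[] = {0, 0x0061, 0x0062, 0x0101};

/*
 * Hypothetical partially compacted array: only the code point ranges
 * that contain entries are stored, back to back.
 */
static const uint8_t toy_index[] = {
    1, 2, 0,    /* U+0041..U+0043 */
    3, 0        /* U+0100..U+0101 */
};

/* Step 1: branch on the code point range to locate the slot. */
static uint8_t
toy_find_index(uint32_t cp)
{
    if (cp < 0x0041)
        return 0;
    if (cp <= 0x0043)
        return toy_index[cp - 0x0041];
    if (cp < 0x0100)
        return 0;
    if (cp <= 0x0101)
        return toy_index[cp - 0x0100 + 3];
    return 0;
}

/* Step 2 (assumed): use the index to fetch the row from the data table. */
static uint16_t
toy_lookup(uint32_t cp)
{
    return toy_data[toy_find_index(cp)];
}
```

Code points outside the stored ranges fall through the branches and resolve to row 0, so the gaps between ranges cost no table space.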

Re: Improve the performance of Unicode Normalization Forms.

2025-08-13 Thread Jeff Davis
On Mon, 2025-08-11 at 17:21 +0300, Alexander Borisov wrote:
> As a result, I would not look into ICU at the moment, especially since
> we have a different approach.
> I am currently working on optimizing unicode_normalize().
> I am trying to come up with an improved version of the algorithm in C

Re: Improve the performance of Unicode Normalization Forms.

2025-08-11 Thread Alexander Borisov
09.08.2025 02:17, Jeff Davis wrote: On Tue, 2025-07-08 at 22:42 +0300, Alexander Borisov wrote: Version 3 patches. In version 2 "make -s headerscheck" did not work. I ran my own performance tests. What I did was get some test data from ICU v76.1 by doing: [..] Results with perfect hashing

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread Alexander Borisov
01.08.2025 23:37, Tom Lane wrote: Alexander Borisov writes: I'm new here, so please advise me: if a patch wasn't accepted at the commitfest, does that mean it's not needed (no one was interested in it), or was there not enough time? It's kind of hard to tell really --- there are many patches

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread Tom Lane
Alexander Borisov writes:
> I'm new here, so please advise me: if a patch wasn't accepted at the
> commitfest, does that mean it's not needed (no one was interested in
> it), or was there not enough time?

It's kind of hard to tell really --- there are many patches in our queue and not nearly enou

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread David G. Johnston
On Friday, August 1, 2025, Alexander Borisov wrote:
> I looked and saw that patches are often transferred from commitfest to
> commitfest. I understand that this is normal practice?
>
> What is the best course of action for me?

If you feel the patch is committable it should remain in the no

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread Alexander Borisov
Hi, I'm new here, so please advise me: if a patch wasn't accepted at the commitfest, does that mean it's not needed (no one was interested in it), or was there not enough time? Please tell me how this works for this. Should I move it to the next commitfest? I'm not quite sure what to do. I looke

Re: Improve the performance of Unicode Normalization Forms.

2025-06-30 Thread Jeff Davis
On Tue, 2025-06-24 at 18:20 +0300, Alexander Borisov wrote:
> That's what we're aiming for - to implement the fastest approach.

Awesome! Thank you for clarifying this as a goal. Having the fastest open-source Unicode normalization would be a great thing to highlight when this is done.

Regards,

Re: Improve the performance of Unicode Normalization Forms.

2025-06-24 Thread Alexander Borisov
20.06.2025 20:20, Jeff Davis wrote: On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote: I don't quite see how this compares to the implementation on Rust. In the link provided, they use perfect hash, which I get rid of and get a x2 boost. If you take ICU implementations in C++, I have al

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Nico Williams
On Fri, Jun 20, 2025 at 10:15:47AM -0700, Jeff Davis wrote:
> On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> > In the slow path you only normalize the _current character_, so you
> > only need enough buffer space for that.
>
> That's a clear win for UTF8 data. Also, if there are no chan

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Jeff Davis
On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote:
> I don't quite see how this compares to the implementation on Rust. In
> the link provided, they use perfect hash, which I get rid of and get
> a x2 boost.
> If you take ICU implementations in C++, I have always considered them
> slow, at

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Jeff Davis
On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> In the slow path you only normalize the _current character_, so you
> only need enough buffer space for that.

That's a clear win for UTF8 data. Also, if there are no changes, then you can just return the input buffer and not bother allo
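
The "return the input buffer when nothing changes" idea from the message above can be sketched in C under one simplifying assumption: an all-ASCII test stands in for a real quick check. A string consisting only of ASCII bytes is already normalized in all four Unicode Normalization Forms, so it is the simplest case where no output buffer is needed. The function names are hypothetical and this is not code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Length of the leading run of ASCII bytes (each byte < 0x80). */
static size_t
toy_ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len && s[i] < 0x80)
        i++;
    return i;
}

/*
 * Fast-path test: if every byte is ASCII, the input is already
 * normalized and the caller can hand it back without allocating.
 */
static bool
toy_can_return_input(const unsigned char *s, size_t len)
{
    return toy_ascii_prefix_len(s, len) == len;
}
```

When the test fails, the prefix length tells a caller where the slow path would have to start, so the bytes before it could be copied through unchanged.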

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Nico Williams
On Thu, Jun 19, 2025 at 10:41:57AM -0700, Jeff Davis wrote:
> In addition to the lookups themselves, there are other opportunities
> for optimization as well, such as:
>
> * reducing the need for palloc and extra buffers, perhaps by using
> buffers on the stack for small strings
>
> * operate mor

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Alexander Borisov
19.06.2025 20:41, Jeff Davis wrote: On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote: As promised, I continue to improve/speed up Unicode in Postgres. Last time, we improved the lower(), upper(), and casefold() functions. [1] Now it's time for Unicode Normalization Forms, specifically

Re: Improve the performance of Unicode Normalization Forms.

2025-06-19 Thread Jeff Davis
On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I continue to improve/speed up Unicode in Postgres.
> Last time, we improved the lower(), upper(), and casefold()
> functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.

Di

Re: Improve the performance of Unicode Normalization Forms.

2025-06-12 Thread John Naylor
On Wed, Jun 11, 2025 at 7:27 PM Alexander Borisov wrote:
>
> 11.06.2025 10:13, John Naylor wrote:
> > On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov
> > wrote:
> >> 5. The server part "lost weight" in the binary, but the frontend
> >> "gained weight" a little.
> >>
> >> I read the old com

Re: Improve the performance of Unicode Normalization Forms.

2025-06-11 Thread Alexander Borisov
11.06.2025 10:13, John Naylor wrote: On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote: 5. The server part "lost weight" in the binary, but the frontend "gained weight" a little. I read the old commits, which say that the size of the frontend is very important and that speed is not i

Re: Improve the performance of Unicode Normalization Forms.

2025-06-11 Thread John Naylor
On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote:
> 5. The server part "lost weight" in the binary, but the frontend
> "gained weight" a little.
>
> I read the old commits, which say that the size of the frontend is very
> important and that speed is not important
> (speed is important o