Re: Improve the performance of Unicode Normalization Forms.

2025-08-14 Thread Alexander Borisov
14.08.2025 01:12, Jeff Davis wrote: On Mon, 2025-08-11 at 17:21 +0300, Alexander Borisov wrote: [..] Comments on the patch itself: The 0001 patch generalizes the two-step lookup process: first navigate branches to find the index into a partially-compacted sparse array, and then use that to
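A minimal sketch of the two-step lookup described in the snippet above, assuming toy data: the names `ENTRIES`, `INDEX`, `index_for`, and `lookup`, and the ranges used, are hypothetical and are not taken from the patch. Step one branches on code point ranges to find a position in a compacted index array; step two uses that index to fetch the entry.

```python
# Hypothetical toy data, not the real PostgreSQL tables.
# Slot 0 is reserved for "no entry".
ENTRIES = [None, "entry-for-latin-range", "entry-for-cyrillic-range"]

# Compacted index array: only populated ranges are stored, back to back.
# Positions 0..2 cover U+0041..U+0043, positions 3..4 cover U+0410..U+0411.
INDEX = [1, 1, 1, 2, 2]


def index_for(cp: int) -> int:
    """Step 1: branch on code point ranges to find an index into ENTRIES."""
    if 0x41 <= cp <= 0x43:
        return INDEX[cp - 0x41]
    if 0x410 <= cp <= 0x411:
        return INDEX[cp - 0x410 + 3]
    return 0  # code point has no entry


def lookup(cp: int):
    """Step 2: use the index from step 1 to fetch the entry."""
    return ENTRIES[index_for(cp)]
```

Code points outside every populated range fall through to slot 0, so the sparse gaps cost no table space.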

Re: Improve the performance of Unicode Normalization Forms.

2025-08-11 Thread Alexander Borisov
09.08.2025 02:17, Jeff Davis wrote: On Tue, 2025-07-08 at 22:42 +0300, Alexander Borisov wrote: Version 3 patches. In version 2 "make -s headerscheck" did not work. I ran my own performance tests. What I did was get some test data from ICU v76.1 by doing: [..] Results wi

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread Alexander Borisov
01.08.2025 23:37, Tom Lane wrote: Alexander Borisov writes: I'm new here, so please advise me: if a patch wasn't accepted at the commitfest, does that mean it's not needed (no one was interested in it), or was there not enough time? It's kind of hard to tell really --- th

Re: Improve the performance of Unicode Normalization Forms.

2025-08-01 Thread Alexander Borisov
sure what to do. I looked and saw that patches are often transferred from commitfest to commitfest. I understand that this is normal practice? Please understand, it's not very transparent here, the approach is not obvious. What is the best course of action for me? Thanks! -- Regards, Alexander Borisov

Re: Improve the performance of Unicode Normalization Forms.

2025-06-24 Thread Alexander Borisov
20.06.2025 20:20, Jeff Davis wrote: On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote: I don't quite see how this compares to the implementation in Rust. In the link provided, they use a perfect hash, which I got rid of for a 2x boost. If you take ICU implementations in C++, I

Re: Improve the performance of Unicode Normalization Forms.

2025-06-20 Thread Alexander Borisov
19.06.2025 20:41, Jeff Davis wrote: On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote: As promised, I continue to improve/speed up Unicode in Postgres. Last time, we improved the lower(), upper(), and casefold() functions. [1] Now it's time for Unicode Normalization Forms, specifi

Re: Improve the performance of Unicode Normalization Forms.

2025-06-11 Thread Alexander Borisov
11.06.2025 10:13, John Naylor wrote: On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote: 5. The server part "lost weight" in the binary, but the frontend "gained weight" a little. I read the old commits, which say that the size of the frontend is very important a

Re: PG 18 release notes draft committed

2025-05-05 Thread Alexander Borisov
e in this area. But again, I'm new to the Postgres community and I'm getting to know what's going on here and how it works. Thank you for paying attention to it! -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-04 Thread Alexander Borisov
me from the commit message nor from skimming the original thread, whether the perf improvement numbers listed by Alexander also apply to lower() and upper(), or if they only apply to casefold(): On Sun, 4 May 2025 at 00:32, Alexander Borisov wrote: ASCII by ≈10% Cyrillic by ≈80% Unicode in general by

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
u for clarifying! Users are not interested in performance gains. Then it's not worth considering. Sorry to interrupt. -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
algorithms. As a result, the functions lower(), upper(), and casefold() got a significant boost. -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
d want to understand. Commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=27bdec06841d1bb004ca7627eac97808b08a7ac7 I am now actively working on a major improvement to Unicode Normalization Forms. Thanks! -- Regards, Alexander Borisov

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Alexander Borisov
15.03.2025 23:07, Jeff Davis wrote: On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote: I tried adding a loop to create tables, and everything looks fine (v7). [...] I prefer to generalize when we have the other code in place. As it was, it was a bit confusing why the extra

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-12 Thread Alexander Borisov
12.03.2025 19:55, Alexander Borisov wrote: [...] A couple questions: * Is there a reason the fast-path for codepoints < 0x80 is in unicode_case.c rather than unicode_case_func.h? Yes, this is an important optimization, below are benchmarks that [...] I forgot to add the benchm
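A minimal sketch of an ASCII fast path of the kind the question above refers to, assuming a hypothetical `LOWER_TABLE` that stands in for the generated case tables; `to_lower` is an illustrative name, not a function from unicode_case.c. Code points below 0x80 are handled with plain arithmetic and never reach the table lookup.

```python
# Hypothetical stand-in for the table-driven mapping of non-ASCII code points.
LOWER_TABLE = {0x0410: 0x0430, 0x00C9: 0x00E9}


def to_lower(cp: int) -> int:
    """Lowercase a single code point, with a fast path for ASCII."""
    if cp < 0x80:
        # Fast path: 'A'..'Z' map to 'a'..'z'; other ASCII is unchanged.
        if 0x41 <= cp <= 0x5A:
            return cp + 0x20
        return cp
    # Slow path: consult the (hypothetical) table; unmapped code points
    # are returned unchanged.
    return LOWER_TABLE.get(cp, cp)
```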

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-02 Thread Alexander Borisov
19.02.2025 01:56, Jeff Davis wrote: On Wed, 2025-02-19 at 01:54 +0300, Alexander Borisov wrote: In proposing the patch for v3, I struck a balance between improving performance and reducing binary size, without sacrificing code clarity. Fair enough. I will continue reviewing v3. Did you have

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-18 Thread Alexander Borisov
19.02.2025 01:02, Jeff Davis wrote: On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote: I tried the approach via a range table. The result was worse than without the table. With branching in a function, the result is better. Patch v3: range binary search by branches. Patch v4

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
06.02.2025 22:08, Jeff Davis wrote: On Thu, 2025-02-06 at 18:39 +0300, Alexander Borisov wrote: Since I started to improve Unicode Case, I used the same approach, essentially a binary search, only not by individual values, but by ranges. I considered it a 4th approach because of the generated
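A minimal sketch of a binary search over ranges rather than individual values, as described in the snippet above. The range boundaries and the names `RANGE_STARTS`, `RANGE_ENDS`, and `find_range` are hypothetical. For brevity this sketch searches a plain table with Python's `bisect`; it does not reproduce how the patch encodes the search.

```python
from bisect import bisect_right

# Hypothetical, sorted, non-overlapping code point ranges (inclusive).
RANGE_STARTS = [0x41, 0xC0, 0x410]
RANGE_ENDS = [0x5A, 0xDE, 0x42F]


def find_range(cp: int) -> int:
    """Return the index of the range containing cp, or -1 if none does."""
    # Find the last range whose start is <= cp.
    i = bisect_right(RANGE_STARTS, cp) - 1
    if i >= 0 and cp <= RANGE_ENDS[i]:
        return i
    return -1
```

Searching by ranges keeps the number of comparisons proportional to the logarithm of the number of ranges, which is typically far smaller than the number of mapped code points.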

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
Hi Jeff, 06.02.2025 00:46, Jeff Davis wrote: On Tue, 2025-02-04 at 23:19 +0300, Alexander Borisov wrote: I've done many different experiments and everywhere the result is within the margin of the v2 patch result. Great, thank you for working on this! There doesn't appear to be

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-31 Thread Alexander Borisov
by uint8*n. Thanks, after the weekend I'll send an updated patch that takes into account the comments/advice. -- SberTech Alexander Borisov

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-29 Thread Alexander Borisov
Sorry, I made a mistake in the code. It's not worth looking at this patch yet. 29.01.2025 23:23, Alexander Borisov wrote: Hi, hackers! I propose to consider a simple optimization for Unicode case tables. The main changes affect the generate-unicode_case_table.pl file. Because of the mod

Re: Proposal to add a new URL data type.

2024-12-11 Thread Alexander Borisov
10.12.2024 13:59, Victor Yegorov wrote: Thu, 5 Dec 2024 at 17:02, Alexander Borisov <lex.bori...@gmail.com>: [..] Hey, I had a look at this patch and found its functionality mature and performant. As Peter mentioned pguri, I used it to compare with the proposed

Re: Proposal to add a new URL data type.

2024-12-09 Thread Alexander Borisov
06.12.2024 21:04, Matthias van de Meent: On Thu, 5 Dec 2024 at 15:02, Alexander Borisov wrote: [..] I'd be extremely annoyed if URLs I wrote into the database didn't return in identical manner when fetched from the database. See also how numeric has different representations o

Re: Proposal to add a new URL data type.

2024-12-06 Thread Alexander Borisov
Hi Daniel, 06.12.2024 16:46, Daniel Gustafsson wrote: On 6 Dec 2024, at 13:59, Alexander Borisov wrote: As I've written before, there is a difference between parsing URLs according to the RFC 3986 specification and WHATWG URLs. This is especially true for host. Here are a couple

Re: Proposal to add a new URL data type.

2024-12-06 Thread Alexander Borisov
05.12.2024 17:59, Peter Eisentraut wrote: On 05.12.24 15:01, Alexander Borisov wrote: Postgres users often store URLs in the database. As an example, they provide links to their pages on the web, analyze users' posts and get links for further storage and analysis. Naturally, there is a need to