On Thu, 2025-09-11 at 20:51 +0300, Alexander Borisov wrote:
>
> > Hey.
> >
> > I've looked into these patches.
>
> Hi Victor,
>
> Thank you for reviewing the patch and testing it!
Heikki, do you have thoughts on this thread?
Regards,
Jeff Davis
Hey.
I've looked into these patches.
Hi Victor,
Thank you for reviewing the patch and testing it!
[..]
Description of the Sparse Array approach is done in the newly introduced
GenerateSparseArray.pm module. Perhaps it'd be valuable to add a
section to the src/common/unicode/README,
On Wed, 3 Sep 2025 at 09:35, Alexander Borisov wrote:
> Hi, Jeff, hackers!
>
> As promised, refactoring the C code for Unicode Normalization Forms.
>
> In general terms, here's what has changed:
> 1. Recursion has been removed; now data is generated using
> a Perl script.
> 2. Memory is no longer
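One plausible reading of item 1, sketched in C under assumptions: instead of expanding canonical decompositions recursively at run time, a build-time script can store each fully expanded sequence, so the runtime does a single lookup. The two-entry table and the names decomp_entry, decompose_recursive and full_decomp_01D5 are hypothetical and not taken from the patch; the mappings themselves are the standard canonical ones for U+00DC and U+01D5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry excerpt of a one-level canonical decomposition table. */
typedef struct
{
	uint32_t	cp;			/* code point being decomposed */
	uint32_t	decomp[2];	/* its single-level canonical mapping */
	size_t		len;		/* number of code points in decomp */
} decomp_entry;

static const decomp_entry table[] = {
	{0x00DC, {0x0055, 0x0308}, 2},	/* U WITH DIAERESIS -> U + COMBINING DIAERESIS */
	{0x01D5, {0x00DC, 0x0304}, 2},	/* U WITH DIAERESIS AND MACRON -> U+00DC + COMBINING MACRON */
};

static const decomp_entry *
find_entry(uint32_t cp)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
	{
		if (table[i].cp == cp)
			return &table[i];
	}
	return NULL;
}

/*
 * Run-time recursion: keep expanding mappings until none applies.  Writes at
 * most 'cap' code points to 'out' and returns how many were written.
 */
size_t
decompose_recursive(uint32_t cp, uint32_t *out, size_t cap)
{
	const decomp_entry *e = find_entry(cp);
	size_t		n = 0;

	assert(out != NULL);
	if (e == NULL)
	{
		if (cap == 0)
			return 0;
		out[0] = cp;		/* no mapping: the code point stands for itself */
		return 1;
	}
	for (size_t i = 0; i < e->len; i++)
		n += decompose_recursive(e->decomp[i], out + n, cap - n);
	return n;
}

/*
 * What a build-time generator could emit instead: the fully expanded
 * sequence for U+01D5, so the runtime needs one lookup and no recursion.
 */
const uint32_t full_decomp_01D5[] = {0x0055, 0x0308, 0x0304};
```

The generator would run the equivalent of decompose_recursive() once per code point at build time, which moves the recursion cost out of normalize() entirely.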
14.08.2025 01:12, Jeff Davis wrote:
On Mon, 2025-08-11 at 17:21 +0300, Alexander Borisov wrote:
[..]
Comments on the patch itself:
The 0001 patch generalizes the two-step lookup process: first navigate
branches to find the index into a partially-compacted sparse array, and
then use that to
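For readers unfamiliar with the technique, here is a minimal C sketch of a two-step sparse-array lookup under assumed data. The ranges, the values, and the names sparse_range, sparse_index and sparse_lookup are hypothetical and are not taken from the patch or from GenerateSparseArray.pm. Step one branches (here via a binary search) over a few dense code point ranges to compute an index; step two reads the value from a compacted array whose slot 0 holds the shared default for every code point outside the ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
	uint32_t	first;		/* first code point in the range */
	uint32_t	last;		/* last code point in the range (inclusive) */
	uint16_t	base;		/* slot of 'first' in the compacted array */
} sparse_range;

/* Illustrative ranges, sorted ascending; a generator would emit these. */
static const sparse_range ranges[] = {
	{0x0300, 0x0304, 1},		/* slots 1..5 */
	{0x0323, 0x0327, 6},		/* slots 6..10 */
	{0x1DC0, 0x1DC1, 11},		/* slots 11..12 */
};

/* Compacted values (illustrative); slot 0 is the "no data" default. */
static const uint8_t values[] = {
	0,
	230, 230, 230, 230, 230,
	220, 220, 220, 220, 202,
	230, 230,
};

/* Step 1: branch through the ranges to find the index for 'cp'. */
static uint16_t
sparse_index(uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = sizeof(ranges) / sizeof(ranges[0]);

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < ranges[mid].first)
			hi = mid;
		else if (cp > ranges[mid].last)
			lo = mid + 1;
		else
			return (uint16_t) (ranges[mid].base + (cp - ranges[mid].first));
	}
	return 0;					/* outside every range: default slot */
}

/* Step 2: use that index into the compacted array. */
uint8_t
sparse_lookup(uint32_t cp)
{
	uint16_t	idx = sparse_index(cp);

	assert(idx < sizeof(values) / sizeof(values[0]));
	return values[idx];
}
```

The space saving comes from storing values only for the dense ranges; the cost is the branching in step one, which is what a generator can tune by choosing how aggressively to merge or split ranges.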
On Mon, 2025-08-11 at 17:21 +0300, Alexander Borisov wrote:
> As a result, I would not look into ICU at the moment, especially
> since we have a different approach.
> I am currently working on optimizing unicode_normalize().
> I am trying to come up with an improved version of the algorithm in C
09.08.2025 02:17, Jeff Davis wrote:
On Tue, 2025-07-08 at 22:42 +0300, Alexander Borisov wrote:
Version 3 patches. In version 2 "make -s headerscheck" did not work.
I ran my own performance tests. What I did was get some test data from
ICU v76.1 by doing:
[..]
Results with perfect hashing
01.08.2025 23:37, Tom Lane wrote:
Alexander Borisov writes:
I'm new here, so please advise me: if a patch wasn't accepted at the
commitfest, does that mean it's not needed (no one was interested in
it), or was there not enough time?
It's kind of hard to tell really --- there are many patches
Alexander Borisov writes:
> I'm new here, so please advise me: if a patch wasn't accepted at the
> commitfest, does that mean it's not needed (no one was interested in
> it), or was there not enough time?
It's kind of hard to tell really --- there are many patches in our
queue and not nearly enou
On Friday, August 1, 2025, Alexander Borisov wrote:
>
> I looked and saw that patches are often moved from commitfest to
> commitfest. Am I right that this is normal practice?
>
> What is the best course of action for me?
>
>
If you feel the patch is committable it should remain in the no
Hi,
I'm new here, so please advise me: if a patch wasn't accepted at the
commitfest, does that mean it's not needed (no one was interested in
it), or was there not enough time?
Please tell me how this works.
Should I move it to the next commitfest? I'm not quite sure what to do.
I looke
On Tue, 2025-06-24 at 18:20 +0300, Alexander Borisov wrote:
> That's what we're aiming for - to implement the fastest approach.
Awesome! Thank you for clarifying this as a goal. Having the fastest
open-source Unicode normalization would be a great thing to highlight
when this is done.
Regards,
20.06.2025 20:20, Jeff Davis wrote:
On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote:
I don't quite see how this compares to the implementation in Rust. In
the link provided, they use a perfect hash, which I eliminate, getting
a 2x speedup.
If you take ICU implementations in C++, I have al
On Fri, Jun 20, 2025 at 10:15:47AM -0700, Jeff Davis wrote:
> On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> > In the slow path you only normalize the _current character_, so you
> > only need enough buffer space for that.
>
> That's a clear win for UTF8 data. Also, if there are no chan
On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote:
> I don't quite see how this compares to the implementation in Rust. In
> the link provided, they use a perfect hash, which I eliminate, getting
> a 2x speedup.
> If you take ICU implementations in C++, I have always considered them
> slow, at
On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> In the slow path you only normalize the _current character_, so you
> only need enough buffer space for that.
That's a clear win for UTF8 data. Also, if there are no changes, then
you can just return the input buffer and not bother allo
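A minimal C sketch of the "return the input unchanged" idea, assuming the simplest possible fast path: text that is pure ASCII is already in NFC, NFD, NFKC and NFKD, so a caller can return it without allocating. The function name utf8_is_ascii is hypothetical; a fuller fast path would consult the Unicode quick-check properties instead of giving up at the first non-ASCII byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns true if the 'len' bytes at 's' are all ASCII.  Pure ASCII text is
 * already in every normalization form, so a caller that gets true back can
 * return its input as-is and skip allocating an output buffer.
 */
bool
utf8_is_ascii(const char *s, size_t len)
{
	size_t		i = 0;

	assert(s != NULL || len == 0);

	/* Eight bytes at a time: any byte >= 0x80 has its high bit set. */
	for (; i + 8 <= len; i += 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, s + i, sizeof(chunk));	/* alignment-safe load */
		if (chunk & UINT64_C(0x8080808080808080))
			return false;
	}
	/* Check the remaining tail bytes one at a time. */
	for (; i < len; i++)
	{
		if ((unsigned char) s[i] & 0x80)
			return false;
	}
	return true;
}
```

A caller would run this check first and hand back the original datum when it succeeds, falling through to the full decompose/reorder/recompose path otherwise.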
On Thu, Jun 19, 2025 at 10:41:57AM -0700, Jeff Davis wrote:
> In addition to the lookups themselves, there are other opportunities
> for optimization as well, such as:
>
> * reducing the need for palloc and extra buffers, perhaps by using
> buffers on the stack for small strings
>
> * operate mor
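A sketch of the stack-buffer suggestion, assuming malloc/free in place of palloc/pfree so it compiles outside the backend. The names decode_to_codepoints and SMALL_BUF_CODEPOINTS and the 64-code-point threshold are hypothetical, and the decoder assumes already-validated UTF-8 (as text values in the database are).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_BUF_CODEPOINTS 64		/* hypothetical stack-buffer threshold */

/* Decode already-validated UTF-8 into code points; returns the count. */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
	size_t		i = 0;
	size_t		n = 0;

	while (i < len)
	{
		unsigned char c = s[i++];
		uint32_t	cp;
		int			extra;

		if (c < 0x80)
		{ cp = c; extra = 0; }
		else if (c < 0xE0)
		{ cp = c & 0x1F; extra = 1; }
		else if (c < 0xF0)
		{ cp = c & 0x0F; extra = 2; }
		else
		{ cp = c & 0x07; extra = 3; }

		/* Fold in the 6 payload bits of each continuation byte. */
		while (extra-- > 0 && i < len)
			cp = (cp << 6) | (s[i++] & 0x3F);
		out[n++] = cp;
	}
	return n;
}

/*
 * Decode 's' into code points, the usual first step before decomposition.
 * UTF-8 never has more code points than bytes, so when len <= stackcap the
 * caller's stack buffer suffices and nothing is allocated.  Otherwise a
 * heap buffer is returned, which the caller must free.  Returns NULL if the
 * allocation fails.
 */
uint32_t *
decode_to_codepoints(const char *s, size_t len, uint32_t *stackbuf,
					 size_t stackcap, size_t *ncp)
{
	uint32_t   *buf = stackbuf;

	assert(ncp != NULL);
	if (len > stackcap)
	{
		buf = malloc(len * sizeof(uint32_t));
		if (buf == NULL)
			return NULL;
	}
	*ncp = utf8_decode((const unsigned char *) s, len, buf);
	return buf;
}
```

The caller declares `uint32_t sb[SMALL_BUF_CODEPOINTS]` on its own stack, passes it in, and frees the result only when it differs from `sb`, so short strings never touch the allocator.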
19.06.2025 20:41, Jeff Davis wrote:
On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
As promised, I continue to improve/speed up Unicode in Postgres.
Last time, we improved the lower(), upper(), and casefold()
functions. [1]
Now it's time for Unicode Normalization Forms, specifically
On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I continue to improve/speed up Unicode in Postgres.
> Last time, we improved the lower(), upper(), and casefold()
> functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.
Di
On Wed, Jun 11, 2025 at 7:27 PM Alexander Borisov wrote:
>
> 11.06.2025 10:13, John Naylor wrote:
> > On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov
> > wrote:
> >> 5. The server part "lost weight" in the binary, but the frontend
> >> "gained weight" a little.
> >>
> >> I read the old com
11.06.2025 10:13, John Naylor wrote:
On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote:
5. The server part "lost weight" in the binary, but the frontend
"gained weight" a little.
I read the old commits, which say that the size of the frontend is very
important and that speed is not i
On Tue, Jun 3, 2025 at 1:51 PM Alexander Borisov wrote:
> 5. The server part "lost weight" in the binary, but the frontend
> "gained weight" a little.
>
> I read the old commits, which say that the size of the frontend is very
> important and that speed is not important
> (speed is important o