On Wed, Mar 27, 2024 at 04:37:35PM -0500, Nathan Bossart wrote:
> On Wed, Mar 27, 2024 at 05:10:13PM -0400, Tom Lane wrote:
>> LGTM otherwise, and I like the fact that the #if structure
>> gets a lot less messy.
>
> Thanks for reviewing. I've attached a v2 that I intend to commit when I
> get a c
On Wed, Mar 27, 2024 at 05:10:13PM -0400, Tom Lane wrote:
> Shouldn't "i" be declared uint32, since nelem is?
Yes, that's a mistake.
> BTW, I wonder why these functions don't declare their array
> arguments like "const uint32 *base".
They probably should. I don't see any reason not to, and my c
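For illustration, here is a minimal sketch of a scalar search that follows both review comments. The loop counter shares nelem's type, and the array parameter is `const` because it is only read. The `uint32` typedef stands in for the one in PostgreSQL's `c.h`. `lfind32_one_by_one` is a hypothetical name, not the committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

/*
 * Hypothetical one-by-one search.  "i" has the same unsigned type as
 * nelem, so "i < nelem" never mixes signed and unsigned operands, and
 * "base" is const because the function never writes through it.
 */
static inline bool
lfind32_one_by_one(uint32 key, const uint32 *base, uint32 nelem)
{
	for (uint32 i = 0; i < nelem; i++)
	{
		if (key == base[i])
			return true;
	}
	return false;
}
```

Callers holding a non-const `uint32 *` can still pass it unchanged, because C converts to a pointer-to-const implicitly.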
Nathan Bossart writes:
> Here's what I had in mind. My usual benchmark seems to indicate that this
> shouldn't impact performance.
Shouldn't "i" be declared uint32, since nelem is?
BTW, I wonder why these functions don't declare their array
arguments like "const uint32 *base".
LGTM otherwise,
On Tue, Mar 26, 2024 at 09:48:57PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> I just did the minimal fix for now, i.e., I moved the new label into the
>> SIMD section of the function. I think it would be better stylistically to
>> move the one-by-one logic to an inline helper function, bu
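Here is a sketch of the minimal fix described above. It assumes the warnings were about the `one_by_one` label being unused in builds where the SIMD section is compiled out; the warning text itself is truncated in this excerpt. `USE_NO_SIMD` mimics the switch in PostgreSQL's `simd.h` and is left undefined here. The block comparison is a scalar stand-in for the vector code, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

/* Uncomment to mimic a build with the vector path compiled out. */
/* #define USE_NO_SIMD */

#define NELEM_PER_ITERATION 16	/* assumed: 4 SSE2 registers x 4 uint32 */

/* Scalar stand-in for one vectorized block comparison. */
static bool
block_has_key(uint32 key, const uint32 *p)
{
	bool		found = false;

	for (int j = 0; j < NELEM_PER_ITERATION; j++)
		found |= (p[j] == key);
	return found;
}

bool
lfind32_label_sketch(uint32 key, const uint32 *base, uint32 nelem)
{
	uint32		i = 0;

#ifndef USE_NO_SIMD
	if (nelem < NELEM_PER_ITERATION)
		goto one_by_one;

	for (; i + NELEM_PER_ITERATION <= nelem; i += NELEM_PER_ITERATION)
	{
		if (block_has_key(key, &base[i]))
			return true;
	}

	/*
	 * The label sits inside the #ifndef next to its only goto, so builds
	 * without the vector path never see an unused label.
	 */
one_by_one:
#endif

	/* Handles short arrays and whatever the block loop left over. */
	for (; i < nelem; i++)
	{
		if (key == base[i])
			return true;
	}
	return false;
}
```

Moving the scalar loop into an inline helper, as suggested, would remove the need for both the label and the `goto`.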
Nathan Bossart writes:
> On Tue, Mar 26, 2024 at 06:55:54PM -0500, Nathan Bossart wrote:
>> On Tue, Mar 26, 2024 at 07:28:24PM -0400, Tom Lane wrote:
>>> A significant fraction of the buildfarm is issuing warnings about
>>> this.
> Done. I'll keep an eye on the farm.
Thanks.
> I just did the m
On Tue, Mar 26, 2024 at 06:55:54PM -0500, Nathan Bossart wrote:
> On Tue, Mar 26, 2024 at 07:28:24PM -0400, Tom Lane wrote:
>> A significant fraction of the buildfarm is issuing warnings about
>> this.
>
> Thanks for the heads-up. Will fix.
Done. I'll keep an eye on the farm.
I just did the mi
On Tue, Mar 26, 2024 at 07:28:24PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> I've committed v9, and I've marked the commitfest entry as "Committed,"
>> although we may want to revisit AVX2, etc. in the future.
>
> A significant fraction of the buildfarm is issuing warnings about
> this.
Nathan Bossart writes:
> I've committed v9, and I've marked the commitfest entry as "Committed,"
> although we may want to revisit AVX2, etc. in the future.
A significant fraction of the buildfarm is issuing warnings about
this.
adder | 2024-03-26 21:04:33 |
../pgsql/src/include/port/p
I've committed v9, and I've marked the commitfest entry as "Committed,"
although we may want to revisit AVX2, etc. in the future.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Here is what I have staged for commit. One notable difference in this
version of the patch is that I've changed
+ if (nelem <= nelem_per_iteration)
+ goto one_by_one;
to
+ if (nelem < nelem_per_iteration)
+ goto one_by_one;
I realized that there's no rea
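Here is one way to see why `<` suffices. Assume the search finishes with an overlapping final block anchored at `nelem - nelem_per_iteration`, as discussed in this thread. That block is in bounds whenever `nelem >= nelem_per_iteration`, so an array of exactly one block can take the vector path instead of being scanned element by element. The hypothetical helper below counts block comparisons under that scheme, with an assumed block of 16 elements.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

#define NELEM_PER_ITERATION 16	/* assumed block size, e.g. 4 SSE2 registers */

/*
 * Number of block comparisons made with the "nelem < NELEM_PER_ITERATION"
 * cutoff and an overlapping final block.  Zero means the array is searched
 * one element at a time.
 */
static uint32
blocks_checked(uint32 nelem)
{
	if (nelem < NELEM_PER_ITERATION)
		return 0;

	/* full blocks, plus one overlapping block if a tail remains */
	return nelem / NELEM_PER_ITERATION + (nelem % NELEM_PER_ITERATION != 0);
}
```

With `<=` instead, `blocks_checked(16)` would be 0, and a 16-element array would fall back to 16 scalar comparisons.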
On Mon, Mar 25, 2024 at 10:03:27AM +0700, John Naylor wrote:
> Seems pretty good. It'd be good to see the results of 2- vs.
> 4-register before committing, because that might lead to some
> restructuring, but maybe it won't, and v8 is already an improvement
> over HEAD.
I tested this the other day
On Fri, Mar 22, 2024 at 12:09 AM Nathan Bossart
wrote:
>
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> > If this were "<=" then for long arrays we could assume there is
> > always more than one block, and wouldn't need to check if any elements
> > remain -- first block, the
On Sun, Mar 24, 2024 at 03:53:17PM -0500, Nathan Bossart wrote:
> Here's a new version of 0001 with some added #ifdefs that cfbot revealed
> were missing.
Sorry for the noise. cfbot revealed another silly mistake (forgetting to
reset the "i" variable in the assertion path). That should be fixed
Here's a new version of 0001 with some added #ifdefs that cfbot revealed
were missing.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From cc2bc5ca5b49cd8641af8b2231a34a1aa5091bb9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 20 Mar 2024 14:20:24 -0500
Subject: [PATCH
On Thu, Mar 21, 2024 at 12:09:44PM -0500, Nathan Bossart wrote:
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
>> Further, now that the algorithm is more SIMD-appropriate, I wonder
>> what doing 4 registers at a time is actually buying us for either SSE2
>> or AVX2. It might just be
On Thu, Mar 21, 2024 at 12:09:44PM -0500, Nathan Bossart wrote:
> It does still eventually win, although not nearly to the same extent as
> before. I extended the benchmark a bit to show this. I wouldn't be
> devastated if we only got 0001 committed for v17, given these results.
(In case it isn'
On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> I'm much happier about v5-0001. With a small tweak it would match what
> I had in mind:
>
> + if (nelem < nelem_per_iteration)
> + goto one_by_one;
>
> If this were "<=" then for long arrays we could assume there is
> always more
On Thu, Mar 21, 2024 at 2:55 AM Nathan Bossart wrote:
>
> On Wed, Mar 20, 2024 at 09:31:16AM -0500, Nathan Bossart wrote:
> > I don't mind removing the 2-register stuff if that's what you think we
> > should do. I'm cautiously optimistic that it'd help more than the extra
> > branch prediction m
On Wed, Mar 20, 2024 at 09:31:16AM -0500, Nathan Bossart wrote:
> On Wed, Mar 20, 2024 at 01:57:54PM +0700, John Naylor wrote:
>> On Tue, Mar 19, 2024 at 11:30 PM Nathan Bossart
>> wrote:
>>> I tried to trim some of the branches, and came up with the attached patch.
>>> I don't think this is exact
On Wed, Mar 20, 2024 at 01:57:54PM +0700, John Naylor wrote:
> On Tue, Mar 19, 2024 at 11:30 PM Nathan Bossart
> wrote:
>> I tried to trim some of the branches, and came up with the attached patch.
>> I don't think this is exactly what you were suggesting, but I think it's
>> relatively close. My
On Tue, Mar 19, 2024 at 11:30 PM Nathan Bossart
wrote:
> > Sounds similar in principle, but it looks really complicated. I don't
> > think the additional loops and branches are a good way to go, either
> > for readability or for branch prediction. My sketch has one branch for
> > which loop to do,
On Tue, Mar 19, 2024 at 04:53:04PM +0700, John Naylor wrote:
> On Tue, Mar 19, 2024 at 10:16 AM Nathan Bossart
> wrote:
>> 0002 does the opposite of this. That is, after we've completed as many
>> blocks as possible, we move the iterator variable back to "end -
>> block_size" and do one final ite
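Here is a minimal sketch of that approach. A scalar loop stands in for the vector comparisons, and the block size of 16 elements is hypothetical. The names are illustrative, and this is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

#define BLOCK_SIZE 16		/* hypothetical: elements compared per iteration */

/* Scalar stand-in for comparing BLOCK_SIZE elements with vector registers. */
static bool
block_has_key(uint32 key, const uint32 *p)
{
	bool		found = false;

	for (int j = 0; j < BLOCK_SIZE; j++)
		found |= (p[j] == key);
	return found;
}

bool
lfind32_overlap_sketch(uint32 key, const uint32 *base, uint32 nelem)
{
	uint32		i = 0;

	if (nelem < BLOCK_SIZE)
	{
		/* Too short for even one block: check one element at a time. */
		for (; i < nelem; i++)
		{
			if (key == base[i])
				return true;
		}
		return false;
	}

	/* As many full blocks as fit... */
	for (; i + BLOCK_SIZE <= nelem; i += BLOCK_SIZE)
	{
		if (block_has_key(key, &base[i]))
			return true;
	}

	/*
	 * ...then, if elements remain, move back to nelem - BLOCK_SIZE and do
	 * one final, partly overlapping block.  Re-checking a few elements is
	 * harmless for a membership test.
	 */
	if (i < nelem)
		return block_has_key(key, &base[nelem - BLOCK_SIZE]);

	return false;
}
```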
On Tue, Mar 19, 2024 at 10:16 AM Nathan Bossart
wrote:
>
> On Tue, Mar 19, 2024 at 10:03:36AM +0700, John Naylor wrote:
> > I took a brief look, and 0001 isn't quite what I had in mind. I can't
> > quite tell what it's doing with the additional branches and "goto
> > retry", but I meant something
On Tue, Mar 19, 2024 at 10:03:36AM +0700, John Naylor wrote:
> I took a brief look, and 0001 isn't quite what I had in mind. I can't
> quite tell what it's doing with the additional branches and "goto
> retry", but I meant something pretty simple:
Do you mean 0002? 0001 just adds a 2-register loo
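For comparison, here is a sketch of one possible shape for a 4-register loop followed by a single 2-register step. It assumes SSE2-width registers holding four `uint32` values each. The register comparisons are emulated with scalar loops, and the names are hypothetical. It only shows the control flow whose extra branches are being debated, not what 0001 actually contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

#define ELEMS_PER_REGISTER 4	/* assumed: a 128-bit register holds 4 uint32 */

/* Scalar stand-in for comparing nregs registers' worth of elements. */
static bool
regs_have_key(uint32 key, const uint32 *p, uint32 nregs)
{
	bool		found = false;

	for (uint32 j = 0; j < nregs * ELEMS_PER_REGISTER; j++)
		found |= (p[j] == key);
	return found;
}

bool
lfind32_tiered_sketch(uint32 key, const uint32 *base, uint32 nelem)
{
	uint32		i = 0;

	/* 4 registers (16 elements) per iteration while possible */
	for (; i + 4 * ELEMS_PER_REGISTER <= nelem; i += 4 * ELEMS_PER_REGISTER)
	{
		if (regs_have_key(key, &base[i], 4))
			return true;
	}

	/* then at most one 2-register (8-element) step */
	if (i + 2 * ELEMS_PER_REGISTER <= nelem)
	{
		if (regs_have_key(key, &base[i], 2))
			return true;
		i += 2 * ELEMS_PER_REGISTER;
	}

	/* remaining elements one by one */
	for (; i < nelem; i++)
	{
		if (key == base[i])
			return true;
	}
	return false;
}
```

Each extra tier shortens the scalar tail but adds one more branch on every call.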
On Tue, Mar 19, 2024 at 9:03 AM Nathan Bossart wrote:
>
> On Sun, Mar 17, 2024 at 09:47:33AM +0700, John Naylor wrote:
> > I haven't looked at the patches, but the graphs look good.
>
> I spent some more time on these patches. Specifically, I reordered them to
> demonstrate the effects on systems
On Sun, Mar 17, 2024 at 09:47:33AM +0700, John Naylor wrote:
> I haven't looked at the patches, but the graphs look good.
I spent some more time on these patches. Specifically, I reordered them to
demonstrate the effects on systems without AVX2 support. I've also added a
shortcut to jump to the
On Sat, Mar 16, 2024 at 2:40 AM Nathan Bossart wrote:
>
> On Fri, Mar 15, 2024 at 12:41:49PM -0500, Nathan Bossart wrote:
> > I've also attached the results of running this benchmark on my machine at
> > HEAD, after applying 0001, and after applying both 0001 and 0002. 0001
> > appears to work pr
On Fri, Mar 15, 2024 at 12:41:49PM -0500, Nathan Bossart wrote:
> I've also attached the results of running this benchmark on my machine at
> HEAD, after applying 0001, and after applying both 0001 and 0002. 0001
> appears to work pretty well. When there is a small "tail," it regresses a
> small
On Wed, Jan 10, 2024 at 09:06:08AM +0700, John Naylor wrote:
> If we have say 25 elements, I mean (for SSE2) check the first 16, then
> the last 16. Some will be checked twice, but that's okay.
I finally got around to trying this. 0001 adds this overlapping logic.
0002 is a rebased version of the
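The index arithmetic for that example is simple. With 16-element SSE2 blocks, the two windows are [0, 16) and [9, 25), so elements 9 through 15 are compared twice. Here is a small sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;	/* stand-in for PostgreSQL's c.h typedef */

/* Start of the "last block" window; requires nelem >= block. */
static uint32
last_window_start(uint32 nelem, uint32 block)
{
	assert(nelem >= block);
	return nelem - block;
}

/* Number of elements covered by both the first and the last window. */
static uint32
window_overlap(uint32 nelem, uint32 block)
{
	uint32		start = last_window_start(nelem, block);

	return start < block ? block - start : 0;
}
```

Two windows cover the whole array only when `nelem <= 2 * block`. Longer arrays need full blocks in between, with the last window overlapping only the final one.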
On Tue, Jan 9, 2024 at 11:20 PM Nathan Bossart wrote:
>
> On Tue, Jan 09, 2024 at 09:20:09AM +0700, John Naylor wrote:
> > On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart
> > wrote:
> >>
> >> > I suspect that there could be a regression lurking for some inputs
> >> > that the benchmark doesn't lo
On Tue, 9 Jan 2024 at 18:20, Nathan Bossart wrote:
>
> On Tue, Jan 09, 2024 at 09:20:09AM +0700, John Naylor wrote:
> > On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart
> > wrote:
> >>
> >> > I suspect that there could be a regression lurking for some inputs
> >> > that the benchmark doesn't look
On Tue, 9 Jan 2024 at 16:03, Peter Eisentraut wrote:
> On 29.11.23 18:15, Nathan Bossart wrote:
> > Using the same benchmark as we did for the SSE2 linear searches in
> > XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
> >
> > writers  sse2  avx2    %
> >     256  1
On Tue, Jan 09, 2024 at 09:20:09AM +0700, John Naylor wrote:
> On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart
> wrote:
>>
>> > I suspect that there could be a regression lurking for some inputs
>> > that the benchmark doesn't look at: pg_lfind32() currently needs to be
>> > able to read 4 vector
On 29.11.23 18:15, Nathan Bossart wrote:
Using the same benchmark as we did for the SSE2 linear searches in
XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
writers  sse2  avx2    %
    256  1195  1188   -1
    512   928  1054  +14
   1024   633
On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart wrote:
>
> > I suspect that there could be a regression lurking for some inputs
> > that the benchmark doesn't look at: pg_lfind32() currently needs to be
> > able to read 4 vector registers worth of elements before taking the
> > fast path. There is
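This sketch shows the arithmetic behind the concern. It assumes 128-bit SSE2 and 256-bit AVX2 registers and 32-bit elements. Four registers per iteration means 16 elements for SSE2 but 32 for AVX2. Under that assumption, arrays of 16 to 31 elements would no longer reach the vector path in an AVX2 build. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Minimum number of uint32 elements before a loop that consumes
 * registers_per_iteration vector registers per pass can take its fast path.
 */
static unsigned
fast_path_min_elems(unsigned register_bits, unsigned registers_per_iteration)
{
	unsigned	per_register = register_bits / (8 * sizeof(uint32_t));

	return per_register * registers_per_iteration;
}
```

Halving the number of registers per iteration brings AVX2 back to the SSE2 threshold, which is one way to read the later 2- vs. 4-register discussion.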
On Mon, Jan 08, 2024 at 02:01:39PM +0700, John Naylor wrote:
> On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
> wrote:
>> writers  sse2  avx2    %
>>     256  1195  1188   -1
>>     512   928  1054  +14
>>    1024   633   716  +13
>>    2048   332   420  +27
>>
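The rounded values can be checked as follows. This assumes the `%` column is the relative change from the `sse2` figure to the `avx2` figure; the excerpt does not state the units. The helper is hypothetical.

```c
#include <assert.h>

/* Percent change from sse2 to avx2, rounded half away from zero. */
static long
pct_change(long sse2, long avx2)
{
	double		pct = 100.0 * (double) (avx2 - sse2) / (double) sse2;

	return (long) (pct < 0 ? pct - 0.5 : pct + 0.5);
}
```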
On Sat, Jan 6, 2024 at 12:04 AM Nathan Bossart wrote:
> I've been thinking about the configuration option approach. ISTM that
> would be the most feasible strategy, at least for v17. A couple things
> come to mind:
>
> * This option would simply map to existing compiler flags. We already have
On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
wrote:
> Using the same benchmark as we did for the SSE2 linear searches in
> XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
I've been antagonistic towards the patch itself, but it'd be more
productive if I paid some nuanced att
On Fri, Jan 05, 2024 at 09:03:39AM +0700, John Naylor wrote:
> On Wed, Jan 3, 2024 at 10:29 PM Nathan Bossart
> wrote:
>> If the requirement is that normal builds use AVX2, then I fear we will be
>> waiting a long time. IIUC the current proposals (building multiple
>> binaries or adding a config
On Wed, Jan 3, 2024 at 10:29 PM Nathan Bossart wrote:
> If the requirement is that normal builds use AVX2, then I fear we will be
> waiting a long time. IIUC the current proposals (building multiple
> binaries or adding a configuration option that maps to compiler flags)
> would still be opt-in,
On Tue, Jan 02, 2024 at 10:11:23AM -0600, Nathan Bossart wrote:
> (In case it isn't clear, I'm volunteering to set up such a buildfarm
> machine.)
I set up "akepa" to run with -march=x86-64-v3.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 03, 2024 at 09:13:52PM +0700, John Naylor wrote:
> On Tue, Jan 2, 2024 at 11:11 PM Nathan Bossart
> wrote:
>> I'm tempted to propose that we move forward with this patch as-is after
>> adding a buildfarm machine that compiles with -mavx2 or -march=x86-64-v3.
>
> That means that we wo
On Tue, Jan 2, 2024 at 11:11 PM Nathan Bossart wrote:
>
> Perhaps I was too optimistic about adding support for newer instructions...
>
> I'm tempted to propose that we move forward with this patch as-is after
> adding a buildfarm machine that compiles with -mavx2 or -march=x86-64-v3.
That means
On Tue, Jan 02, 2024 at 12:50:04PM -0500, Tom Lane wrote:
> The patch needs better comments (as in, more than "none whatsoever").
Yes, will do.
> Also, do you really want to structure the header so that USE_SSE2
> doesn't get defined? In that case you are committing to provide
> an AVX2 replacem
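Here is a sketch of the alternative that question implies, loosely modelled on the `USE_SSE2` test in PostgreSQL's `simd.h`. `USE_AVX2` is defined in addition to `USE_SSE2`, so code with only an SSE2 implementation keeps working in AVX2 builds. GCC and Clang predefine `__AVX2__` when compiling with `-mavx2` or `-march=x86-64-v3`. The `USE_AVX2` name and the helper functions are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* USE_AVX2 is layered on top of USE_SSE2 rather than replacing it. */
#if (defined(__x86_64__) || defined(_M_AMD64))
#define USE_SSE2
#if defined(__AVX2__)
#define USE_AVX2
#endif
#endif

#if defined(USE_AVX2) && !defined(USE_SSE2)
#error "USE_AVX2 without USE_SSE2 would leave SSE2-only code without a vector path"
#endif

static bool
have_sse2_path(void)
{
#ifdef USE_SSE2
	return true;
#else
	return false;
#endif
}

static bool
have_avx2_path(void)
{
#ifdef USE_AVX2
	return true;
#else
	return false;
#endif
}
```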
Nathan Bossart writes:
> I'm tempted to propose that we move forward with this patch as-is after
> adding a buildfarm machine that compiles with -mavx2 or -march=x86-64-v3.
> There is likely still follow-up work to make these improvements more
> accessible, but I'm not sure that is a strict prereq
On Mon, Jan 01, 2024 at 07:12:26PM +0700, John Naylor wrote:
> On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
> wrote:
>> I don't intend for this patch to be
>> seriously considered until we have better support for detecting/compiling
>> AVX2 instructions and a buildfarm machine that uses them.
>
On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
wrote:
> I don't intend for this patch to be
> seriously considered until we have better support for detecting/compiling
> AVX2 instructions and a buildfarm machine that uses them.
That's completely understandable, yet I'm confused why there is a
co
On Wed, Nov 22, 2023 at 12:49:35PM -0600, Nathan Bossart wrote:
> On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
>> For reference, executing the page checksum 10M times on an AMD 3900X CPU:
>>
>> clang-14 -O2           4.292s (17.8 GiB/s)
>> clang-14 -O2 -msse4.1  2.859s (2
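The first timing is consistent with checksumming 8 KiB pages. That is PostgreSQL's default block size; the excerpt does not state the page size, so it is an assumption. 10M x 8192 bytes is about 76.3 GiB, and 76.3 GiB / 4.292 s is about 17.8 GiB/s. Here is a small check with a hypothetical helper:

```c
#include <assert.h>

/* Throughput in GiB/s for `iterations` checksums of `page_bytes` each. */
static double
gib_per_sec(double iterations, double page_bytes, double seconds)
{
	return iterations * page_bytes / (1024.0 * 1024.0 * 1024.0) / seconds;
}
```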