Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-01 Thread Amonson, Paul D
Hi, Comparing the current SSE4.2 implementation of the CRC32C algorithm in Postgres, to an optimized AVX-512 algorithm [0] we observed significant gains. The result was a ~6.6X average multiplier of increased performance measured on 3 different Intel products. Details below. The AVX-512 algorit

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-17 Thread Amonson, Paul D
Proposal for Updating CRC32C with AVX-512 Algorithm. > > Hi, > > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased perf

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-20 Thread Daniel Gustafsson
> On 17 May 2024, at 18:21, Amonson, Paul D wrote: > Hi, forgive the top-post but I have not seen any response to this post? The project is currently in feature-freeze in preparation for the next major release so new development and ideas are not the top priority right now. Additionally there is

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> The project is currently in feature-freeze in preparation for the next major > release so new development and ideas are not the top priority right now. > Additionally there is a large developer meeting shortly which many are busy > preparing for. Excercise some patience, and I'm sure there will

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Tom Lane
"Amonson, Paul D" writes: > I had our OSS internal team, who are experts in OSS licensing, review > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > license for the CRC32C AVX-512 code, and they found no issues. Therefore, > including the new license into the Postgr

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Bruce Momjian
On Wed, Jun 12, 2024 at 02:08:02PM -0400, Tom Lane wrote: > "Amonson, Paul D" writes: > > I had our OSS internal team, who are experts in OSS licensing, review > > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > > license for the CRC32C AVX-512 code, and they found

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, I'm wonder if this isn't going in the wrong direction. We're using CRCs for something they're not well suited for in my understanding - and are paying a reasonably high price for it, given that even hardware accelerated CRCs aren't blazingly fast. CRCs are used for things like ethernet, iSCSI

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, On 2024-05-01 15:56:08 +, Amonson, Paul D wrote: > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased performance > measured on 3 di

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> -Original Message- > From: Andres Freund > Sent: Wednesday, June 12, 2024 1:12 PM > To: Amonson, Paul D > FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early > on with SIGILL: Nice catch!!! I was testing the bit for the vpclmulqdq in EBX instead of the corre

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-17 Thread Amonson, Paul D
> This is extremely workload dependent, it's not hard to find workloads with > lots of very small record and very few big ones... What you observed might > have "just" been the warmup behaviour where more full page writes have to > be written. Can you tell me how to avoid capturing this "warm-up"

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Alvaro Herrera
On 2024-Jun-12, Amonson, Paul D wrote: > +/*- > + * > + * pg_crc32c_avx512.c > + * Compute CRC-32C checksum using Intel AVX-512 instructions. > + * > + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Grou

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Amonson, Paul D
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble. > (I bet it's not really necessary anyway.) Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development Group" copyright line as a contributor. > And this bit doesn't look good. The LICENSE file says: .

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > And this bit doesn't look good. The LICENSE file says: > ... > > > //* Redistributions in binary form must reproduce the above > > > // copyright notice, this list of conditions and the following > > > disclaimer // in the do

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 01:20:50PM -0400, Bruce Momjian wrote: > On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > > And this bit doesn't look good. The LICENSE file says: > > ... > > > > //* Redistributions in binary form must reproduce the above > > > > // copyright notice

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-19 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote: > While the license we are concerned about does not have this clause, it > does have: > >2. Redistributions in binary form must reproduce the above > copyright notice, this list of conditions and the following > dis

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Amonson, Paul D
> It would be good to know exactly what, if any, changes the Intel lawyers want > us to make to our license if we accept this patch. I asked about this and there is nothing Intel requires here license wise. They believe that there is nothing wrong with including Clause-3 BSD like licenses under

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Bruce Momjian
On Tue, Jun 25, 2024 at 05:41:12PM +, Amonson, Paul D wrote: > > It would be good to know exactly what, if any, changes the Intel > > lawyers want us to make to our license if we accept this patch. > > I asked about this and there is nothing Intel requires here license > wise. They believe that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-07-18 Thread Amonson, Paul D
> Okay, that is very interesting. Yes, we will have no problem reproducing the > exact license text in the source code. I think we can remove the license > issue > as a blocker for this patch. Hi, I was wondering if I can I get a review please. I am interested in the refactor question for the

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-08 Thread Nathan Bossart
On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote: > I'm wonder if this isn't going in the wrong direction. We're using CRCs for > something they're not well suited for in my understanding - and are paying a > reasonably high price for it, given that even hardware accelerated CRCs aren'

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
Hi, Here are the latest patches for the accelerated CRC32c algorithm. I did the following to create these refactored patches: 1) From the main branch I moved all x86_64 hardware checks from the various locations into a single location. I did not move any ARM tests as I would have no way to tes

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Nathan Bossart
Thanks for the new patches. On Thu, Aug 22, 2024 at 03:14:32PM +, Amonson, Paul D wrote: > I reran all the basic tests again to make sure that the performance > numbers were within the margin of error when compared to my original > finding. This step showed similar numbers (see origin post) ar

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 05:09:35PM +, Amonson, Paul D wrote: > Ok I added a patch that exposed a new macro CRC32C_COMP_SMALL for > targeted fixed size < 256 use cases in Postgres. As for mitigating the > regression in general, I have not been able to work up a fallback (i.e. > <256 bytes) that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> I'm curious about where exactly the regression is coming from. Is it possible > that your build for the SSE 4.2 tests was using it unconditionally, i.e., > optimizing away the function pointer? I am calling the SSE 4.2 implementation directly; I am not even building the pg_sse42_*_choose.c fil

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:44:55PM +, Amonson, Paul D wrote: >> I'm curious about where exactly the regression is coming from. Is it >> possible >> that your build for the SSE 4.2 tests was using it unconditionally, i.e., >> optimizing away the function pointer? > > I am calling the SSE 4.2

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> And this still shows the ~14% regression in your original post? At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My average could be just bad luck. It will take me a while to re-setup for full data collection runs but I can try it again if you like. Paul

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:54:58PM +, Amonson, Paul D wrote: >> And this still shows the ~14% regression in your original post? > > At the small buffer sizes the margin of error or "noise" is larger, > 7-11%. My average could be just bad luck. It will take me a while to > re-setup for full dat

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> IMHO that would be useful to establish the current state of the patch set from > a performance standpoint, especially since you've added code intended to > mitigate the regression. Ok. > +#define COMP_CRC32C_SMALL(crc, data, len) \ > + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 07:15:47PM +, Amonson, Paul D wrote: >> +#define COMP_CRC32C_SMALL(crc, data, len) \ >> +((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >> >> My interpretation of Andres's upthread suggestion is that we'd add the length >> check within the macro instead of int

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-27 Thread Amonson, Paul D
> Things like sizeof() and offsetof() are known at compile time, so the compiler > will recognize when a condition is always true or false and optimize it out > accordingly. In cases where the value cannot be known at compile time, > checking the length in the macro and dispatching to a different