RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-18 Thread Devulapalli, Raghuveer
> I've proposed a patch to move the existing AVX-512 code in Postgres to use > __attribute__((target("..."))) instead of per-translation-unit compiler flags > [0]. We > should likely do something similar for this one. > > [0] https://postgr.es/m/ZxAqRG1-8fJLMRUY%40nathan I assume this will be

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-18 Thread Nathan Bossart
On Tue, Oct 08, 2024 at 08:19:27PM +, Devulapalli, Raghuveer wrote: > Hi all, I'm currently in the process of reviewing and analyzing Paul's > patch. In the meantime, I'm open to addressing any questions or feedback > you may have. I've proposed a patch to move the existing AVX-512 code in Pos

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-10-08 Thread Devulapalli, Raghuveer
Thank you for the introduction, Paul. Hi all, I'm currently in the process of reviewing and analyzing Paul's patch. In the meantime, I'm open to addressing any questions or feedback you may have. > Hi all, > > I will be retiring from Intel at the end of this week. I wanted to introduce > the

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-09-24 Thread Amonson, Paul D
Hi all, I will be retiring from Intel at the end of this week. I wanted to introduce the engineer who will be taking over the CRC32c proposal and commit fest entry. Devulapalli, Raghuveer I have brought him up to speed and he will be the go-to for technical review comments and questions. Plea

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-27 Thread Amonson, Paul D
> Things like sizeof() and offsetof() are known at compile time, so the compiler > will recognize when a condition is always true or false and optimize it out > accordingly. In cases where the value cannot be known at compile time, > checking the length in the macro and dispatching to a different

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 07:15:47PM +, Amonson, Paul D wrote: >> +#define COMP_CRC32C_SMALL(crc, data, len) \ >> +((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >> >> My interpretation of Andres's upthread suggestion is that we'd add the length >> check within the macro instead of int

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> IMHO that would be useful to establish the current state of the patch set from > a performance standpoint, especially since you've added code intended to > mitigate the regression. Ok. > +#define COMP_CRC32C_SMALL(crc, data, len) \ > + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len))) >

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:54:58PM +, Amonson, Paul D wrote: >> And this still shows the ~14% regression in your original post? > > At the small buffer sizes the margin of error or "noise" is larger, > 7-11%. My average could be just bad luck. It will take me a while to > re-setup for full dat

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> And this still shows the ~14% regression in your original post? At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My average could be just bad luck. It will take me a while to re-setup for full data collection runs but I can try it again if you like. Paul

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 06:44:55PM +, Amonson, Paul D wrote: >> I'm curious about where exactly the regression is coming from. Is it >> possible >> that your build for the SSE 4.2 tests was using it unconditionally, i.e., >> optimizing away the function pointer? > > I am calling the SSE 4.2

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> I'm curious about where exactly the regression is coming from. Is it possible > that your build for the SSE 4.2 tests was using it unconditionally, i.e., > optimizing away the function pointer? I am calling the SSE 4.2 implementation directly; I am not even building the pg_sse42_*_choose.c fil

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Nathan Bossart
On Mon, Aug 26, 2024 at 05:09:35PM +, Amonson, Paul D wrote: > Ok I added a patch that exposed a new macro CRC32C_COMP_SMALL for > targeted fixed size < 256 use cases in Postgres. As for mitigating the > regression in general, I have not been able to work up a fallback (i.e. > <256 bytes) that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-26 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
> Upthread [0], Andres suggested dispatching to a different implementation for > compile-time-known small lengths. Have you looked into that? In your > original post, you noted a 14% regression for records smaller than 256 bytes, > which is not an uncommon case for Postgres. IMO we should try to

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Nathan Bossart
Thanks for the new patches. On Thu, Aug 22, 2024 at 03:14:32PM +, Amonson, Paul D wrote: > I reran all the basic tests again to make sure that the performance > numbers were within the margin of error when compared to my original > finding. This step showed similar numbers (see origin post) ar

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-22 Thread Amonson, Paul D
Hi, Here are the latest patches for the accelerated CRC32c algorithm. I did the following to create these refactored patches: 1) From the main branch I moved all x86_64 hardware checks from the various locations into a single location. I did not move any ARM tests as I would have no way to tes

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-08-08 Thread Nathan Bossart
On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote: > I'm wonder if this isn't going in the wrong direction. We're using CRCs for > something they're not well suited for in my understanding - and are paying a > reasonably high price for it, given that even hardware accelerated CRCs aren'

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-07-18 Thread Amonson, Paul D
> Okay, that is very interesting. Yes, we will have no problem reproducing the > exact license text in the source code. I think we can remove the license > issue > as a blocker for this patch. Hi, I was wondering if I can I get a review please. I am interested in the refactor question for the

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Bruce Momjian
On Tue, Jun 25, 2024 at 05:41:12PM +, Amonson, Paul D wrote: > > It would be good to know exactly what, if any, changes the Intel > > lawyers want us to make to our license if we accept this patch. > > I asked about this and there is nothing Intel requires here license > wise. They believe that

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-25 Thread Amonson, Paul D
> It would be good to know exactly what, if any, changes the Intel lawyers want > us to make to our license if we accept this patch. I asked about this and there is nothing Intel requires here license wise. They believe that there is nothing wrong with including Clause-3 BSD like licenses under

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-19 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote: > While the license we are concerned about does not have this clause, it > does have: > >2. Redistributions in binary form must reproduce the above > copyright notice, this list of conditions and the following > dis

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 01:20:50PM -0400, Bruce Momjian wrote: > On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > > And this bit doesn't look good. The LICENSE file says: > > ... > > > > //* Redistributions in binary form must reproduce the above > > > > // copyright notice

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Bruce Momjian
On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote: > > And this bit doesn't look good. The LICENSE file says: > ... > > > //* Redistributions in binary form must reproduce the above > > > // copyright notice, this list of conditions and the following > > > disclaimer // in the do

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Amonson, Paul D
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble. > (I bet it's not really necessary anyway.) Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development Group" copyright line as a contributor. > And this bit doesn't look good. The LICENSE file says: .

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-18 Thread Alvaro Herrera
On 2024-Jun-12, Amonson, Paul D wrote: > +/*- > + * > + * pg_crc32c_avx512.c > + * Compute CRC-32C checksum using Intel AVX-512 instructions. > + * > + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Grou

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-17 Thread Amonson, Paul D
> This is extremely workload dependent, it's not hard to find workloads with > lots of very small record and very few big ones... What you observed might > have "just" been the warmup behaviour where more full page writes have to > be written. Can you tell me how to avoid capturing this "warm-up"

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> -Original Message- > From: Andres Freund > Sent: Wednesday, June 12, 2024 1:12 PM > To: Amonson, Paul D > FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early > on with SIGILL: Nice catch!!! I was testing the bit for the vpclmulqdq in EBX instead of the corre

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, On 2024-05-01 15:56:08 +, Amonson, Paul D wrote: > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased performance > measured on 3 di

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Andres Freund
Hi, I'm wonder if this isn't going in the wrong direction. We're using CRCs for something they're not well suited for in my understanding - and are paying a reasonably high price for it, given that even hardware accelerated CRCs aren't blazingly fast. CRCs are used for things like ethernet, iSCSI

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Bruce Momjian
On Wed, Jun 12, 2024 at 02:08:02PM -0400, Tom Lane wrote: > "Amonson, Paul D" writes: > > I had our OSS internal team, who are experts in OSS licensing, review > > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > > license for the CRC32C AVX-512 code, and they found

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Tom Lane
"Amonson, Paul D" writes: > I had our OSS internal team, who are experts in OSS licensing, review > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like > license for the CRC32C AVX-512 code, and they found no issues. Therefore, > including the new license into the Postgr

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-06-12 Thread Amonson, Paul D
> The project is currently in feature-freeze in preparation for the next major > release so new development and ideas are not the top priority right now. > Additionally there is a large developer meeting shortly which many are busy > preparing for. Excercise some patience, and I'm sure there will

Re: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-20 Thread Daniel Gustafsson
> On 17 May 2024, at 18:21, Amonson, Paul D wrote: > Hi, forgive the top-post but I have not seen any response to this post? The project is currently in feature-freeze in preparation for the next major release so new development and ideas are not the top priority right now. Additionally there is

RE: Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-17 Thread Amonson, Paul D
Proposal for Updating CRC32C with AVX-512 Algorithm. > > Hi, > > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres, to an optimized AVX-512 algorithm [0] we observed significant > gains. The result was a ~6.6X average multiplier of increased perf

Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-01 Thread Amonson, Paul D
Hi, Comparing the current SSE4.2 implementation of the CRC32C algorithm in Postgres, to an optimized AVX-512 algorithm [0] we observed significant gains. The result was a ~6.6X average multiplier of increased performance measured on 3 different Intel products. Details below. The AVX-512 algorit