> I've proposed a patch to move the existing AVX-512 code in Postgres to use
> __attribute__((target("..."))) instead of per-translation-unit compiler flags
> [0]. We
> should likely do something similar for this one.
>
> [0] https://postgr.es/m/ZxAqRG1-8fJLMRUY%40nathan
I assume this will be
On Tue, Oct 08, 2024 at 08:19:27PM +, Devulapalli, Raghuveer wrote:
> Hi all, I'm currently in the process of reviewing and analyzing Paul's
> patch. In the meantime, I'm open to addressing any questions or feedback
> you may have.
I've proposed a patch to move the existing AVX-512 code in Pos
Thank you for the introduction, Paul.
Hi all, I'm currently in the process of reviewing and analyzing Paul's patch.
In the meantime, I'm open to addressing any questions or feedback you may have.
> Hi all,
>
> I will be retiring from Intel at the end of this week. I wanted to introduce
> the
Hi all,
I will be retiring from Intel at the end of this week. I wanted to introduce
the engineer who will be taking over the CRC32c proposal and commit fest entry.
Devulapalli, Raghuveer
I have brought him up to speed and he will be the go-to for technical review
comments and questions. Plea
> Things like sizeof() and offsetof() are known at compile time, so the compiler
> will recognize when a condition is always true or false and optimize it out
> accordingly. In cases where the value cannot be known at compile time,
> checking the length in the macro and dispatching to a different
On Mon, Aug 26, 2024 at 07:15:47PM +, Amonson, Paul D wrote:
>> +#define COMP_CRC32C_SMALL(crc, data, len) \
>> +((crc) = pg_comp_crc32c_sse42((crc), (data), (len)))
>>
>> My interpretation of Andres's upthread suggestion is that we'd add the length
>> check within the macro instead of int
> IMHO that would be useful to establish the current state of the patch set from
> a performance standpoint, especially since you've added code intended to
> mitigate the regression.
Ok.
> +#define COMP_CRC32C_SMALL(crc, data, len) \
> + ((crc) = pg_comp_crc32c_sse42((crc), (data), (len)))
>
On Mon, Aug 26, 2024 at 06:54:58PM +, Amonson, Paul D wrote:
>> And this still shows the ~14% regression in your original post?
>
> At the small buffer sizes the margin of error or "noise" is larger,
> 7-11%. My average could be just bad luck. It will take me a while to
> re-setup for full dat
> And this still shows the ~14% regression in your original post?
At the small buffer sizes the margin of error or "noise" is larger, 7-11%. My
average could be just bad luck. It will take me a while to re-setup for full
data collection runs but I can try it again if you like.
Paul
On Mon, Aug 26, 2024 at 06:44:55PM +, Amonson, Paul D wrote:
>> I'm curious about where exactly the regression is coming from. Is it
>> possible
>> that your build for the SSE 4.2 tests was using it unconditionally, i.e.,
>> optimizing away the function pointer?
>
> I am calling the SSE 4.2
> I'm curious about where exactly the regression is coming from. Is it possible
> that your build for the SSE 4.2 tests was using it unconditionally, i.e.,
> optimizing away the function pointer?
I am calling the SSE 4.2 implementation directly; I am not even building the
pg_sse42_*_choose.c fil
On Mon, Aug 26, 2024 at 05:09:35PM +, Amonson, Paul D wrote:
> Ok I added a patch that exposed a new macro CRC32C_COMP_SMALL for
> targeted fixed size < 256 use cases in Postgres. As for mitigating the
> regression in general, I have not been able to work up a fallback (i.e.
> <256 bytes) that
> Upthread [0], Andres suggested dispatching to a different implementation for
> compile-time-known small lengths. Have you looked into that? In your
> original post, you noted a 14% regression for records smaller than 256 bytes,
> which is not an uncommon case for Postgres. IMO we should try to
> Upthread [0], Andres suggested dispatching to a different implementation for
> compile-time-known small lengths. Have you looked into that? In your
> original post, you noted a 14% regression for records smaller than 256 bytes,
> which is not an uncommon case for Postgres. IMO we should try to
Thanks for the new patches.
On Thu, Aug 22, 2024 at 03:14:32PM +, Amonson, Paul D wrote:
> I reran all the basic tests again to make sure that the performance
> numbers were within the margin of error when compared to my original
> finding. This step showed similar numbers (see origin post) ar
Hi,
Here are the latest patches for the accelerated CRC32c algorithm. I did the
following to create these refactored patches:
1) From the main branch I moved all x86_64 hardware checks from the various
locations into a single location. I did not move any ARM tests as I would have
no way to tes
On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote:
> I'm wonder if this isn't going in the wrong direction. We're using CRCs for
> something they're not well suited for in my understanding - and are paying a
> reasonably high price for it, given that even hardware accelerated CRCs aren'
> Okay, that is very interesting. Yes, we will have no problem reproducing the
> exact license text in the source code. I think we can remove the license
> issue
> as a blocker for this patch.
Hi,
I was wondering if I can I get a review please. I am interested in the refactor
question for the
On Tue, Jun 25, 2024 at 05:41:12PM +, Amonson, Paul D wrote:
> > It would be good to know exactly what, if any, changes the Intel
> > lawyers want us to make to our license if we accept this patch.
>
> I asked about this and there is nothing Intel requires here license
> wise. They believe that
> It would be good to know exactly what, if any, changes the Intel lawyers want
> us to make to our license if we accept this patch.
I asked about this and there is nothing Intel requires here license wise. They
believe that there is nothing wrong with including Clause-3 BSD like licenses
under
On Tue, Jun 18, 2024 at 02:00:34PM -0400, Bruce Momjian wrote:
> While the license we are concerned about does not have this clause, it
> does have:
>
>2. Redistributions in binary form must reproduce the above
> copyright notice, this list of conditions and the following
> dis
On Tue, Jun 18, 2024 at 01:20:50PM -0400, Bruce Momjian wrote:
> On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote:
> > > And this bit doesn't look good. The LICENSE file says:
> > ...
> > > > //* Redistributions in binary form must reproduce the above
> > > > // copyright notice
On Tue, Jun 18, 2024 at 05:14:08PM +, Amonson, Paul D wrote:
> > And this bit doesn't look good. The LICENSE file says:
> ...
> > > //* Redistributions in binary form must reproduce the above
> > > // copyright notice, this list of conditions and the following
> > > disclaimer // in the do
> Hmm, I wonder if the "(c) 2024 Intel" line is going to bring us trouble.
> (I bet it's not really necessary anyway.)
Our lawyer agrees, copyright is covered by the "PostgreSQL Global Development
Group" copyright line as a contributor.
> And this bit doesn't look good. The LICENSE file says:
.
On 2024-Jun-12, Amonson, Paul D wrote:
> +/*-
> + *
> + * pg_crc32c_avx512.c
> + * Compute CRC-32C checksum using Intel AVX-512 instructions.
> + *
> + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Grou
> This is extremely workload dependent, it's not hard to find workloads with
> lots of very small record and very few big ones... What you observed might
> have "just" been the warmup behaviour where more full page writes have to
> be written.
Can you tell me how to avoid capturing this "warm-up"
> -Original Message-
> From: Andres Freund
> Sent: Wednesday, June 12, 2024 1:12 PM
> To: Amonson, Paul D
> FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early
> on with SIGILL:
Nice catch!!! I was testing the bit for the vpclmulqdq in EBX instead of the
corre
Hi,
On 2024-05-01 15:56:08 +, Amonson, Paul D wrote:
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres, to an optimized AVX-512 algorithm [0] we observed significant
> gains. The result was a ~6.6X average multiplier of increased performance
> measured on 3 di
Hi,
I'm wonder if this isn't going in the wrong direction. We're using CRCs for
something they're not well suited for in my understanding - and are paying a
reasonably high price for it, given that even hardware accelerated CRCs aren't
blazingly fast.
CRCs are used for things like ethernet, iSCSI
On Wed, Jun 12, 2024 at 02:08:02PM -0400, Tom Lane wrote:
> "Amonson, Paul D" writes:
> > I had our OSS internal team, who are experts in OSS licensing, review
> > possible conflicts between the PostgreSQL license and the BSD-Clause 3-like
> > license for the CRC32C AVX-512 code, and they found
"Amonson, Paul D" writes:
> I had our OSS internal team, who are experts in OSS licensing, review
> possible conflicts between the PostgreSQL license and the BSD-Clause 3-like
> license for the CRC32C AVX-512 code, and they found no issues. Therefore,
> including the new license into the Postgr
> The project is currently in feature-freeze in preparation for the next major
> release so new development and ideas are not the top priority right now.
> Additionally there is a large developer meeting shortly which many are busy
> preparing for. Excercise some patience, and I'm sure there will
> On 17 May 2024, at 18:21, Amonson, Paul D wrote:
> Hi, forgive the top-post but I have not seen any response to this post?
The project is currently in feature-freeze in preparation for the next major
release so new development and ideas are not the top priority right now.
Additionally there is
Proposal for Updating CRC32C with AVX-512 Algorithm.
>
> Hi,
>
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres, to an optimized AVX-512 algorithm [0] we observed significant
> gains. The result was a ~6.6X average multiplier of increased perf
Hi,
Comparing the current SSE4.2 implementation of the CRC32C algorithm in
Postgres, to an optimized AVX-512 algorithm [0] we observed significant gains.
The result was a ~6.6X average multiplier of increased performance measured on
3 different Intel products. Details below. The AVX-512 algorit
35 matches
Mail list logo