Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Tue, Aug 30, 2022 at 12:17 AM Nathan Bossart wrote: > Thanks! I've attached a follow-up patch with a couple of small > suggestions. Pushed, thanks! -- John Naylor EDB: http://www.enterprisedb.com

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread Nathan Bossart
On Mon, Aug 29, 2022 at 05:49:46PM +0700, John Naylor wrote: > Bowerbird just reported the same error, so I went ahead and pushed a > fix with this. Thanks! I've attached a follow-up patch with a couple of small suggestions. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 4:28 PM John Naylor wrote: > > Here's the simplest fix I can think of: > > /* > * Exactly like vector8_is_highbit_set except for the input type, so > it still looks > * at each _byte_ separately. > * > * XXX x86 uses the same underlying type for vectors with 8-bit, >

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 3:19 PM John Naylor wrote: > > It turns out MSVC animal drongo doesn't like this cast -- on x86 they > are the same underlying type. Will look into that as more results come > in. Here's the simplest fix I can think of: /* * Exactly like vector8_is_highbit_set except

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 11:25 AM John Naylor wrote: > +static inline bool > +vector32_is_highbit_set(const Vector32 v) > +{ > +#ifdef USE_SSE2 > + return (_mm_movemask_epi8(v) & 0x) != 0; > +#endif > +} > > I'm not sure why we need this function -- AFAICS it just adds more > work on x86 for

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 12:44 PM Nathan Bossart wrote: > [v6] Pushed with a couple comment adjustments, let's see what the build farm thinks... -- John Naylor EDB: http://www.enterprisedb.com

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-28 Thread Nathan Bossart
On Mon, Aug 29, 2022 at 11:25:50AM +0700, John Naylor wrote: > + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32); > + uint32 nelem_per_iteration = 4 * nelem_per_vector; > > Using local #defines would be my style. I don't have a reason to > object to this way, but adding const makes

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-28 Thread Tom Lane
John Naylor writes: > I wonder if we should explain briefly what saturating arithmetic is. I > had never encountered it outside of a SIMD programming context. +1, it's at least worth a sentence to define the term. regards, tom lane

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-28 Thread John Naylor
On Sun, Aug 28, 2022 at 10:58 AM Nathan Bossart wrote: > > Here is a new patch set in which I've attempted to address all feedback. Looks in pretty good shape. Some more comments: + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32); + uint32 nelem_per_iteration = 4 * nelem_per_vector;

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Nathan Bossart
Here is a new patch set in which I've attempted to address all feedback. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From a5f381097819db05b6e47418597cd56bab411fad Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 25 Aug 2022 22:18:30 -0700 Subject: [PATCH v5 1/2]

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Nathan Bossart
On Sun, Aug 28, 2022 at 10:39:09AM +1200, Thomas Munro wrote: > On Sun, Aug 28, 2022 at 10:12 AM Nathan Bossart > wrote: >> Yup. The problem is that AFAICT there's no equivalent to >> _mm_movemask_epi8() on aarch64, so you end up with something like >> >> vmaxvq_u8(vandq_u8(v,

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Thomas Munro
On Sun, Aug 28, 2022 at 10:12 AM Nathan Bossart wrote: > Yup. The problem is that AFAICT there's no equivalent to > _mm_movemask_epi8() on aarch64, so you end up with something like > > vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0 > > But for pg_lfind32(), we really just want to

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Nathan Bossart
On Sat, Aug 27, 2022 at 05:18:34PM -0400, Tom Lane wrote: > In short, I think the critical part of 0002 needs to look more like > this: > > +#elif defined(__aarch64__) && defined(__ARM_NEON) > +/* > + * We use the Neon instructions if the compiler provides access to them > + * (as indicated by

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Nathan Bossart
Thanks for taking a look. On Sat, Aug 27, 2022 at 01:59:06PM +0700, John Naylor wrote: > I don't forsee any use of emulating vector registers with uint64 if > they only hold two ints. I wonder if it'd be better if all vector32 > functions were guarded with #ifndef NO_USE_SIMD. (I wonder if >

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread Tom Lane
I spent a bit more time researching the portability implications of this patch. I think that we should check __ARM_NEON before #including ; there is authoritative documentation out there telling you to, eg [1], and I can see no upside at all to not checking. We cannot check *only* __ARM_NEON,

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread John Naylor
On Sat, Aug 27, 2022 at 1:24 AM Nathan Bossart wrote: > > Here is a rebased patch set that applies to HEAD. 0001: #define USE_NO_SIMD typedef uint64 Vector8; +typedef uint64 Vector32; #endif I don't forsee any use of emulating vector registers with uint64 if they only hold two ints. I

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-26 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 11:13:47PM -0700, Nathan Bossart wrote: > Here is a new patch set that applies on top of v9-0001 in the > json_lex_string patch set [0] and v3 of the is_valid_ascii patch [1]. Here is a rebased patch set that applies to HEAD. -- Nathan Bossart Amazon Web Services:

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-26 Thread Nathan Bossart
Here is a new patch set that applies on top of v9-0001 in the json_lex_string patch set [0] and v3 of the is_valid_ascii patch [1]. [0] https://postgr.es/m/CAFBsxsFV4v802idV0-Bo%3DV7wLMHRbOZ4er0hgposhyGCikmVGA%40mail.gmail.com [1]

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread Nathan Bossart
On Fri, Aug 26, 2022 at 10:45:10AM +0700, John Naylor wrote: > On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart > wrote: >> The ARM literature appears to indicate that Neon support is pretty standard >> on aarch64, and AFAICT it's pretty common to just assume it's available. > > This doesn't

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread John Naylor
On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart wrote: > > On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote: > > On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart > > wrote: > >> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: > >> > - Can a user on ARM64 ever get a runtime

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote: > On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart > wrote: >> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: >> > - Can a user on ARM64 ever get a runtime fault if the machine attempts >> > to execute NEON instructions? >>

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread John Naylor
On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart wrote: > > On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: > > The important thing is: if we compile with __aarch64__ as a target: > > - Will the compiler emit the intended instructions from the intrinsics > > without extra flags? > > My

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-24 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: > The important thing is: if we compile with __aarch64__ as a target: > - Will the compiler emit the intended instructions from the intrinsics > without extra flags? My testing with GCC and Clang did not require any extra flags. GCC

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-23 Thread John Naylor
On Tue, Aug 23, 2022 at 4:15 AM Nathan Bossart wrote: > > On Mon, Aug 22, 2022 at 11:50:35AM +0700, John Naylor wrote: > > Is this also ever defined on 32-bit? If so, is it safe, meaning the > > compiler will not emit these instructions without additional flags? > > I'm wondering if __aarch64__

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 11:50:35AM +0700, John Naylor wrote: > On Sat, Aug 20, 2022 at 5:28 AM Nathan Bossart > wrote: >> Thanks for the pointer. GCC, Clang, and the Arm compiler all seem to >> define __ARM_NEON, so here is a patch that uses that instead. > > Is this also ever defined on

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-21 Thread John Naylor
On Sat, Aug 20, 2022 at 5:28 AM Nathan Bossart wrote: > > On Fri, Aug 19, 2022 at 02:26:02PM -0700, Andres Freund wrote: > > Are you sure there's not an appropriate define for us to use here instead > > of a > > configure test? E.g. > > > > echo|cc -dM -P -E -|grep -iE 'arm|aarch' > > ... > >

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-19 Thread Nathan Bossart
On Fri, Aug 19, 2022 at 02:26:02PM -0700, Andres Freund wrote: > Are you sure there's not an appropriate define for us to use here instead of a > configure test? E.g. > > echo|cc -dM -P -E -|grep -iE 'arm|aarch' > ... > #define __AARCH64_SIMD__ 1 > ... > #define __ARM_NEON 1 > #define

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-19 Thread Andres Freund
Hi, On 2022-08-19 13:08:29 -0700, Nathan Bossart wrote: > I've tested the patch on a recent macOS (M1 Pro) and Amazon Linux > (Graviton2), and I've confirmed that the instructions aren't used on a > Linux/Intel machine. I did add a new configure check to see if the > relevant intrinsics are

use ARM intrinsics in pg_lfind32() where available

2022-08-19 Thread Nathan Bossart
Hi hackers, This is a follow-up for recent changes that optimized [sub]xip lookups in XidInMVCCSnapshot() on Intel hardware [0] [1]. I've attached a patch that uses ARM Advanced SIMD (Neon) intrinsic functions where available to speed up the search. The approach is nearly identical to the SSE2