Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-12-06 Thread Richard Sandiford via Gcc-patches
26 Oct 2022 at 21:07, Richard Sandiford >> > > wrote: >> > >> >> > >> Sorry for the slow response. I wanted to find some time to think >> > >> about this a bit more. >> > >> >> > >> Prathamesh Kulkarni writes:

Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-12-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, December 6, 2022 10:28 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make

Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-12-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi, > > >> This name might cause confusion with the SVE iterators, where FULL means >> "every bit of the register is used". How about something like VMOVE >> instead? >> >> With this change, I guess VALL_F16 represents "The set of all modes for >> which the vld1

Re: [PATCH] Add a new conversion for conditional ternary set into ifcvt [PR106536]

2022-12-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Thu, Nov 24, 2022 at 8:25 AM HAO CHEN GUI wrote: >> >> Hi Richard, >> >> >> On 2022/11/24 4:06, Richard Biener wrote: >> > Wouldn't we usually either add an optab or try to recog a canonical >> > RTL form instead of adding a new target hook for things like

Re: [aarch64] PR107920 - Fix incorrect handling of virtual operands in svld1rq_impl::fold

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 6 Dec 2022 at 00:08, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi, >> > The following test: >> > >> > #include "arm_sve.h" >> > >> > svint8_t >> > test_s8(int8_t *x) >> > { >> > return svld1rq_s8 (svptrue_b8 (), &x[0]); >> > } >> >

Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2022-12-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > A recent change only initializes the regs.how[] during Dwarf unwinding > which resulted in an uninitialized offset used in return address signing > and random failures during unwinding. The fix is to use REG_SAVED_OFFSET > as the state where the return address signing

Re: [aarch64] PR107920 - Fix incorrect handling of virtual operands in svld1rq_impl::fold

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi, > The following test: > > #include "arm_sve.h" > > svint8_t > test_s8(int8_t *x) > { > return svld1rq_s8 (svptrue_b8 (), &x[0]); > } > > ICE's with -march=armv8.2-a+sve -O1 -fno-tree-ccp -fno-tree-forwprop: > during GIMPLE pass: fre > pr107920.c: In function

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-12-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi, > > I hadn't received any reply so I had implemented various ways to do this > (about 8 of them in fact). > > The conclusion is that no, we cannot emit one big RTL for the final > instruction immediately. > The reason that all comparisons in the AArch64 backend

Re: [PATCH][AArch64] Cleanup move immediate code

2022-12-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> -  scalar_int_mode imode = (mode == HFmode >> -    ? SImode >> -    : int_mode_for_mode (mode).require ()); >> +  machine_mode imode = (mode == DFmode) ? DImode : SImode; > >> It looks like this

Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-12-05 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Tamar Christina via Gcc-patches writes: >>> > +/* Check to see if the supplied comparison in PTEST can be performed as a >>> > + bit-test-and-branch instead. VAL must contain the original tree >>> > + e

Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-12-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches writes: >> > +/* Check to see if the supplied comparison in PTEST can be performed as a >> > + bit-test-and-branch instead. VAL must contain the original tree >> > + expression of the non-zero operand which will be used to rewrite the >> > + comparison in

Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-12-05 Thread Richard Sandiford via Gcc-patches
>> Subject: Re: [PATCH]AArch64 Fix vector re-interpretation between partial >> SIMD modes >> >> Richard Sandiford via Gcc-patches writes: >> > Tamar Christina writes: >> >> Hi All, >> >> >> >> While writing a patch serie

Re: AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]

2022-12-05 Thread Richard Sandiford via Gcc-patches
"Pop, Sebastian" writes: > Hi, > > Currently patchable area is at the wrong place on AArch64. It is placed > immediately after function label, before .cfi_startproc. This patch > adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and > modifies

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-12-05 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Prathamesh Kulkarni writes: >> Hi, >> For the following test-case: >> >> int16x8_t foo(int16_t x, int16_t y) >> { >> return (int16x8_t) { x, y, x, y, x, y, x, y }; >> } >> >> Code gen at -O3:

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2022-12-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi, > For the following test-case: > > int16x8_t foo(int16_t x, int16_t y) > { > return (int16x8_t) { x, y, x, y, x, y, x, y }; > } > > Code gen at -O3: > foo: > dup v0.8h, w0 > ins v0.h[1], w1 > ins v0.h[3], w1 > ins
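The idea in the subject line is to build `{ x, y, x, y, ... }` from two broadcasts and one interleave instead of a chain of `ins` instructions. The sketch below is a scalar C model of the lane arithmetic only, not NEON intrinsics or the sequence the patch actually emits; the `model_*` names are hypothetical. It shows why `zip1 (dup x, dup y)` yields the interleaved vector.

```c
#include <stddef.h>
#include <stdint.h>

#define NLANES 8 /* int16x8_t has eight 16-bit lanes */

/* Scalar model of DUP: broadcast one scalar to every lane.  */
static void model_dup(int16_t out[NLANES], int16_t x) {
  for (size_t i = 0; i < NLANES; i++)
    out[i] = x;
}

/* Scalar model of ZIP1: interleave the low halves of A and B,
   giving { a[0], b[0], a[1], b[1], ... }.  */
static void model_zip1(int16_t out[NLANES], const int16_t a[NLANES],
                       const int16_t b[NLANES]) {
  for (size_t i = 0; i < NLANES / 2; i++) {
    out[2 * i] = a[i];
    out[2 * i + 1] = b[i];
  }
}

/* { x, y, x, y, ... } expressed as zip1 (dup x, dup y).  */
static void model_interleave(int16_t out[NLANES], int16_t x, int16_t y) {
  int16_t vx[NLANES], vy[NLANES];
  model_dup(vx, x);
  model_dup(vy, y);
  model_zip1(out, vx, vy);
}
```

Because every lane of each `dup` result holds the same value, only the low half that `zip1` reads matters, so three instructions replace one `dup` plus four `ins`.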

Re: [PATCH] [testsuite] [arm/aarch64] -fno-short-enums for auto-init-[12].c

2022-12-04 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches writes: > On arm-eabi, and possibly on other platforms, -fshort-enums is enabled > by default, which breaks some tests' expectations as to enum sizes > with DEFERRED_INIT. Disable short enums so that the expectations are > met. > > Regstraped on x86_64-linux-gnu,

Re: RFC: Make builtin types only valid for some target features

2022-12-04 Thread Richard Sandiford via Gcc
"Kewen.Lin" writes: > Hi, > > I'm working to find one solution for PR106736, which requires us to > make some built-in types only valid for some target features, and > emit error messages for the types when the condition isn't satisfied. > A straightforward idea is to guard the registry of

Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-12-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Monday, November 14, 2022 9:59 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH 2/2]AArch64 Perform more

Re: [PATCH] varasm: Fix type confusion bug

2022-12-01 Thread Richard Sandiford via Gcc-patches
Alex Coplan via Gcc-patches writes: > Hi, > > This patch fixes a type confusion bug in varasm.cc:assemble_variable. > The problem is that the current code calls: > > sect = get_variable_section (decl, false); > > and then accesses sect->named.name without checking whether the section > is in
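The bug class described here is reading one member of a tagged union without first checking the tag. A minimal sketch of the guarded access, using a hypothetical and heavily simplified section type rather than GCC's real `section` definition:

```c
#include <stddef.h>

/* Hypothetical simplified section descriptor: STYLE says which
   union member is valid.  */
enum section_style { STYLE_UNNAMED, STYLE_NAMED, STYLE_NOSWITCH };

struct section_model {
  enum section_style style;
  union {
    struct { const char *name; } named; /* valid only for STYLE_NAMED */
    struct { int data; } unnamed;       /* valid only for STYLE_UNNAMED */
  } u;
};

/* Return the section name, or NULL if SECT is not a named section.
   Checking the tag first avoids reinterpreting another member's bytes
   as a string pointer.  */
static const char *section_name_or_null(const struct section_model *sect) {
  if (sect->style != STYLE_NAMED)
    return NULL;
  return sect->u.named.name;
}
```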

Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]

2022-12-01 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi Richard, > > on 2022/11/24 17:24, Richard Sandiford wrote: >> "Kewen.Lin" writes: >>> Hi, >>> >>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, >>> STORE} into normal vector load/store if the given length is known >>> to be equal to the length of the

Re: [PATCH][AArch64] Cleanup move immediate code

2022-12-01 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> Just to make sure I understand: isn't it really just MOVN?  I would have >> expected a 32-bit MOVZ to be equivalent to (and add no capabilities over) >> a 64-bit MOVZ. > > The 32-bit MOVZ immediates are equivalent, MOVN never overlaps, and > MOVI has some
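The MOVZ/MOVN point can be checked with a small model of the architectural immediate forms. The helper names below are hypothetical and not GCC's own move-immediate functions. A 32-bit MOVZ adds nothing over a 64-bit MOVZ, whereas a 32-bit MOVN (computed in 32 bits, then zero-extended) can produce values such as `0x00000000ffff1234` that neither 64-bit form can.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit MOVZ: one 16-bit chunk at LSL #0, #16, #32 or #48, rest zero.  */
static bool is_movz64(uint64_t val) {
  for (int shift = 0; shift < 64; shift += 16)
    if ((val & ~((uint64_t)0xffff << shift)) == 0)
      return true;
  return false;
}

/* 64-bit MOVN: the bitwise inverse of a 64-bit MOVZ value.  */
static bool is_movn64(uint64_t val) {
  return is_movz64(~val);
}

/* 32-bit MOVZ (writing Wd zero-extends): one 16-bit chunk at LSL #0 or
   #16.  Every such value is also a 64-bit MOVZ value.  */
static bool is_movz32(uint64_t val) {
  return (val >> 32) == 0
         && ((val & 0xffff0000u) == 0 || (val & 0x0000ffffu) == 0);
}

/* 32-bit MOVN: ~(imm16 << shift) computed in 32 bits, then zero-extended.  */
static bool is_movn32(uint64_t val) {
  if ((val >> 32) != 0)
    return false;
  uint32_t inv = ~(uint32_t)val;
  return (inv & 0xffff0000u) == 0 || (inv & 0x0000ffffu) == 0;
}
```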

Re: [PATCH] sync libsframe toplevel from binutils-gdb

2022-11-25 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches writes: > This pulls in the toplevel portion of this binutils-gdb commit: >19e559f1c91bfaedbd2f91d85ee161f3f03fda3c libsframe: add the SFrame library > > ChangeLog: > * Makefile.def: Add libsframe as new module with its dependencies. > *

Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-24 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, November 23, 2022 4:18 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH]AArch64 sve2: Fix

Re: [PATCH][AArch64] Cleanup move immediate code

2022-11-24 Thread Richard Sandiford via Gcc-patches
Sorry for the very long delay in reviewing this. Wilco Dijkstra writes: > Hi Richard, > > Here is the immediate cleanup splitoff from the previous patch: > > Simplify, refactor and improve various move immediate functions. > Allow 32-bit MOVZ/N as a valid 64-bit immediate which removes special >

Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]

2022-11-24 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi, > > As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, > STORE} into normal vector load/store if the given length is known > to be equal to the length of the whole vector. It would help to > improve overall cycles as normally the latency of vector access >

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >>> A smart reassociation pass could form more FMAs while also increasing >>> parallelism, but the way it currently works always results in fewer FMAs. >> >> Yeah, as Richard said, that seems the right long-term fix. >> It would also avoid the hack of

Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > SVE has an actual division optab, and when using -Os we don't > optimize the division away. This means that we need to distinguish > between a div which we can optimize and one we cannot even during > expansion. > > Bootstrapped Regtested on

Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-22 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi Richard, > > Many thanks for your review comments! > on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote: > Hi, > > As discussed in PR98125, -fpatchable-function-entry with > SECTION_LINK_ORDER support doesn't work well on powerpc64 > ELFv1

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> I guess an obvious question is: if 1 (rather than 2) was the right value >> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA >> pipes? It would be good to clarify how, conceptually, the core property >> should map to the

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 15, 2022 11:34 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw >> ; nd ; Marcus Shawcroft >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. >>

Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-22 Thread Richard Sandiford via Gcc-patches
Sam James via Gcc-patches writes: >> On 8 Nov 2022, at 07:14, Sam James wrote: >> >> 1. This should speed up decompression for folks, as parallel xz >> creates a different archive which can be decompressed in parallel. >> >> Note that this different method is enabled by default in a new >>

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches writes: > On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote: >> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on >> big-endian, because the _Decimal32 on-stack argument is not padded in >> the same direction depending on endianness. >> >>

Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Biener >> Sent: Tuesday, November 22, 2022 10:59 AM >> To: Richard Sandiford >> Cc: Tamar Christina via Gcc-patches ; Tamar >> Christina ; Richard Biener >> ; nd >> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches writes: > gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on > big-endian, because the _Decimal32 on-stack argument is not padded in > the same direction depending on endianness. > > This patch fixes the testcase so that it expects the argument in the >

Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches writes: >> So it's not easily possible the within current infrastructure. But it does >> look >> like ARM might eventually benefit from something like STV on x86? >> > > I'm not sure. The problem with trying to do this in RTL is that you'd have > to be > able

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-21 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Add a reassocation width for FMAs in per-CPU tuning structures. Keep the > existing setting for cores with 2 FMA pipes, and use 4 for cores with 4 > FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%. > > Passes regress/bootstrap, OK for commit? > > gcc/ >

Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-21 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi, > > Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html > > Any comments are highly appreciated. > > BR, > Kewen > > on 2022/9/28 13:41, Kewen.Lin via Gcc-patches wrote: >> Hi, >> >> Gentle ping: >>

Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-21 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Sorry for the late reply on this. I was wondering though why the check > made sense. The way I see it, SI -> SI mode is either wrong or useless. > So why not: > if it is wrong, error (gcc_assert?) so we know it was generated wrongly > somehow and fix it; > if

[PATCH] gomp: Various fixes for SVE types [PR101018]

2022-11-18 Thread Richard Sandiford via Gcc-patches
[I posted this late in stage 4 as an RFC, but it wasn't suitable for GCC 12 at that point. I kind-of dropped the ball after that, sorry.] Various parts of the omp code checked whether the size of a decl was an INTEGER_CST in order to determine whether the decl was variable-sized or not. If it

Re: [PATCH] Allow prologues and epilogues to be inserted later

2022-11-18 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote: >> Arm's SME adds a new processor mode called streaming mode. >> This mode enables some new (matrix-oriented) instructions and >> disables several existing groups of instr

Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-11-18 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Tamar Christina writes: >> Hi All, >> >> While writing a patch series I started getting incorrect codegen out from >> VEC_PERM on partial struct types. >> >> It turns out that this was happening b

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-18 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford > wrote: >> >> Hongtao Liu writes: >> > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford >> > wrote: >> >> >> >> Hongtao Liu writes: >> >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford >> >> > wrote: >> >> >> >>

Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-11-17 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > While writing a patch series I started getting incorrect codegen out from > VEC_PERM on partial struct types. > > It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS > implementation has a slight bug in it. The hook only checked for

Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> Can you go into more detail about: >> >>Use :option:`-mdirect-extern-access` either in shared libraries or in >>executables, but not in both. Protected symbols used both in a shared >>library and executable may cause linker errors or fail to

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-17 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford > wrote: >> >> Hongtao Liu writes: >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford >> > wrote: >> >> >> >> Tamar Christina writes: >> >> >> -Original Message- >> >> >> From: Hongtao Liu >> >> >> Sent:

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-17 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford > wrote: >> >> Tamar Christina writes: >> >> -Original Message- >> >> From: Hongtao Liu >> >> Sent: Tuesday, November 15, 2022 9:37 AM >> >> To: Tamar Christina >> >> Cc: Richard Sandiford ; Tamar Christina via >>

Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-16 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Add a new option -mdirect-extern-access similar to other targets. This > removes > GOT indirections on external symbols with -fPIE, resulting in significantly > better code quality. With -fPIC it only affects protected symbols, allowing > for more efficient shared

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-11-15 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 07/11/2022 11:05, Richard Biener wrote: >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: >> >>> Sorry for the delay, just been reminded I still had this patch outstanding >>> from last stage 1. Hopefully since it has been mostly reviewed it could go >>> in

Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-15 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Updated version of the patch to account for the testsuite changes in the > first patch. > > On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote: >> Hi, >> >> This patch adds support for the widening LDAPR instructions. >> >> Bootstrapped and

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Hongtao Liu >> Sent: Tuesday, November 15, 2022 9:37 AM >> To: Tamar Christina >> Cc: Richard Sandiford ; Tamar Christina via >> Gcc-patches ; nd ; >> rguent...@suse.de >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of

Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 15, 2022 11:59 AM >> To: Tamar Christina via Gcc-patches >> Cc: Tamar Christina ; nd ; >> rguent...@suse.de; j...@ventanamicro.com >> Subject: Re: [PATCH]middle-end: replace

Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches writes: > Hi All, > > After the fix to the addsub patch yesterday for bootstrap I had only > regtested on x86. > While looking today it seemed the new tests were failing, this was caused > by a change in the behavior of the GET_MODE_WIDER_MODE macro on trunk. > >

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 15, 2022 11:15 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw >> ; nd ; Marcus Shawcroft >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. >>

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 15, 2022 10:51 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw >> ; nd ; Marcus Shawcroft >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. >>

Re: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi, > >> > --- a/gcc/config/aarch64/aarch64-simd.md >> > +++ b/gcc/config/aarch64/aarch64-simd.md >> > @@ -4259,7 +4259,7 @@ (define_insn >> "*aarch64_get_lane_zero_extend" >> > ;; Extracting lane zero is split into a simple move when it is >> > between SIMD ;;

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 15, 2022 10:36 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw >> ; nd ; Marcus Shawcroft >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab. >>

Re: [PATCH v2] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-15 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > Richard, > > is this OK for backport to GCC-12 and GCC-11? The fusion part seems potentially risky for a stable branch, but since it's conditional on the new flag (and thus new CPU), I think it should be OK. So yeah, OK for both, thanks. Richard > Thanks, > Philipp.

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hello, > > Ping and updated patch. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (*tb1): Rename to... > (*tb1): ... this. > (tbranch4):

Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-11-14 Thread Richard Sandiford via Gcc-patches
(Sorry, immediately following up to myself for a second time recently.) Richard Sandiford writes: > Tamar Christina writes: >>> >>> The same thing ought to work for smov, so it would be good to do both. >>> That would also make the split between the original and new patterns more >>> obvious:

Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-11-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> >> The same thing ought to work for smov, so it would be good to do both. >> That would also make the split between the original and new patterns more >> obvious: left shift for the old pattern, right shift for the new pattern. >> > > Done, though because umov can do

Re: [PATCH] libatomic: Add support for LSE and LSE2

2022-11-14 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches writes: > Add support for AArch64 LSE and LSE2 to libatomic. Disable outline atomics, > and use LSE ifuncs for 1-8 byte atomics and LSE2 ifuncs for 16-byte atomics. > On Neoverse V1, 16-byte atomics are ~4x faster due to avoiding locks. > > Note this is safe since

Re: [PATCH] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-14 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > This patch adds support for Ampere-1A CPU: > - recognize the name of the core and provide detection for -mcpu=native, > - updated extra_costs, > - adds a new fusion pair for (A+B+1 and A-B-1). > > Ampere-1A and Ampere-1 have more timing difference than the extra >

[PATCH 12/16] aarch64: Tweaks to function_resolver::resolve_to

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds a new interface to function_resolver::resolve_to in which the mode suffix stays the same (which is the common case). It then moves the handling of explicit first type suffixes from function_resolver::resolve_unary to this new function. This makes things slightly simpler for

[PATCH 15/16] aarch64: Enforce inlining restrictions for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
A function that has local ZA state cannot be inlined into its caller, since we only support managing ZA switches at function scope. A function whose body requires a particular PSTATE.SM setting can only be inlined into a function body that guarantees that PSTATE.SM setting. (The callee's function

[PATCH 09/16] aarch64: Make AARCH64_FL_SVE requirements explicit

2022-11-13 Thread Richard Sandiford via Gcc-patches
So far, all intrinsics covered by the aarch64-sve-builtins* framework have (naturally enough) required at least SVE. However, arm_sme.h defines a couple of intrinsics that can be called by any code. It's therefore necessary to make the implicit SVE requirement explicit. gcc/ *

[PATCH 11/16] aarch64: Generalise _m rules for SVE intrinsics

2022-11-13 Thread Richard Sandiford via Gcc-patches
In SVE there was a simple rule that unary merging (_m) intrinsics had a separate initial argument to specify the values of inactive lanes, whereas other merging functions took inactive lanes from the first operand to the operation. That rule began to break down in SVE2, and it continues to do so

[PATCH 06/16] aarch64: Add support for SME ZA attributes

2022-11-13 Thread Richard Sandiford via Gcc-patches
SME has an array called ZA that can be enabled and disabled separately from streaming mode. A status bit called PSTATE.ZA indicates whether ZA is currently enabled or not. In C and C++, the state of PSTATE.ZA is controlled using function attributes. If a function's type has an arm_shared_za

[PATCH 16/16] aarch64: Update sibcall handling for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
We only support tail calls between functions with the same PSTATE.ZA setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA"). Only a normal non-streaming function can tail-call another non-streaming function, and only a streaming function can tail-call another streaming function.

[PATCH 10/16] aarch64: Generalise unspec_based_function_base

2022-11-13 Thread Richard Sandiford via Gcc-patches
Until now, SVE intrinsics that map directly to unspecs have always used type suffix 0 to distinguish between signed integers, unsigned integers, and floating-point values. SME adds functions that need to use type suffix 1 instead. This patch generalises the classes accordingly. gcc/ *

[PATCH 14/16] aarch64: Add support for arm_locally_streaming

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for the arm_locally_streaming attribute, which allows a function to use SME internally without changing the function's ABI. The attribute is valid but redundant for arm_streaming functions. gcc/ * config/aarch64/aarch64.cc (aarch64_attribute_table): Add

[PATCH 05/16] aarch64: Switch PSTATE.SM around calls

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for switching to the appropriate SME mode for each call. Switching to streaming mode requires an SMSTART SM instruction and switching to non-streaming mode requires an SMSTOP SM instruction. If the call is being made from streaming-compatible code, these switches are
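The per-call decision can be sketched as a small table, assuming the caller's mode is restored after the call returns. The enum and struct names are hypothetical. `caller_sm` stands for the caller's PSTATE.SM at the call site; for streaming-compatible callers that value is only known at run time, which is why their switches have to be conditional. A streaming-compatible callee works in either mode and needs no switch.

```c
#include <stdbool.h>

/* Hypothetical encoding of a callee's PSTATE.SM interface.  */
enum sm_kind { SM_NON_STREAMING, SM_STREAMING, SM_STREAMING_COMPATIBLE };

/* Mode-switch instructions needed around one call.  */
struct sm_switch {
  bool smstart_before; /* SMSTART SM before the call */
  bool smstop_before;  /* SMSTOP SM before the call */
  bool smstart_after;  /* SMSTART SM after the call returns */
  bool smstop_after;   /* SMSTOP SM after the call returns */
};

static struct sm_switch sm_switch_for_call(bool caller_sm, enum sm_kind callee) {
  struct sm_switch s = { false, false, false, false };
  if (callee == SM_STREAMING_COMPATIBLE)
    return s; /* callee runs in whatever mode it is called in */
  bool callee_sm = (callee == SM_STREAMING);
  if (caller_sm == callee_sm)
    return s; /* already in the right mode */
  if (callee_sm) {
    s.smstart_before = true; /* enter streaming mode for the callee */
    s.smstop_after = true;   /* and leave it again afterwards */
  } else {
    s.smstop_before = true;  /* leave streaming mode for the callee */
    s.smstart_after = true;  /* and re-enter it afterwards */
  }
  return s;
}
```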

[PATCH 07/16] aarch64: Add a register class for w12-w15

2022-11-13 Thread Richard Sandiford via Gcc-patches
Some SME instructions use w12-w15 to index ZA. This patch adds a register class for that range. gcc/ * config/aarch64/aarch64.h (ZA_INDEX_REGNUM_P): New macro. (ZA_INDEX_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it. *

[PATCH 08/16] aarch64: Add a VNx1TI mode

2022-11-13 Thread Richard Sandiford via Gcc-patches
Although TI isn't really a native SVE element mode, it's convenient for SME if we define VNx1TI anyway, so that it can be used to distinguish .Q ZA operations from others. It's purely an RTL convenience and isn't (yet) a valid storage mode. gcc/ * config/aarch64/aarch64-modes.def: Add

[PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns

2022-11-13 Thread Richard Sandiford via Gcc-patches
The vast majority of Advanced SIMD instructions are not available in streaming mode, but some of the load/store/move instructions are. This patch adds a new target feature macro called TARGET_BASE_SIMD for this streaming-compatible subset. The vector-to-vector move instructions are not

[PATCH 02/16] aarch64: Add +sme

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds the +sme ISA feature and requires it to be present when compiling arm_streaming code. (arm_streaming_compatible code does not necessarily assume the presence of SME. It just has to work when SME is present and streaming mode is enabled.) gcc/ *

[PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for recognising the SME arm_streaming and arm_streaming_compatible attributes. These attributes respectively describe whether the processor is definitely in "streaming mode" (PSTATE.SM==1), whether the processor is definitely not in streaming mode (PSTATE.SM==0), or

[PATCH 00/16] aarch64: Add support for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
This series adds support for the Armv9-A Scalable Matrix Extension (SME). Details about the extension are available here: https://developer.arm.com/documentation/ddi0616/aa/?lang=en The ABI and ACLE documentation is available on github:

[PATCH] builtins: Commonise default handling of nonlocal_goto

2022-11-13 Thread Richard Sandiford via Gcc-patches
expand_builtin_longjmp and expand_builtin_nonlocal_goto both emit nonlocal gotos. They first try to use a target-provided pattern and fall back to generic code otherwise. These pieces of generic code are almost identical, and having them inline like this makes it difficult to define a

Re: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT

2022-11-12 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Tamar Christina writes: >> Hi All, >> >> This adds an RTL pattern for when two NARROWB instructions are being combined >> with a PACK. The second NARROWB is then transformed into a NARROWT. >> >> For the example: >> >> void draw_bitmap1(uint8_t* restrict pixel,

Re: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT

2022-11-12 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This adds an RTL pattern for when two NARROWB instructions are being combined > with a PACK. The second NARROWB is then transformed into a NARROWT. > > For the example: > > void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) > { > for (int i =

Re: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division

2022-11-12 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review, been snowed under with stage1 stuff. Tamar Christina writes: > Hi All, > > In plenty of image and video processing code it's common to modify pixel > values > by a widening operation and then scale them back into range by dividing by > 255. > > This patch adds an
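
The excerpt above mentions scaling widened pixel values back into range by dividing by 255. The C++ sketch below shows one well-known way to do that division with only adds and shifts: (x + 1 + (x >> 8)) >> 8. It is a general identity, not necessarily the sequence the patch emits. The identity holds for 0 <= x <= 65534, which covers every product of two 8-bit values. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Divide by 255 using only adds and shifts.
// Valid for 0 <= x <= 65534; the identity first fails at x == 65535.
inline uint32_t div255(uint32_t x) {
  assert(x <= 65534);
  return (x + 1 + (x >> 8)) >> 8;
}

// Exhaustively compare against ordinary division over the valid range.
inline bool div255_matches_for_all() {
  for (uint32_t x = 0; x <= 65534; ++x)
    if (div255(x) != x / 255) return false;
  return true;
}
```

For example, div255(200 * 200) gives 156, the same result as 40000 / 255.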

[PATCH] aarch64: Use SVE's RDVL instruction

2022-11-11 Thread Richard Sandiford via Gcc-patches
We didn't previously use SVE's RDVL instruction, since the CNT* forms are preferred and provide most of the range. However, there are some cases that RDVL can handle and CNT* can't, and using RDVL-like instructions becomes important for SME. Tested on aarch64-linux-gnu. I plan to apply this

[PATCH] Allow targets to add USEs to asms

2022-11-11 Thread Richard Sandiford via Gcc-patches
Arm's SME has an array called ZA that for inline asm purposes is effectively a form of special-purpose memory. It doesn't have an associated storage type and so can't be passed and returned in normal C/C++ objects. We'd therefore like "za" in a clobber list to mean that an inline asm can read

[PATCH] Add a new target hook: TARGET_START_CALL_ARGS

2022-11-11 Thread Richard Sandiford via Gcc-patches
We have the following two hooks into the call expansion code: - TARGET_CALL_ARGS is called for each argument before arguments are moved into hard registers. - TARGET_END_CALL_ARGS is called after the end of the call sequence (specifically, after any return value has been moved to a

[PATCH] Add a target hook for sibcall epilogues

2022-11-11 Thread Richard Sandiford via Gcc-patches
Epilogues for sibling calls are generated using the sibcall_epilogue pattern. One disadvantage of this approach is that the target doesn't know which call the epilogue is for, even though the code that generates the pattern has the call to hand. Although call instructions are currently rtxes,

[PATCH] Allow prologues and epilogues to be inserted later

2022-11-11 Thread Richard Sandiford via Gcc-patches
Arm's SME adds a new processor mode called streaming mode. This mode enables some new (matrix-oriented) instructions and disables several existing groups of instructions, such as most Advanced SIMD vector instructions and a much smaller set of SVE instructions. It can also change the current

[PATCH] Handle epilogues that contain jumps

2022-11-11 Thread Richard Sandiford via Gcc-patches
The prologue/epilogue pass allows the prologue sequence to contain jumps. The sequence is then partitioned into basic blocks using find_many_sub_basic_blocks. This patch treats epilogues in the same way. It's needed for a follow-on aarch64 patch that adds conditional code to both the prologue

Re: [PATCH 8/8]AArch64: Have reload not choose to do add on the scalar side if both values exist on the SIMD side.

2022-11-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Currently we often times generate an r -> r add even if it means we need two > reloads to perform it, i.e. in the case that the values are on the SIMD side. > > The pairwise operations expose these more now and so we get suboptimal > codegen. > > Normally I

Re: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > The backend has an existing V2HFmode that is used by pairwise operations. > This mode was however never made fully functional. Amongst other things it > was > never declared as a vector type which made it unusable from the mid-end. > > It's also lacking an

Re: [PATCH 4/8]AArch64 aarch64: Implement widening reduction patterns

2022-11-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This implements the new widening reduction optab in the backend. > Instead of introducing a duplicate definition for the same thing I have > renamed the intrinsics defintions to use the same optab. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches writes: > Hi All, > > The current vector extract pattern can only extract from a vector when the > position to extract is a multiple of the vector bitsize as a whole. > > That means extract something like a V2SI from a V4SI vector from position 32 > isn't possible

Re: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-10-31 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Our zero and sign extend and extract patterns are currently very limited and > only work for the original register size of the instructions. i.e. limited by > GPI patterns. However these instructions extract bits and extend. This means > that any register
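
As background for the excerpt above, the following C++ sketch models the architectural behaviour of the A64 bitfield-extract instructions on 64-bit values. UBFX extracts a field and zero-extends it. SBFX extracts the same field and sign-extends it. The sketch covers the instruction semantics only. It is not a description of GCC's RTL patterns or of what the patch changes.

```cpp
#include <cassert>
#include <cstdint>

// Model of UBFX: extract `width` bits starting at bit `lsb`, zero-extended.
// Requires 1 <= width and lsb + width <= 64.
inline uint64_t ubfx(uint64_t x, unsigned lsb, unsigned width) {
  uint64_t field = x >> lsb;
  return width == 64 ? field : field & ((uint64_t{1} << width) - 1);
}

// Model of SBFX: same extraction, but the field's top bit is copied into
// all higher bits (sign extension).
inline int64_t sbfx(uint64_t x, unsigned lsb, unsigned width) {
  uint64_t field = ubfx(x, lsb, width);
  uint64_t sign_bit = uint64_t{1} << (width - 1);
  // (field ^ sign_bit) - sign_bit sign-extends a width-bit value.
  return static_cast<int64_t>((field ^ sign_bit) - sign_bit);
}
```

For example, extracting 8 bits at bit 4 of 0xABCD gives 0xBC. UBFX returns that as 188, and SBFX returns it as -68.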

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-31 Thread Richard Sandiford via Gcc-patches
2022 at 21:38, Richard Sandiford >> > wrote: >> >> >> >> Richard Sandiford via Gcc-patches writes: >> >> > Prathamesh Kulkarni writes: >> >> >> Sorry to ask a silly question but in which case shall we select 2nd >> >

[pushed] aarch64: Reinstate some uses of CONSTEXPR

2022-10-27 Thread Richard Sandiford via Gcc-patches
In 9482a5e4eac8d696129ec2854b331e1bb5dbab42 I'd replaced uses of CONSTEXPR with direct uses of constexpr. However, it turns out that we still have CONSTEXPR for a reason: GCC 4.8 doesn't implement constexpr properly, and for example rejects things like: extern const int x; constexpr int x =

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-10-26 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response. I wanted to find some time to think about this a bit more. Prathamesh Kulkarni writes: > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford > wrote: >> >> Richard Sandiford via Gcc-patches writes: >> > Prathamesh Kulkarni writes: >>
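
For readers new to VEC_PERM_EXPR: it takes two N-element inputs and a selector. Element i of the result is element sel[i] of the 2N-element concatenation of the two inputs. The C++ sketch below models this for fixed-length vectors only. It does not model the variable-length (VLA) case that the thread is about. It reduces indices modulo 2N, which follows the documented behaviour of GCC's __builtin_shuffle.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-length model of VEC_PERM_EXPR <a, b, sel>: result[i] is element
// sel[i] of the 2N-element concatenation {a, b}, with indices taken mod 2N.
template <typename T, std::size_t N>
std::array<T, N> vec_perm(const std::array<T, N>& a,
                          const std::array<T, N>& b,
                          const std::array<std::size_t, N>& sel) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t idx = sel[i] % (2 * N);
    result[i] = idx < N ? a[idx] : b[idx - N];
  }
  return result;
}
```

For example, with a = {0, 1, 2, 3}, b = {10, 11, 12, 13} and sel = {0, 4, 1, 5}, the result interleaves the low halves: {0, 10, 1, 11}.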

Re: [PATCH] Aarch64: Do not define DONT_USE_BUILTIN_SETJMP

2022-10-26 Thread Richard Sandiford via Gcc-patches
Eric Botcazou via Gcc-patches writes: > Hi, > > we have been using an Ada compiler for the Aarch64 architecture configured > with SJLJ exceptions as for the other architectures for some time, and have > not run into any problems so far so the setting looks obsolete now. > > OK for the mainline?

Re: [PATCH 1/2] Add a parameter for the builtin function of prefetch to align with LLVM

2022-10-24 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > On Thu, Oct 20, 2022 at 07:34:13AM +, Jiang, Haochen wrote: >> > > + /* Argument 3 must be either zero or one. */ >> > > + if (INTVAL (op3) != 0 && INTVAL (op3) != 1) >> > > +{ >> > > + warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-21 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> Can you do the aarch64_mov_imm changes as a separate patch? It's difficult >> to review the two changes folded together like this. > > Sure, I'll send a separate patch. So here is version 2 again: I still think we should move the functions to avoid the
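
To give a rough sense of what "immediate expansion" involves, the C++ sketch below uses a deliberately simplified cost model. It counts the instructions needed to build a 64-bit constant using only MOVZ/MOVK or MOVN/MOVK sequences. This is not GCC's aarch64 expansion code. It ignores bitmask (logical) immediates and every other trick a real compiler uses, so the count is only an upper bound. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified cost model: instructions needed to build a 64-bit immediate
// with MOVZ/MOVK or MOVN/MOVK only (no bitmask immediates).
inline int movz_movn_insn_count(uint64_t imm) {
  int nonzero = 0;  // 16-bit chunks that differ from 0x0000
  int nonones = 0;  // 16-bit chunks that differ from 0xffff
  for (int shift = 0; shift < 64; shift += 16) {
    uint16_t chunk = static_cast<uint16_t>(imm >> shift);
    if (chunk != 0x0000) ++nonzero;
    if (chunk != 0xffff) ++nonones;
  }
  // MOVZ leaves the other chunks as zeros and MOVN leaves them as ones.
  // Either needs at least one instruction, plus one MOVK per extra chunk.
  int via_movz = nonzero > 0 ? nonzero : 1;
  int via_movn = nonones > 0 ? nonones : 1;
  return via_movz < via_movn ? via_movz : via_movn;
}
```

The model says that 0x0001000100010001 needs four instructions. That value is a valid bitmask immediate, though, so a single instruction can materialise it. Gaps like this are why real expansion code has to consider more than these two sequences.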

Re: [PATCH v2] aarch64: update Ampere-1 core definition

2022-10-20 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Philipp Tomsich writes: >> This brings the extensions detected by -mcpu=native on Ampere-1 systems >> in sync with the defaults generated for -mcpu=ampere1. >> >> Note that some early kernel versions on Ampere1 may misreport the >> presence of PAUTH and PREDRES (i.e.,

[pushed] aarch64: Use using directives to inherit constructors

2022-10-20 Thread Richard Sandiford via Gcc-patches
Now that the codebase is C++11, we can use using directives to inherit constructors from base classes. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarch64-sve-builtins-functions.h (quiet) (rtx_code_function, rtx_code_function_rotated,
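
The following C++ sketch illustrates the C++11 feature mentioned above. A single using-declaration of the form `using Base::Base;` makes a derived class inherit its base class's constructors, so the derived class no longer needs hand-written forwarding constructors. The class names are hypothetical and do not reproduce the actual classes in aarch64-sve-builtins-functions.h.

```cpp
#include <cassert>

// Hypothetical base class with a non-default constructor.
class function_base {
public:
  function_base(int code, bool rotated) : m_code(code), m_rotated(rotated) {}
  int code() const { return m_code; }
  bool rotated() const { return m_rotated; }

private:
  int m_code;
  bool m_rotated;
};

// Pre-C++11 style: each constructor is forwarded by hand.
class forwarding_function : public function_base {
public:
  forwarding_function(int code, bool rotated) : function_base(code, rotated) {}
};

// C++11 style: one using-declaration inherits the base constructors.
class inheriting_function : public function_base {
public:
  using function_base::function_base;
};
```

The two derived classes behave identically. The inheriting version also stays correct automatically if the base class later gains more constructors.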
