[pushed] aarch64: Replace CONSTEXPR with constexpr

2022-10-20 Thread Richard Sandiford via Gcc-patches
Move away from the pre-C++11 compatibility macro CONSTEXPR. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarch64-sve-builtins-base.cc: Replace CONSTEXPR with constexpr throughout. * config/aarch64/aarch64-sve-builtins-functions.h: Likewise.

[pushed] aarch64: Commonise some folding code

2022-10-20 Thread Richard Sandiford via Gcc-patches
Add an aarch64_sve::gimple_folder helper for folding calls to integer constants. SME will make more use of this. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarch64-sve-builtins.h (gimple_folder::fold_to_cstu): New member function. *

[pushed] aarch64: Prevent generation of /M BRKAS and BRKBS

2022-10-20 Thread Richard Sandiford via Gcc-patches
Bit of a brown-paper-bag bug, but: GCC was generating non-existent merging forms of BRKAS and BRKBS. Those instructions only support zero predication (although BRKA and BRKB support both). Tested on aarch64-linux-gnu & pushed to trunk so far. I'll backport to release branches soon. Richard

[pushed] aarch64: Fix matching of BRKNS

2022-10-20 Thread Richard Sandiford via Gcc-patches
Unlike other flag-setting SVE instructions, BRKNS sets the flags based on an all-true governing predicate, rather than the GP operand. Tested on aarch64-linux-gnu & pushed to trunk so far. I'll backport to release branches soon. Richard gcc/ * config/aarch64/iterators.md (SVE_BRKP):

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-20 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > ping > > > > Hi Richard, > Sounds good, but could you put it before the mode version, to avoid the forward declaration? >>> >>> I can swap them around but the forward declaration is still required as >>> aarch64_check_bitmask is 5000 lines before

Re: [PATCH] pch: Fix streaming of strings with embedded null bytes

2022-10-19 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches > wrote: >> Lewis Hyatt via Gcc-patches writes: >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it >> > contains >> > (wh

Re: [PATCH] pch: Fix streaming of strings with embedded null bytes

2022-10-19 Thread Richard Sandiford via Gcc-patches
Lewis Hyatt via Gcc-patches writes: > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains > (whether they live in GC-controlled memory or not) will be marked for PCH > output by the routine gt_pch_note_object in ggc-common.cc. This routine > special-cases plain char*

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-17 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 11 Oct 2022, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Mon, 10 Oct 2022, Andrew Stubbs wrote: >> >> On 10/10/2022 12:03, Richard Biener wrote: >> >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing >> >> > first order

Re: [PATCH] libgcc: Move cfa_how into potential padding in struct frame_state_reg_info

2022-10-14 Thread Richard Sandiford via Gcc-patches
Florian Weimer via Gcc-patches writes: > On many architectures, there is a padding gap after the how array > member, and cfa_how can be moved there. This reduces the size of the > struct and the amount of memory that uw_frame_state_for has to clear. > > There is no measurable performance benefit

Re: [DOCS] Python Language Conventions

2022-10-13 Thread Richard Sandiford via Gcc-patches
Martin Liška writes: > On 10/13/22 12:03, Richard Sandiford wrote: >> Martin Liška writes: >>> I think we should add how Python scripts should be formatted. I noticed >>> that while reading the Modula-2 patchset where it follows the C/C++ style >>> when it comes to Python files. >>> >>> Ready to

Re: [DOCS] Python Language Conventions

2022-10-13 Thread Richard Sandiford via Gcc-patches
Martin Liška writes: > I think we should add how Python scripts should be formatted. I noticed > that while reading the Modula-2 patchset where it follows the C/C++ style > when it comes to Python files. > > Ready to be installed? > Thanks, > Martin Did you consider requiring black formatting

Re: [PATCH][AArch64] Improve bit tests [PR105773]

2022-10-12 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >> Realise this is awkward, but: CC_NZmode is for operations that set only >> the N and Z flags to useful values. If we want to take advantage of V >> being zero then I think we need a different mode. >> >> We can't go all the way to CCmode because the

Re: [PATCH] machmode, v2: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE

2022-10-12 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Oct 12, 2022 at 11:15:40AM +0100, Richard Sandiford wrote: >> Looks good to me, just some minor comments below. > > Here is an updated patch. > >> How robust is the mechanism that guarantees HF comes before BF, >> and so is the mode that appears in the (new) wider

Re: [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE

2022-10-12 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote: >> > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_ >> > > >{ >> > > > machine_mode optab_mode = mclass == MODE_CC ? CCmode : >> > > > compare_mode; >> > > > icode

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-12 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > + /* First-order recurrence autovectorization needs to handle permutation > + with indices = [nunits-1, nunits, nunits+1, ...]. */ > + vec_perm_builder sel (nunits, 1, 3); > + for (int i = 0; i < 3; ++i) > +sel.quick_push (nunits - dist + i); > +
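
The selector described in the quoted comment can be modelled outside GCC. The following is a minimal sketch in plain C++, not the vectorizer's code: `recurrence_selector` and `vec_perm` are hypothetical helpers, and `vec_perm` assumes the usual two-input convention in which indices address the concatenation of both inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build the selector
// [nunits - dist, nunits - dist + 1, ...] with nunits entries.
// With dist == 1 this is [nunits-1, nunits, nunits+1, ...].
std::vector<std::size_t> recurrence_selector(std::size_t nunits,
                                             std::size_t dist = 1)
{
  assert(dist >= 1 && dist <= nunits);
  std::vector<std::size_t> sel(nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    sel[i] = nunits - dist + i;
  return sel;
}

// Scalar model of a two-input permute (an assumption of this sketch):
// index k selects a[k] for k < n and b[k - n] otherwise.
std::vector<int> vec_perm(const std::vector<int> &a,
                          const std::vector<int> &b,
                          const std::vector<std::size_t> &sel)
{
  assert(a.size() == b.size());
  std::vector<int> out;
  out.reserve(sel.size());
  for (std::size_t idx : sel)
    {
      assert(idx < 2 * a.size());
      out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
    }
  return out;
}
```

If `a` holds the previous iteration's vector and `b` the current one, the result is the last element of `a` followed by all but the last element of `b`, which is the shifted value a first-order recurrence consumes.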

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 10 Oct 2022, Andrew Stubbs wrote: >> On 10/10/2022 12:03, Richard Biener wrote: >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing >> > first order recurrences. That solves two TSVC missed optimization PRs. >> > >> > There's a new scalar

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-07 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >>> Yes, with a more general search loop we can get that case too - >>> it doesn't trigger much though. The code that checks for this is >>> now refactored into a new function. Given there are now many >>> more calls to aarch64_bitmask_imm, I added a

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-07 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches writes: > Hi Richard, > >> Did you consider handling the case where the movks aren't for >> consecutive bitranges?  E.g. the patch handles: > >> but it looks like it would be fairly easy to extend it to: >> >>  0x12345678 > > Yes, with a more general search

Re: [PATCH v2] aarch64: update Ampere-1 core definition

2022-10-06 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > This brings the extensions detected by -mcpu=native on Ampere-1 systems > in sync with the defaults generated for -mcpu=ampere1. > > Note that some early kernel versions on Ampere1 may misreport the > presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth' >

Re: [PATCH v2] aarch64: fix off-by-one in reading cpuinfo

2022-10-06 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > Fixes: 341573406b39 > > Don't subtract one from the result of strnlen() when trying to point > to the first character after the current string. This issue would > cause individual characters (where the 128 byte buffers are stitched > together) to be lost. > >
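
The off-by-one described above can be reproduced in isolation. This is a minimal sketch, not the GCC cpuinfo reader: `stitch` is a hypothetical function and the 256-byte buffer is an arbitrary choice. It appends chunks to a NUL-terminated buffer, either at `buf + strnlen(buf)` (correct) or one position earlier (the reported bug).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string.h>  // strnlen (POSIX)
#include <string>
#include <vector>

// Hypothetical illustration: concatenate chunks into a fixed buffer.
// strnlen returns the number of characters before the NUL, so
// buf + len already points at the first position after the string.
std::string stitch(const std::vector<std::string> &chunks, bool subtract_one)
{
  char buf[256] = "";
  for (const std::string &chunk : chunks)
    {
      std::size_t len = strnlen(buf, sizeof buf);
      // The buggy variant starts writing on top of the last character.
      std::size_t pos = (subtract_one && len > 0) ? len - 1 : len;
      assert(pos + chunk.size() < sizeof buf);
      std::memcpy(buf + pos, chunk.c_str(), chunk.size() + 1);
    }
  return std::string(buf);
}
```

With the subtraction, one character is lost at every join, which matches the symptom of characters disappearing where the buffers are stitched together.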

Re: [PATCH][AArch64] Improve bit tests [PR105773]

2022-10-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Since AArch64 sets all flags on logical operations, comparisons with zero > can be combined into an AND even if the condition is LE or GT. > > Passes regress, OK for commit? > > gcc: > PR target/105773 > * config/aarch64/aarch64.cc

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Improve immediate expansion of immediates which can be created from a > bitmask immediate and 2 MOVKs. This reduces the number of 4-instruction > immediates in SPECINT/FP by 10-15%. > > Passes regress, OK for commit? > > gcc/ChangeLog: > > PR target/106583 >

Re: [PATCH] aarch64: update Ampere-1 core definition

2022-10-04 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > This brings the extensions detected by -mcpu=native on Ampere-1 systems > in sync with the defaults generated for -mcpu=ampere1. > > Note that some kernel versions may misreport the presence of PAUTH and > PREDRES (i.e., -mcpu=native will add 'nopauth' and 'nopredres').

[PATCH] aarch64: Define __ARM_FEATURE_RCPC

2022-10-04 Thread Richard Sandiford via Gcc-patches
https://github.com/ARM-software/acle/pull/199 adds a new feature macro for RCPC, for use in things like inline assembly. This patch adds the associated support to GCC. Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a entry didn't include it. This was probably harmless in

Re: [PATCH] aarch64: fix off-by-one in reading cpuinfo

2022-10-04 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > Fixes: 341573406b39 > > Don't subtract one from the result of strnlen() when trying to point > to the first character after the current string. This issue would > cause individual characters (where the 128 byte buffers are stitched > together) to be lost. > >

Re: [PATCH] arm, aarch64, csky: Fix C++ ICEs with _Float16 and __fp16 [PR107080]

2022-09-30 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Fri, Sep 30, 2022 at 09:54:49AM -0400, Jason Merrill wrote: >> > Note, there is one further problem on aarch64/arm, types with HFmode >> > (_Float16 and __fp16) are there mangled as Dh (which is standard >> > Itanium mangling: >> > ::= Dh # IEEE 754r

Re: [PATCH v3] testsuite: Only run test on target if VMA == LMA

2022-09-30 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON writes: > Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not > enough to know if the execution will enter an endless loop, or if it > will give a meaningful result. As the execution test only works when > VMA and LMA are equal, make sure that this condition is

Re: PING^1 [PATCH] testsuite: Do not prefix linker script with "-Wl, "

2022-09-30 Thread Richard Sandiford via Gcc-patches
Torbjorn SVENSSON writes: > Hi, > > Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601831.html OK, thanks. Richard > Kind regards, > Torbjörn > > On 2022-09-19 18:57, Torbjörn SVENSSON wrote: >> The linker script should not be prefixed with "-Wl," - it's not an >> input file

Re: [PATCH v2] testsuite: Only run test on target if VMA == LMA

2022-09-30 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply. Torbjorn SVENSSON writes: > Hi Richard, > > Thanks for your review. > Comments below. > > On 2022-09-23 19:34, Richard Sandiford wrote: >> Torbjörn SVENSSON via Gcc-patches writes: >>> Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not >>> enough to

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-30 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Prathamesh Kulkarni writes: >> Sorry to ask a silly question but in which case shall we select 2nd vector ? >> For num_poly_int_coeffs == 2, >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x) >> If a1/trunc n1 s

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-30 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 27 Sept 2022 at 01:59, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Fri, 23 Sept 2022 at 21:33, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > On Tue, 20 Sept 2022 at 18:09, Richard Sandiford >> >> >

Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-30 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Biener >> Sent: Friday, September 30, 2022 12:53 PM >> To: Tamar Christina >> Cc: Richard Sandiford ; Tamar Christina via >> Gcc-patches ; nd ; Jeff Law >> >> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of

Re: [RFA] Avoid unnecessary load-immediate in coremark

2022-09-30 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > This is another minor improvement to coremark.   I suspect this only > improves code size as the load-immediate was likely issuing with the ret > statement on multi-issue machines. > > > Basically we're failing to utilize conditional equivalences during the > post-reload CSE

Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-09-30 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu wrote: >> >> This patch implemented the optimization in PR 54346, which Merges >> >> c = VEC_PERM_EXPR ; >> d = VEC_PERM_EXPR ; >> to >> d = VEC_PERM_EXPR ; >> >>

Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-30 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, September 30, 2022 9:29 AM >> To: Tamar Christina >> Cc: Richard Biener ; Tamar Christina via Gcc-patches >> ; nd ; Jeff Law >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of

Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-30 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Gcc-patches > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard >> Biener via Gcc-patches >> Sent: Thursday, September 29, 2022 12:09 PM >> To: Tamar Christina via Gcc-patches >> Cc: Richard Sandiford ; nd >>

Re: [Unfinished PATCH] Add first-order recurrence autovectorization

2022-09-29 Thread Richard Sandiford via Gcc-patches
Thanks for posting the patch. juzhe.zh...@rivai.ai writes: > From: Ju-Zhe Zhong > > gcc/ChangeLog: > > * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New function. > (vect_analyze_scalar_cycles_1): Classify first-order recurrence phi. >

[pushed] data-ref: Fix ranges_maybe_overlap_p test

2022-09-29 Thread Richard Sandiford via Gcc-patches
dr_may_alias_p rightly used poly_int_tree_p to guard a use of ranges_maybe_overlap_p, but used the non-poly extractors. This caused a few failures in the SVE ACLE asm tests. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Pushed as obvious. Richard gcc/ * tree-data-ref.cc

[PATCH 17/17] aarch64: Remove redundant TARGET_* checks

2022-09-29 Thread Richard Sandiford via Gcc-patches
After previous patches, it's possible to remove TARGET_* options that are redundant due to (IMO) obvious dependencies. gcc/ * config/aarch64/aarch64.h (TARGET_CRYPTO, TARGET_SHA3, TARGET_SM4) (TARGET_DOTPROD): Don't depend on TARGET_SIMD. (TARGET_AES, TARGET_SHA2):

[PATCH 13/17] aarch64: Tweak constness of option-related data

2022-09-29 Thread Richard Sandiford via Gcc-patches
Some of the option structures have all-const member variables. That doesn't seem necessary: we can just use const on the objects that are supposed to be read-only. Also, with the new, more C++-heavy option handling, it seems better to use constexpr for the static data, to make sure that we're not

[PATCH 11/17] aarch64: Simplify generation of .arch strings

2022-09-29 Thread Richard Sandiford via Gcc-patches
aarch64-common.cc has two arrays, one maintaining the original definition order and one sorted by population count. Sorting by population count was a way of ensuring topological ordering, taking advantage of the fact that the entries are partially ordered by the subset relation. However, the
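
The property mentioned here, that ordering by population count yields a topological order for sets related by inclusion, can be checked with a small model. The sketch below is an illustration only: `feature_mask` and the helper names are hypothetical and do not come from aarch64-common.cc.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using feature_mask = std::uint64_t;  // hypothetical representation

inline std::size_t popcount(feature_mask m)
{
  return std::bitset<64>(m).count();
}

// True if a is a strict subset of b.
inline bool strict_subset(feature_mask a, feature_mask b)
{
  return a != b && (a & b) == a;
}

// An order is topological here if no entry is preceded by one of its
// strict supersets.
bool is_topological(const std::vector<feature_mask> &masks)
{
  for (std::size_t i = 0; i < masks.size(); ++i)
    for (std::size_t j = i + 1; j < masks.size(); ++j)
      if (strict_subset(masks[j], masks[i]))
        return false;
  return true;
}

// A strict subset always has fewer set bits than its superset, so an
// ascending popcount sort places every subset before its supersets.
std::vector<feature_mask> sort_by_popcount(std::vector<feature_mask> masks)
{
  std::stable_sort(masks.begin(), masks.end(),
                   [](feature_mask a, feature_mask b)
                   { return popcount(a) < popcount(b); });
  assert(is_topological(masks));
  return masks;
}
```

The sort only needs the masks themselves, which is why it works without an explicit dependency graph.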

[PATCH 16/17] aarch64: Tweak handling of -mgeneral-regs-only

2022-09-29 Thread Richard Sandiford via Gcc-patches
-mgeneral-regs-only is effectively "+nofp for the compiler without changing the assembler's ISA flags". Currently that's implemented by making TARGET_FLOAT, TARGET_SIMD and TARGET_SVE depend on !TARGET_GENERAL_REGS_ONLY and then making any feature that needs FP registers depend (directly or

[PATCH 14/17] aarch64: Make more use of aarch64_feature_flags

2022-09-29 Thread Richard Sandiford via Gcc-patches
A previous patch added an aarch64_feature_flags typedef, to abstract the representation of the feature flags. This patch makes existing code use the typedef too. Hope I've caught them all! gcc/ * common/config/aarch64/aarch64-common.cc: Use aarch64_feature_flags for feature flags

[PATCH 10/17] aarch64: Simplify feature definitions

2022-09-29 Thread Richard Sandiford via Gcc-patches
Currently the aarch64-option-extensions.def entries, the aarch64-cores.def entries, and the AARCH64_FL_FOR_* macros have a transitive closure of dependencies that is maintained by hand. This is a bit error-prone and is becoming less tenable as more features are added. The main point of this patch

[PATCH 15/17] aarch64: Tweak contents of flags_on/off fields

2022-09-29 Thread Richard Sandiford via Gcc-patches
After previous changes, it's more convenient if the flags_on and flags_off fields of all_extensions include the feature flag itself. gcc/ * common/config/aarch64/aarch64-common.cc (all_extensions): Include the feature flag in flags_on and flags_off.

[PATCH 08/17] aarch64: Fix transitive closure of features

2022-09-29 Thread Richard Sandiford via Gcc-patches
aarch64-option-extensions.def requires us to maintain the transitive closure of options by hand. This patch fixes a few cases where a flag was missed. +noaes and +nosha2 now disable +crypto, which IMO makes more sense and is consistent with the Clang behaviour. gcc/ *
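
As an illustration of what maintaining a transitive closure by hand means, the closure can instead be computed from direct dependencies. The sketch below is not GCC's implementation. The enum, `direct_deps`, and the function names are hypothetical. The only dependencies taken from this series are crypto on aes and sha2; the dependency of aes and sha2 on simd is an assumption made so that the example has more than one level.

```cpp
#include <cassert>
#include <cstdint>

using feature_flags = std::uint32_t;  // hypothetical representation

enum feature { FEAT_SIMD, FEAT_AES, FEAT_SHA2, FEAT_CRYPTO, NUM_FEATURES };

inline feature_flags bit(feature f)
{
  assert(f < NUM_FEATURES);
  return feature_flags(1) << f;
}

// Direct dependencies only. crypto -> aes, sha2 follows the series;
// aes/sha2 -> simd is an assumption for illustration.
inline feature_flags direct_deps(feature f)
{
  switch (f)
    {
    case FEAT_AES:
    case FEAT_SHA2:
      return bit(FEAT_SIMD);
    case FEAT_CRYPTO:
      return bit(FEAT_AES) | bit(FEAT_SHA2);
    default:
      return 0;
    }
}

// Everything that enabling f turns on: f plus all transitive dependencies.
feature_flags flags_on(feature f)
{
  feature_flags closure = bit(f);
  for (bool changed = true; changed;)
    {
      changed = false;
      for (int g = 0; g < NUM_FEATURES; ++g)
        if (closure & bit(feature(g)))
          {
            feature_flags next = closure | direct_deps(feature(g));
            changed |= (next != closure);
            closure = next;
          }
    }
  return closure;
}

// Everything that disabling f turns off: f plus every feature whose
// closure contains f.
feature_flags flags_off(feature f)
{
  feature_flags result = 0;
  for (int g = 0; g < NUM_FEATURES; ++g)
    if (flags_on(feature(g)) & bit(f))
      result |= bit(feature(g));
  return result;
}
```

In this model, disabling aes or sha2 also disables crypto because crypto's closure contains both, which is the behaviour the patch describes for +noaes and +nosha2.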

[PATCH 12/17] aarch64: Avoid std::string in static data

2022-09-29 Thread Richard Sandiford via Gcc-patches
Just a minor patch to avoid having to construct std::strings in static data. gcc/ * common/config/aarch64/aarch64-common.cc (processor_name_to_arch) (arch_to_arch_name): Use const char * instead of std::string. --- gcc/common/config/aarch64/aarch64-common.cc | 4 ++-- 1 file
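
A minimal sketch of the general idea, with hypothetical names and table contents rather than the real processor_name_to_arch data: a table of `const char *` entries can be `constexpr` and needs no dynamic initialization, whereas `std::string` members would be constructed at program start-up.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical table entry: a plain pointer to a string literal, so the
// whole table can be constant-initialized.
struct arch_name_entry
{
  const char *name;
  int id;
};

constexpr arch_name_entry arch_names[] = {
  { "armv8-a", 0 },
  { "armv8.1-a", 1 },
  { "armv8.4-a", 4 },
};

// Linear lookup by name; returns -1 if the name is not in the table.
inline int lookup_arch(const char *name)
{
  assert(name != nullptr);
  for (const arch_name_entry &entry : arch_names)
    if (std::strcmp(entry.name, name) == 0)
      return entry.id;
  return -1;
}
```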

[PATCH 09/17] aarch64: Reorder an entry in aarch64-option-extensions.def

2022-09-29 Thread Richard Sandiford via Gcc-patches
aarch64-option-extensions.def was topologically sorted except for one case: crypto came before its aes and sha2 dependencies. This patch moves crypto after sha2 instead. gcc/ * config/aarch64/aarch64-option-extensions.def: Move crypto after sha2. gcc/testsuite/ *

[PATCH 06/17] aarch64: Avoid redundancy in aarch64-cores.def

2022-09-29 Thread Richard Sandiford via Gcc-patches
The flags fields of the aarch64-cores.def entries always start with an AARCH64_FL_FOR_ macro. After previous changes, the name that follows that prefix is always identical to the previous field, so we can drop the explicit AARCH64_FL_FOR_ macro and derive it programmatically. This isn't a big saving in itself, but it helps with later patches. gcc/

[PATCH 07/17] aarch64: Remove AARCH64_FL_RCPC8_4 [PR107025]

2022-09-29 Thread Richard Sandiford via Gcc-patches
AARCH64_FL_RCPC8_4 is an odd-one-out in that it has no associated entry in aarch64-option-extensions.def. This means that, although it is internally separated from AARCH64_FL_V8_4A, there is no mechanism for turning it on and off individually, independently of armv8.4-a. The only place that the

[PATCH 05/17] aarch64: Small config.gcc cleanups

2022-09-29 Thread Richard Sandiford via Gcc-patches
The aarch64-option-extensions.def parsing in config.gcc had some code left over from when it tried to parse the whole macro definition. Also, config.gcc now only looks at the first fields of the aarch64-arches.def entries. gcc/ * config.gcc: Remove dead aarch64-option-extensions.def

[PATCH 03/17] aarch64: Rename AARCH64_FL_FOR_ARCH macros

2022-09-29 Thread Richard Sandiford via Gcc-patches
This patch renames AARCH64_FL_FOR_ARCH* macros to follow the same V names that we (now) use elsewhere. The names are only temporary -- a later patch will move the information to the .def file instead. However, it helps with the sequencing to do this first. gcc/ *

[PATCH 04/17] aarch64: Add "V" to aarch64-arches.def names

2022-09-29 Thread Richard Sandiford via Gcc-patches
This patch completes the renaming of architecture-level related things by adding "V" to the name of the architecture in aarch64-arches.def. Since the "V" is predictable, we can easily drop it when we don't need it (as when matching /proc/cpuinfo). Having a valid C identifier is necessary for

[PATCH 02/17] aarch64: Rename AARCH64_FL architecture-level macros

2022-09-29 Thread Richard Sandiford via Gcc-patches
Following on from the previous AARCH64_ISA patch, this one adds the profile name directly to the end of architecture-level AARCH64_FL_* macros. gcc/ * config/aarch64/aarch64.h (AARCH64_FL_V8_1, AARCH64_FL_V8_2) (AARCH64_FL_V8_3, AARCH64_FL_V8_4, AARCH64_FL_V8_5, AARCH64_FL_V8_6)

[PATCH 01/17] aarch64: Rename AARCH64_ISA architecture-level macros

2022-09-29 Thread Richard Sandiford via Gcc-patches
All AARCH64_ISA_* architecture-level macros except AARCH64_ISA_V8_R are for the A profile: they cause __ARM_ARCH_PROFILE to be set to 'A' and they are associated with architecture names like armv8.4-a. It's convenient for later patches if we make this explicit by adding an "A" to the name. Also,

[PATCH 00/17] Rework aarch64 feature macro definitions

2022-09-29 Thread Richard Sandiford via Gcc-patches
This series of patches cleans up the definition of the AArch64 ISA features. The main aims are: - to make the naming more consistent - to reduce the amount of boilerplate needed - to avoid the need to maintain transitive closures by hand - to enforce a sensible (topological) order on

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Richard Sandiford via Gcc-patches
Andrew Stubbs writes: > On 29/09/2022 10:24, Richard Sandiford wrote: >> Otherwise: >> >>operand0[0] = operand1 < operand2; >>for (i = 1; i < operand3; i++) >> operand0[i] = operand0[i - 1] && (operand1 + i < operand2); >> >> looks like a "length and mask" operation, which IIUC is
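
The semantics quoted above can be written as a scalar reference model. This is a sketch under stated assumptions, not code from the patch: `while_ult` and `to_integer_mask` are hypothetical names, operand1, operand2 and operand3 map to `start`, `limit` and `nlanes`, and the packing of lane i into bit i of an integer mask is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane 0 is active if start < limit; lane i is active if lane i - 1 is
// active and start + i < limit.
std::vector<bool> while_ult(std::uint64_t start, std::uint64_t limit,
                            std::size_t nlanes)
{
  std::vector<bool> mask(nlanes, false);
  if (nlanes == 0)
    return mask;
  mask[0] = start < limit;
  for (std::size_t i = 1; i < nlanes; ++i)
    mask[i] = mask[i - 1] && (start + i < limit);
  return mask;
}

// Assumed packing for an integer mask: lane i goes to bit i.
std::uint64_t to_integer_mask(const std::vector<bool> &mask)
{
  assert(mask.size() <= 64);
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      bits |= std::uint64_t(1) << i;
  return bits;
}
```

Because each lane also depends on the previous one, the active lanes always form a prefix of the mask.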

Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-29 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 9/28/22 09:04, Richard Sandiford wrote: >> Tamar Christina writes: Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh. >>> But then I'd still need to change the expansion code. I suppose this could >>> prevent the issue with changes to code on other

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote: >> >> This patch is a prerequisite for some amdgcn patches I'm working on to >> support shorter vector lengths (having fixed 64 lanes tends to miss >> optimizations, and masking is not supported

Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-28 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh. > > But then I'd still need to change the expansion code. I suppose this could > prevent the issue with changes to code on other targets. > >> > > We have undocumented addcc, negcc, etc. patterns,

[PATCH] Add OPTIONS_H_EXTRA to GTFILES

2022-09-28 Thread Richard Sandiford via Gcc-patches
I have a patch that adds a typedef to aarch64's -opts.h. The typedef is used for a TargetVariable in the .opt file, which means that it is covered by PCH and so needs to be visible to gengtype. -opts.h is not included directly in tm.h, but indirectly by target headers (in this case aarch64.h).

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-26 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Fri, 23 Sept 2022 at 21:33, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Tue, 20 Sept 2022 at 18:09, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford >> >> >

Re: [PATCH v2] testsuite: Only run test on target if VMA == LMA

2022-09-23 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON via Gcc-patches writes: > Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not > enough to know if the execution will enter an endless loop, or if it > will give a meaningful result. As the execution test only works when > VMA and LMA are equal, make sure that

Re: [PATCH v2] testsuite: Skip intrinsics test if arm

2022-09-23 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON via Gcc-patches writes: > In the test cases, it's clearly written that intrinsics are not > implemented on arm*. A simple xfail does not help since there are > link errors, which would cause an UNRESOLVED testcase rather than > XFAIL. > By changing to dg-skip-if, the entire test

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-23 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 20 Sept 2022 at 18:09, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> >> The VLA encoding encodes the first N patterns

Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Similar to the 1/2 patch but adds additional back-end specific folding for if > the register sequence was created as a result of RTL optimizations. > > Concretely: > > #include > > unsigned int foor (uint32x4_t x) > { > return x[1] >> 16; > } > >

Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, September 23, 2022 6:04 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH 2/2]AArch64 Add support for

Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, September 23, 2022 5:30 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH 2/2]AArch64 Add support for

Re: [PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI extensions.

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This adds additional recognition of & 1 into the tbz/tbnz pattern. > > Concretely with the mid-end changes this changes > > void g1(bool x) > { > if (__builtin_expect (x, 0)) > h (); > } > > from > > tst w0, 255 > bne .L7 > > to

Re: [PATCH][testsuite]: make check-functions-body dump expected and seen cases on failure.

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Often when a check_function_body check fails it can be quite hard to > figure out why, as no additional information is provided. > > This changes it so that on failures it prints out the regular expression it's > using and the text it's comparing against

Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This adds support for using scalar fneg on the V1DF type. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? Why just this one operation though? Couldn't we extend iterators like GPF_F16 to include V1DF, avoiding the need

[pushed] aarch64: Fix GTY markup for arm_sve.h [PR106491]

2022-09-20 Thread Richard Sandiford via Gcc-patches
It turns out that GTY(()) markers in definitions like: GTY(()) tree scalar_types[NUM_VECTOR_TYPES]; are not effective and are silently ignored. The GTY(()) has to come after an extern or static. The externs associated with the SVE ACLE GTY variables are in aarch64-sve-builtins.h. This file

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-20 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> >> The VLA encoding encodes the first N patterns explicitly. The >> >> npatterns/nelts_per_pattern values then describe how to extend that >> >> initial sequence to an

Re: [PATCH] genrecog.cc (print_nonbool_test): Fix type error of SUBREG_BYTE

2022-09-20 Thread Richard Sandiford via Gcc-patches
Jojo R via Gcc-patches writes: > * gcc/genrecog.cc (print_nonbool_test): Fix type error of > SUBREG_BYTE We can't do this here. The code has done nothing to prove that the subreg offset is a compile-time constant. Thanks, Richard > --- > gcc/genrecog.cc | 1 + > 1 file changed, 1

Re: [PATCH] Rewrite NAN and sign handling in frange

2022-09-16 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez via Gcc-patches writes: > On Thu, Sep 15, 2022 at 9:06 AM Richard Biener > wrote: >> >> On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez wrote: >> > >> > Hi Richard. Hi all. >> > >> > The attached patch rewrites the NAN and sign handling, dropping both >> > tristates in favor of

[PATCH] vect: Fix SLP layout handling of masked loads [PR106794]

2022-09-16 Thread Richard Sandiford via Gcc-patches
PR106794 shows that I'd forgotten about masked loads when doing the SLP layout changes. These loads can't currently be permuted independently of their mask input, so during construction they never get a load permutation. (If we did support permuting masked loads in future, the mask would need to

[PATCH] vect: Fix missed gather load opportunity

2022-09-16 Thread Richard Sandiford via Gcc-patches
While writing a testcase for PR106794, I noticed that we failed to vectorise the testcase in the patch for SVE. The code that recognises gather loads tries to optimise the point at which the offset is calculated, to avoid unnecessary extensions or truncations: /* Don't include the

Re: Basic REG_EQUIV comprehension question

2022-09-15 Thread Richard Sandiford via Gcc-patches
Robin Dapp via Gcc-patches writes: > Hi, > > I have been working on making better use of s390's vzero instruction. > Currently we rather zero a vector register once and load it into other > registers via vlr instead of emitting multiple vzeros. > > At IRA/reload point we e.g. have > > (insn 8 5

Re: [PATCH] Move void_list_node init to common code

2022-09-15 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > All frontends replicate this, so move it. > > Bootstrap and regtest running for all languages on > x86_64-unknown-linux-gnu. > > OK if that succeeds? LGTM FWIW. Thanks, Richard > Thanks, > Richard. > > gcc/ > * tree.cc (build_common_tree_nodes):

[pushed] aarch64: Vector move fixes for +nosimd

2022-09-13 Thread Richard Sandiford via Gcc-patches
This patch fixes various issues around the handling of vectors and (particularly) vector structures with +nosimd. Previously, passing and returning structures would trigger an ICE, since: * we didn't allow the structure modes to be stored in FPRs * we didn't provide +nosimd move patterns *

[pushed] aarch64: Disassociate ls64 from simd

2022-09-13 Thread Richard Sandiford via Gcc-patches
The ls64-related move expanders and splits required TARGET_SIMD. That isn't necessary, since the 64-byte values are stored entirely in GPRs. (The associated define_insn was already correct.) I wondered about moving the patterns to aarch64.md, but it wasn't clear-cut. Tested on aarch64-linux-gnu

Re: [PATCH] testsuite: gluefile file need to be prefixed

2022-09-12 Thread Richard Sandiford via Gcc-patches
Yvan Roux writes: > Hi Richard, > On Mon, Sep 12, 2022 at 12:56:52PM +0100, Richard Sandiford via Gcc-patches > wrote: >> Torbjörn SVENSSON via Gcc-patches writes: >> > PR/95720 >> > When the status wrapper is used, the gluefile need to be prefixed with >

Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.

2022-09-12 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > Hi Richard, > >> "Roger Sayle" writes: >> > This patch addresses PR rtl-optimization/106594, a significant >> > performance regression affecting aarch64 recently introduced (exposed) >> > by one of my recent RTL simplification improvements. Firstly many >> > thanks to

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-12 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Mon, 5 Sept 2022 at 15:51, Richard Sandiford > wrote: >> >> Sorry for the slow reply. I wrote a response a couple of weeks ago >> but I think it got lost in a machine outage. >> >> Prathamesh Kulkarni writes: >> > Hi, >> > The attached prototype patch extends

Re: [PATCH] testsuite: gluefile file need to be prefixed

2022-09-12 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON via Gcc-patches writes: > PR/95720 > When the status wrapper is used, the gluefile need to be prefixed with > -Wl, in order for the test cases to have the dump files with the > expected names. > > gcc/testsuite/ChangeLog: > > * gcc/testsuite/lib/g++.exp: Moved gluefile

Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend when cheap.

2022-09-12 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch addresses PR rtl-optimization/106594, a significant performance > regression affecting aarch64 recently introduced (exposed) by one of my > recent RTL simplification improvements. Firstly many thanks to > Tamar Christina for confirming that the core of this

[pushed] vect: Fix scalar stmt typo in vect_optimize_slp_pass [PR106886]

2022-09-08 Thread Richard Sandiford via Gcc-patches
Fix a stupid typo in my vect_optimize_slp_pass patch. Tested on aarch64-linux-gnu, pushed as obvious. Richard gcc/ PR tree-optimization/106886 * tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout): Fix copying of scalar stmts. gcc/testsuite/ PR

Re: [PING^2][PATCH][gdb/build] Fix build breaker with --enabled-shared

2022-09-07 Thread Richard Sandiford via Gcc-patches
Tom de Vries via Gcc-patches writes: > On 7/12/22 15:42, Tom de Vries wrote: >> [ dropped gdb-patches, since already applied there. ] >> >> On 6/27/22 15:38, Tom de Vries wrote: >>> On 6/27/22 15:03, Tom de Vries wrote: Hi, When building gdb with --enabled-shared, I run into:

Re: [RFC] database with API information

2022-09-07 Thread Richard Sandiford via Gcc
Ulrich Drepper via Gcc writes: > I talked to Jonathan the other day about adding all the C++ library APIs to > the name hint file now that the size of the table is not really a concern > anymore. > > Jonathan mentioned that he has to create and maintain a similar file for > the module support.

[pushed] aarch64: Fix +nosimd handling of FPR moves

2022-09-07 Thread Richard Sandiford via Gcc-patches
8-bit and 16-bit FPR moves would ICE for +nosimd+fp, and some other moves would handle FPR<-zero inefficiently. This is very much a niche case at the moment, but something like it becomes more important with SME streaming mode. The si, di and vector tests already passed, they're just included

[pushed] aarch64: Prevent FPR register asms for +nofp

2022-09-07 Thread Richard Sandiford via Gcc-patches
+nofp disabled the automatic allocation of FPRs, but it didn't stop users from explicitly putting register variables in FPRs. We'd then either report an ICE or generate unsupported instructions. It's still possible (and deliberately redundant) to specify FPRs in clobber lists. Tested on

[pushed] aarch64: Remove lazy SIMD builtin initialisation

2022-09-05 Thread Richard Sandiford via Gcc-patches
At one time the aarch64 port registered the Advanced SIMD builtins lazily, when we first encountered a set of target flags that includes +simd. These days we always initialise them at start-up, temporarily forcing a conducive set of flags if necessary. This patch removes some vestiges of the old

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-05 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply. I wrote a response a couple of weeks ago but I think it got lost in a machine outage. Prathamesh Kulkarni writes: > Hi, > The attached prototype patch extends fold_vec_perm to fold VEC_PERM_EXPR > in VLA manner, and currently handles the following cases: > (a) fixed

Re: [PATCH 0/3] picolibc: Add picolibc linking help

2022-09-02 Thread Richard Sandiford via Gcc
Keith Packard via Gcc-patches writes: > Picolibc is a C library for embedded systems based on code from newlib > and avr libc. To connect some system-dependent picolibc functions > (like stdio) to an underlying platform, the platform may provide an OS > library. > > This OS library must follow

Re: [PATCH 0/3] picolibc: Add picolibc linking help

2022-09-02 Thread Richard Sandiford via Gcc-patches
Keith Packard via Gcc-patches writes: > Picolibc is a C library for embedded systems based on code from newlib > and avr libc. To connect some system-dependent picolibc functions > (like stdio) to an underlying platform, the platform may provide an OS > library. > > This OS library must follow

[PATCH] vect: Use better fallback costs in layout subpass

2022-09-02 Thread Richard Sandiford via Gcc-patches
vect_optimize_slp_pass always treats the starting layout as valid, to avoid having to "optimise" when every possible choice is invalid. But it gives the starting layout a high cost if it seems like the target might reject it, in the hope that this will encourage other (valid) layouts. The

[PATCH] vect: Ensure SLP nodes don't end up in multiple BB partitions [PR106787]

2022-09-02 Thread Richard Sandiford via Gcc-patches
In the PR we have two REDUC_PLUS SLP instances that share a common load of stride 4. Each instance also has a unique contiguous load. Initially all three loads are out of order, so have a nontrivial load permutation. The layout pass puts them in order instead. For the two contiguous loads it is

[PATCH] vect: Try to remove single-vector permutes from SLP graph

2022-09-01 Thread Richard Sandiford via Gcc-patches
This patch extends the SLP layout optimisation pass so that it tries to remove layout changes that are brought about by permutes of existing vectors. This fixes the bb-slp-pr54400.c regression on x86_64 and also means that we can remove the permutes in cases like: typedef float v4sf

[pushed] aarch64: Update sizeless tests for recent GNU C changes

2022-08-31 Thread Richard Sandiford via Gcc-patches
The tests for sizeless SVE types include checks that the types are handled for initialisation purposes in the same way as scalars. GNU C and C2x now allow scalars to be initialised using empty braces, so this patch updates the SVE tests to match. Tested on aarch64-linux-gnu & pushed. Richard

[pushed] vect: Fix stray argument in call to dump_printf_loc

2022-08-31 Thread Richard Sandiford via Gcc-patches
One call to dump_printf_loc had a stray left-over argument from an earlier version of the patch. This went unnoticed on aarch64-linux-gnu and x86_64-linux-gnu since the parameters that actually mattered were passed in FPRs rather than GPRs, but I assume this is the reason for the i686-linux-gnu
