[PATCH] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-10 Thread Richard Biener
The following temporarily reverts the support of permuted .MASK_LOAD for the case of non-grouped accesses. Bootstrap and regtest running on x86_64-unknown-linux-gnu. PR tree-optimization/117050 * tree-vect-slp.cc (vect_build_slp_tree_2): Do not support permutes of non-grou

Re: [PATCH] phiopt: Remove candorest variable return instead

2024-10-10 Thread Richard Biener
> On 10.10.2024 at 17:23, Andrew Pinski wrote: > > After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest > with the break can just change to a return since this is inside a lambda now. > > Bootstrapped and tested on x86_64-linux-gnu. Ok Richard > gcc/ChangeLog: > >* t

Re: Fix PR116650: check all regs in regrename targets

2024-10-10 Thread Richard Biener
> On 10.10.2024 at 16:56, Michael Matz wrote: > > (this came up for m68k vs. LRA, but is a generic problem) > > Regrename wants to use new registers for certain def-use chains. > For validity of replacements it needs to check that the selected > candidates are unused up to then. That's don

[PATCH] tree-optimization/117060 - fix oversight in vect_build_slp_tree_1

2024-10-10 Thread Richard Biener
We are failing to match call vs. non-call when dealing with matching loads or stores. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/117060 * tree-vect-slp.cc (vect_build_slp_tree_1): When comparing calls also fail if the first isn't a ca

Re: [PATCH] vect: Avoid divide by zero for permutes of extern VLA vectors

2024-10-10 Thread Richard Biener
ny lanes per vector. */ > - if (children.length () == 1 > - && known_eq (SLP_TREE_LANES (child) * nunits, > -SLP_TREE_LANES (node) * op_nunits * 2)) > + else if (children.length () == 1 > +&& known_eq (SLP_TREE_LANES (chi

[PATCH] Fix possible wrong-code with masked store-lanes

2024-10-10 Thread Richard Biener
When we're doing masked store-lanes one mask element applies to all loads of one struct element. This requires uniform masks for all of the SLP lanes, something we already compute into STMT_VINFO_SLP_VECT_ONLY but fail to check when doing SLP store-lanes. The following corrects this. The followi

[PATCH 2/2] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

2024-10-10 Thread Richard Biener
The following fixes an oversight when handling permuted non-grouped .MASK_LOAD SLP discovery. Bootstrapped and tested on x86_64-unknown-linux-gnu. This requires 1/2. PR tree-optimization/117050 * tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle non-grouped masked

[PATCH 1/2] Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

2024-10-10 Thread Richard Biener
The following prepares us for SLP instances with a non-uniform number of lanes. We already have this with load permutation lowering, but we managed to keep that within the constraints of the per SLP instance computed VF based on its max_nunits (with a vector type fixed for each node) and the insta

Re: [PATCH] fold fold_truth_andor field merging into ifcombine (was: [PATCH] assorted improvements for fold_truth_andor_1)

2024-10-10 Thread Richard Biener
On Thu, Sep 26, 2024 at 10:49 AM Alexandre Oliva wrote: > > > This patch introduces various improvements to the logic that merges > field compares, moving it into ifcombine. > > Before the patch, we could merge: > > (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1) > > into something like: > > (((type

Re: [PATCH] [PR116831] match.pd: Check trunc_mod vector optab before folding.

2024-10-10 Thread Richard Biener
On Wed, 9 Oct 2024, Jennifer Schmitz wrote: > > > On 8 Oct 2024, at 10:31, Richard Biener wrote: > > > > External email: Use caution opening links or attachments > > > > > > On Fri, 4 Oct 2024, Jennifer Schmitz wrote: > > > >> As in ht

[PATCH] Allow SLP store of mixed external and constant

2024-10-09 Thread Richard Biener
vect_build_slp_tree_1 rejected this during SLP discovery because it ran into the rhs code comparison code for stores. The following skips that completely for loads and stores as those are handled later anyway. This needs a heuristic adjustment in vect_get_and_check_slp_defs to avoid fallout with

[PATCH] Clear DR_GROUP_NEXT_ELEMENT upon group dissolving

2024-10-09 Thread Richard Biener
I've tried to sanitize DR_GROUP_NEXT_ELEMENT accesses but there are too many so the following instead makes sure DR_GROUP_NEXT_ELEMENT is never non-NULL for !STMT_VINFO_GROUPED_ACCESS. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-vect-data-refs.cc (vect_analyze_data

[PATCH] tree-optimization/117041 - fix load classification of former grouped load

2024-10-09 Thread Richard Biener
When we first detect a grouped load but later dis-associate it we only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a STMT_VINFO_GROUPED_ACCESS but leave DR_GROUP_NEXT_ELEMENT set. This causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type to go wrong, indicating a l

Re: [PATCH v3 0/2] ia64: enable LRA and un-obsolete ia64*-*-linux

2024-10-09 Thread Richard Biener
On Wed, 9 Oct 2024, Frank Scheiner wrote: > On 09.10.24 10:26, Richard Biener wrote: > > On Wed, 9 Oct 2024, Richard Biener wrote: > > > >> On Tue, 8 Oct 2024, Frank Scheiner wrote: > >> > >>> With stage 3 of GCC 15 approaching, to save me some time b

Re: [PATCH v3 0/2] ia64: enable LRA and un-obsolete ia64*-*-linux

2024-10-09 Thread Richard Biener
On Wed, 9 Oct 2024, Richard Biener wrote: > On Tue, 8 Oct 2024, Frank Scheiner wrote: > > > With stage 3 of GCC 15 approaching, to save me some time by finally > > dropping the non-LRA testcase from my cross builds of GCC and Linux and > > as I had the time, I updated

Re: [PATCH v3 0/2] ia64: enable LRA and un-obsolete ia64*-*-linux

2024-10-09 Thread Richard Biener
them is attached. > > Can this be brought forward now as is? I'll push this for you. Thanks, Richard. > Cheers, > Frank > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH]middle-end: support SLP early break

2024-10-09 Thread Richard Biener
On Tue, 8 Oct 2024, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Wednesday, October 2, 2024 1:50 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > > Subject: Re: [PATCH

Re: [PATCH v3 2/2] Adjust testcase after relax O2 vectorization.

2024-10-09 Thread Richard Biener
On Wed, Oct 9, 2024 at 3:27 AM liuhongt wrote: > > Update in V3. > >The testcase looks bogus: > > > > b[i+k] = b[i+k-5] + 2; > > > >accesses b[-3], can you instead adjust the inner loop to start with k == 4? > > Changed, also adjust b[100] to b[200] to avoid array out of bound. > > >Please r

Re: [PATCH v3 1/2] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-10-09 Thread Richard Biener
On Wed, Oct 9, 2024 at 3:27 AM liuhongt wrote: > > >We'd also need to update the documentation: > > >... The @samp{very-cheap} model only > >allows vectorization if the vector code would entirely replace the > >scalar code that is being vectorized. For example, if each iteration > >of a vectorize

[PATCH] tree-optimization/116575 - handle SLP of permuted masked loads

2024-10-08 Thread Richard Biener
The following handles SLP discovery of permuted masked loads which was prohibited (because wrongly handled) for PR114375. In particular with single-lane SLP at the moment all masked group loads appear permuted and we fail to use masked load lanes as well. The following addresses parts of the issu

[PATCH] Fix memory leak in vect_cse_slp_nodes

2024-10-08 Thread Richard Biener
The following avoids copying scalar stmts again for the re-lookup of the slot to replace the NULL guard with node. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. * tree-vect-slp.cc (vect_cse_slp_nodes): Fix memory leak. --- gcc/tree-vect-slp.cc | 2 +- 1 file ch

[PATCH] tree-optimization/116974 - Handle single-lane SLP for OMP scan store

2024-10-08 Thread Richard Biener
The following massages the GIMPLE matching way of handling scan stores to work with single-lane SLP. I do not fully understand all the cases that can happen and the stmt matching at vectorizable_store time is less than ideal - but the following gets me all the testcases to pass with and without fo

Re: [PATCH v2 2/2] Adjust testcase after relax O2 vectorization.

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 11:14 AM Hongtao Liu wrote: > > On Tue, Oct 8, 2024 at 4:56 PM Richard Biener > wrote: > > > > On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote: > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/fstac

Re: [PATCH v1 1/3] Match: Support form 3 and form 4 for scalar signed integer SAT_SUB

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 3:23 AM wrote: > > From: Pan Li > > This patch would like to support the form 3 and form 4 of the scalar signed > integer SAT_SUB. Aka below example: > > Form 3: > #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \ > T __attribute__((noinline))

Re: [PATCH v1 1/4] Match: Support form 1 for scalar signed integer SAT_TRUNC

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 10:34 AM wrote: > > From: Pan Li > > This patch would like to support the form 1 of the scalar signed > integer SAT_TRUNC. Aka below example: > > Form 1: > #define DEF_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \ > NT __attribute__((noinline)) \

Re: [PATCH v1 2/4] Widening-Mul: Fix one bug of consume after phi node released

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 10:34 AM wrote: > > From: Pan Li > > When trying to match saturation-related patterns on a PHI node, we may have > to try each pattern for all PHI nodes of the bb. Aka: > > for each PHI node in bb: > gphi *phi = xxx; > try_match_sat_add (, phi); > try_match_sat_sub (, phi);

Re: [PATCH v2 2/2] Adjust testcase after relax O2 vectorization.

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote: > > gcc/testsuite/ChangeLog: > > * gcc.dg/fstack-protector-strong.c: Adjust > scan-assembler-times. > * gcc.dg/graphite/scop-6.c: Add > -Wno-aggressive-loop-optimizations. > * gcc.dg/graphite/scop-9.c: Ditto. >

Re: [PATCH v2 1/2] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-10-08 Thread Richard Biener
On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote: > > >So should we adjust very-cheap to allow niter peeling as proposed or > >should we switch the default at -O2 to cheap? > I prefer the former. > > Update in V2: > Adjust testcase after relax O2 vectorization. > > Ok for trunk? OK. Thanks, Richar

Re: [PATCH] [PR86710][PR116826] match.pd: Fold logarithmic identities.

2024-10-08 Thread Richard Biener
On Thu, 3 Oct 2024, Jennifer Schmitz wrote: > > > > On 1 Oct 2024, at 14:27, Richard Biener wrote: > > > > External email: Use caution opening links or attachments > > > > > > On Tue, 1 Oct 2024, Jennifer Schmitz wrote: > > > >> Th

Re: [PATCH] [PR116831] match.pd: Check trunc_mod vector optab before folding.

2024-10-08 Thread Richard Biener
heck for > mod optab support. > > gcc/testsuite/ > PR tree-optimization/116831 > * gcc.dg/torture/pr116831.c: New test. > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] tree-optimization/117000 - elide .REDUC_IOR with compare against zero

2024-10-08 Thread Richard Biener
The following adds a pattern to elide a .REDUC_IOR operation when the result is compared against zero with a cbranch. I've resorted to using can_compare_p since that's what RTL expansion eventually checks - while GIMPLE allowed whole vector equality compares for long I'll notice vector lowering wo

Re: [PATCH] ssa-math-opts, i386: Handle most unordered values rather than just 2 [PR116896]

2024-10-08 Thread Richard Biener
On Tue, 8 Oct 2024, Jakub Jelinek wrote: > On Mon, Oct 07, 2024 at 10:32:57AM +0200, Richard Biener wrote: > > > They are implementation defined, -1, 0, 1, 2 is defined by libstdc++: > > > using type = signed char; > > > enum class _Ord : type { equivale

Re: [PR middle-end/114635] Set OMP safelen handling to INT_MAX when the pragma didn’t provide one.

2024-10-08 Thread Richard Biener
On Mon, Aug 5, 2024 at 7:05 AM Kugan Vivekanandarajah wrote: > > > > > On 15 Jul 2024, at 5:18 pm, Jakub Jelinek wrote: > > > > External email: Use caution opening links or attachments > > > > > > On Mon, Jul 15, 2024 at 12:39:22AM +, Kugan Vivekanandarajah wrote: > >> OMP safelen handling is

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-10-07 Thread Richard Biener
.org > > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for > > accelerator > > > > External email: Use caution opening links or attachments > > > > > > > -Original Message- > > > From: Richard Biener > > &

[PATCH] tree-optimization/116990 - missed control flow check in vect_analyze_loop_form

2024-10-07 Thread Richard Biener
The following fixes checking for unsupported control flow in vectorization to also cover the outer loop body. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/116990 * tree-vect-loop.cc (vect_analyze_loop_form): Check the current loop body

[PATCH] tree-optimization/116982 - analyze scalar loop exit early

2024-10-07 Thread Richard Biener
The following makes sure to discover the scalar loop IV exit during analysis as failure to do so (if DCE and friends are disabled this can happen due to if-conversion doing DCE and FRE on the if-converted loop) would ICE later. I refrained from larger refactoring to be able to eventually backport.

Re: [PATCH v2] Add -ftime-report-wall

2024-10-07 Thread Richard Biener
On Sat, Oct 5, 2024 at 10:17 AM Andi Kleen wrote: > > From: Andi Kleen > > Time vars normally use times(2) to get the user/sys/wall time, which is > always a > system call. I don't think the system time is very useful because most > overhead > is in user time. If we only use the wall (or monoto

Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-07 Thread Richard Biener
On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento wrote: > > Given the categorization of math built-in functions as `ECF_CONST', > when if-converting their uses, their calls are not masked and are thus > called with an all-true predicate. > > This, however, is not appropriate where built-ins hav

Re: [PATCH 0/4] Support more VLA SLP permutations

2024-10-07 Thread Richard Biener
On Fri, 4 Oct 2024, Richard Sandiford wrote: > This series should fix the target-independent parts of PR116583. > (We also need some target-specific patches, to be posted separately.) > > The explanations are in the individual commit messages, but I've > attached a -b diff below in case my attemp

Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-07 Thread Richard Biener
On Mon, 7 Oct 2024, Jakub Jelinek wrote: > On Mon, Oct 07, 2024 at 08:59:56AM +0200, Richard Biener wrote: > > The forwprop added optimization looks like it would match PHI-opt better, > > but I'm fine with leaving it in forwprop. I do wonder whether instead > > of addin

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-07 Thread Richard Biener
On Fri, Oct 4, 2024 at 3:07 PM Jonathan Wakely wrote: > > On Fri, 4 Oct 2024 at 13:53, Dmitry Ilvokhin wrote: > > > > On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote: > > > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote: > > > > >

Re: [PATCH] ssa-math-opts, i386: Improve spaceship expansion [PR116896]

2024-10-07 Thread Richard Biener
"TARGET_80387 && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))" > > { > > - ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]); > > + ix86_expand_fp_spaceship (operands[0], operands[1], operands[2], > > + operands[3]); > > + DONE; > > +}) > > + > > +(define_expand "spaceship4" > > + [(match_operand:SI 0 "register_operand") > > + (match_operand:SWI 1 "nonimmediate_operand") > > + (match_operand:SWI 2 "") > > + (match_operand:SI 3 "const_int_operand")] > > + "" > > +{ > > + ix86_expand_int_spaceship (operands[0], operands[1], operands[2], > > +operands[3]); > >DONE; > > }) > > > > --- gcc/doc/md.texi.jj 2024-10-01 09:38:58.035961557 +0200 > > +++ gcc/doc/md.texi 2024-10-02 20:16:22.502329039 +0200 > > @@ -8568,11 +8568,15 @@ inclusive and operand 1 exclusive. > > If this pattern is not defined, a call to the library function > > @code{__clear_cache} is used. > > > > -@cindex @code{spaceship@var{m}3} instruction pattern > > -@item @samp{spaceship@var{m}3} > > +@cindex @code{spaceship@var{m}4} instruction pattern > > +@item @samp{spaceship@var{m}4} > > Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2 > > if operand 1 with mode @var{m} compares less than operand 2, equal to > > operand 2, greater than operand 2 or is unordered with operand 2. > > +Operand 3 should be @code{const0_rtx} if the result is used in comparisons, > > +@code{const1_rtx} if the result is used as integer value and the comparison > > +is signed, @code{const2_rtx} if the result is used as integer value and > > +the comparison is unsigned. > > @var{m} should be a scalar floating point mode. > > > > This pattern is not allowed to @code{FAIL}. 
> > --- gcc/testsuite/g++.target/i386/pr116896-1.C.jj 2024-10-03 > > 14:40:27.071813336 +0200 > > +++ gcc/testsuite/g++.target/i386/pr116896-1.C 2024-10-03 > > 15:52:06.243819660 +0200 > > @@ -0,0 +1,35 @@ > > +// PR middle-end/116896 > > +// { dg-do compile { target c++20 } } > > +// { dg-options "-O2 -masm=att -fno-stack-protector" } > > +// { dg-final { scan-assembler-times "\tjp\t" 1 } } > > +// { dg-final { scan-assembler-not "\tj\[^mp\]\[a-z\]*\t" } } > > +// { dg-final { scan-assembler-times "\tsbb\[bl\]\t\\\$0, " 3 } } > > +// { dg-final { scan-assembler-times "\tseta\t" 3 } } > > +// { dg-final { scan-assembler-times "\tsetg\t" 1 } } > > +// { dg-final { scan-assembler-times "\tsetl\t" 1 } } > > + > > +#include > > + > > +[[gnu::noipa]] auto > > +foo (float x, float y) > > +{ > > + return x <=> y; > > +} > > + > > +[[gnu::noipa, gnu::optimize ("fast-math")]] auto > > +bar (float x, float y) > > +{ > > + return x <=> y; > > +} > > + > > +[[gnu::noipa]] auto > > +baz (int x, int y) > > +{ > > + return x <=> y; > > +} > > + > > +[[gnu::noipa]] auto > > +qux (unsigned x, unsigned y) > > +{ > > + return x <=> y; > > +} > > --- gcc/testsuite/g++.target/i386/pr116896-2.C.jj 2024-10-03 > > 14:40:37.203674018 +0200 > > +++ gcc/testsuite/g++.target/i386/pr116896-2.C 2024-10-04 > > 10:55:07.468396073 +0200 > > @@ -0,0 +1,41 @@ > > +// PR middle-end/116896 > > +// { dg-do run { target c++20 } } > > +// { dg-options "-O2" } > > + > > +#include "pr116896-1.C" > > + > > +[[gnu::noipa]] auto > > +corge (int x) > > +{ > > + return x <=> 0; > > +} > > + > > +[[gnu::noipa]] auto > > +garply (unsigned x) > > +{ > > + return x <=> 0; > > +} > > + > > +int > > +main () > > +{ > > + if (foo (-1.0f, 1.0f) != std::partial_ordering::less > > + || foo (1.0f, -1.0f) != std::partial_ordering::greater > > + || foo (1.0f, 1.0f) != std::partial_ordering::equivalent > > + || foo (__builtin_nanf (""), 1.0f) != > > std::partial_ordering::unordered > > + || bar (-2.0f, 2.0f) != 
std::partial_ordering::less > > + || bar (2.0f, -2.0f) != std::partial_ordering::greater > > + || bar (-5.0f, -5.0f) != std::partial_ordering::equivalent > > + || baz (-42, 42) != std::strong_ordering::less > > + || baz (42, -42) != std::strong_ordering::greater > > + || baz (42, 42) != std::strong_ordering::equal > > + || qux (40, 42) != std::strong_ordering::less > > + || qux (42, 40) != std::strong_ordering::greater > > + || qux (40, 40) != std::strong_ordering::equal > > + || corge (-15) != std::strong_ordering::less > > + || corge (15) != std::strong_ordering::greater > > + || corge (0) != std::strong_ordering::equal > > + || garply (15) != std::strong_ordering::greater > > + || garply (0) != std::strong_ordering::equal) > > +__builtin_abort (); > > +} > > > > Jakub > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
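The result encoding described in the quoted md.texi hunk (-1, 0, 1 or 2 for less, equal, greater or unordered) can be sketched in plain C++. This is only an illustration of the value mapping, not the i386 expander; the helper name `spaceship_value` is hypothetical and does not exist in GCC.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a GCC function): returns the integer encoding
// that the quoted documentation describes for a floating-point compare.
inline int spaceship_value(double x, double y) {
  if (x < y) return -1;   // operand 1 compares less than operand 2
  if (x == y) return 0;   // equal
  if (x > y) return 1;    // greater
  return 2;               // unordered: at least one operand is a NaN
}

// Small self-check mirroring the cases exercised by the quoted run test.
inline void spaceship_demo() {
  assert(spaceship_value(-1.0, 1.0) == -1);
  assert(spaceship_value(1.0, 1.0) == 0);
  assert(spaceship_value(1.0, -1.0) == 1);
  assert(spaceship_value(std::nan(""), 1.0) == 2);
}
```

Only the floating-point case can yield 2; for integer operands the three ordered outcomes are the only ones possible.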

[PATCH][v2] Add single-lane SLP support to .GOMP_SIMD_LANE vectorization

2024-10-05 Thread Richard Biener
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE vectorization, in particular it enables SLP discovery. Bootstrap and regtest running on x86_64-unknown-linux-gnu. * tree-vect-slp.cc (no_arg_map): New. (vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE. (

Re: [PATCH] Fix PR middle-end/116933

2024-10-05 Thread Richard Biener
> On 05.10.2024 at 10:22, Eric Botcazou wrote: > > Hi, > > this polishes a few rough edges that prevent -ftrivial-auto-var-init=zero from > working in Ada: > > - build_common_builtin_nodes declares BUILT_IN_CLEAR_PADDING with 3 instead > of 2 parameters, now gimple_fold_builtin_clear_padd

[PATCH] Add single-lane SLP support to .GOMP_SIMD_LANE vectorization

2024-10-04 Thread Richard Biener
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE vectorization, in particular it enables SLP discovery. * tree-vect-slp.cc (no_arg_map): New. (vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE. (vect_build_slp_tree_1): Likewise. * tree-vect-stmts.

[PATCH] Fixup dumping of re-trying without/with single-lane SLP

2024-10-04 Thread Richard Biener
The following fixes the order of decrementing the SLP mode and the dumping. Built on x86_64-unknown-linux-gnu, pushed. * tree-vect-loop.cc (vect_analyze_loop_2): Decrement 'slp' before dumping which stage we're starting. --- gcc/tree-vect-loop.cc | 6 +++--- 1 file changed, 3 inse

Re: [PATCH] diagnostic, pch: Fix up the new diagnostic PCH methods for ubsan checking [PR116936]

2024-10-04 Thread Richard Biener
On Fri, Oct 4, 2024 at 12:04 PM Jakub Jelinek wrote: > > Hi! > > The PR notes that the new pch_save/pch_restore methods I've added > recently invoke UB if either m_classification_history.address () > or m_push_list.address () is NULL (which can happen if those vectors > are empty (and in the pch_s

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-04 Thread Richard Biener
On Fri, Oct 4, 2024 at 11:20 AM Jonathan Wakely wrote: > > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote: > > > > On Fri, 4 Oct 2024 at 07:53, Richard Biener > > wrote: > > > > > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote: > > &

[PATCH] Improve load permutation lowering

2024-10-04 Thread Richard Biener
The following makes sure the emitted even/odd extraction scheme follows one that ends up with actual trivial even/odd extract permutes. When we choose a level 2 extract we generate { 0, 1, 4, 5, ... } which for example the x86 backend doesn't recognize with just SSE and QImode elements. So this no

[PATCH] Relax gcc.dg/vect/pr65947-8.c

2024-10-04 Thread Richard Biener
When failing using forced SLP we do not print the non-SLP failure mode which reads slightly different. Massage the expectation a bit. Pushed. * gcc.dg/vect/pr65947-8.c: Adjust. --- gcc/testsuite/gcc.dg/vect/pr65947-8.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/

[PATCH] tree-optimization/99856 - fix testcase

2024-10-04 Thread Richard Biener
When making the testcase use aligned accesses I botched up the copy&paste. Fixed. Pushed. PR tree-optimization/99856 * gcc.dg/vect/pr99856.c: Fix copy&paste errors. --- gcc/testsuite/gcc.dg/vect/pr99856.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gc

Re: [PATCH 3/3] Record template specialization hash

2024-10-04 Thread Richard Biener
On Thu, 3 Oct 2024, Jason Merrill wrote: > On 10/2/24 7:53 AM, Richard Biener wrote: > > For a specific testcase a lot of compile-time is spent in re-hashing > > hashtable elements upon expansion. The following records the hash > > in the hash element. This speeds

Re: [PATCH 3/3] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

2024-10-04 Thread Richard Biener
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote: > > While looking into the ifcombine, I noticed that rewrite_to_defined_overflow > was rewriting already defined code. In the previous attempt at fixing this, > the review mentioned we should not be calling rewrite_to_defined_overflow > in those

Re: [PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]

2024-10-04 Thread Richard Biener
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote: > > After fixing loop-im to do the correct overflow rewriting > for pointer types too, we end up with code like: > ``` > _9 = (unsigned long) &g; > _84 = _9 + 18446744073709551615; > _11 = _42 + _84; > _44 = (signed char *) _11; > ... >

Re: [PATCH 1/3] cfgexpand: Expand comment on when non-var clobbers can show up

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote: > > The comment here is not wrong, just it would be better if mentioning > the C++ front-end instead of just the nested function lowering. OK > gcc/ChangeLog: > > * cfgexpand.cc (add_scope_conflicts_1): Expand comment > on when

Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote: > > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote: > > > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote: > > > > > > Instead of looping over every byte of the tail, unroll loop manually > > > using switch statement, then compilers
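The idea in the quoted patch description, replacing a per-byte tail loop with a manually unrolled switch, can be sketched as follows. This is a minimal illustration under assumptions: the function names, the little-endian byte assembly and the 7-byte maximum tail are chosen for the example and are not taken from the libstdc++ sources.

```cpp
#include <cassert>
#include <cstdint>

// Loop form: assemble the n tail bytes (0 <= n <= 7), last byte first.
inline std::uint64_t load_tail_loop(const unsigned char* p, int n) {
  std::uint64_t result = 0;
  for (int i = n - 1; i >= 0; --i)
    result = (result << 8) | p[i];
  return result;
}

// Switch form: the same computation, unrolled with deliberate fall-through.
inline std::uint64_t load_tail_switch(const unsigned char* p, int n) {
  std::uint64_t result = 0;
  switch (n) {
    case 7: result |= static_cast<std::uint64_t>(p[6]) << 48;  // fall through
    case 6: result |= static_cast<std::uint64_t>(p[5]) << 40;  // fall through
    case 5: result |= static_cast<std::uint64_t>(p[4]) << 32;  // fall through
    case 4: result |= static_cast<std::uint64_t>(p[3]) << 24;  // fall through
    case 3: result |= static_cast<std::uint64_t>(p[2]) << 16;  // fall through
    case 2: result |= static_cast<std::uint64_t>(p[1]) << 8;   // fall through
    case 1: result |= static_cast<std::uint64_t>(p[0]);        // fall through
    default: break;  // n == 0: nothing to load
  }
  return result;
}

// Both forms must agree for every tail length.
inline void load_tail_demo() {
  const unsigned char buf[7] = {1, 2, 3, 4, 5, 6, 7};
  for (int n = 0; n <= 7; ++n)
    assert(load_tail_loop(buf, n) == load_tail_switch(buf, n));
}
```

Whether the switch form is actually faster depends on the compiler and target, which is what the thread goes on to discuss.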

Re: [patch,testsuite] Fix gcc.c-torture/execute/ieee/pr108540-1.c

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 1:30 PM Georg-Johann Lay wrote: > > gcc.c-torture/execute/ieee/pr108540-1.c obviously requires that double > is a 64-bit type, hence add pr108540-1.x as an according filter. > > Ok for trunk? > > And is there a reason for why we are still putting test cases in > these old pa

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 3:15 AM Andrew Waterman wrote: > > On Wed, Oct 2, 2024 at 4:41 PM Jeff Law wrote: > > > > > > > > On 10/2/24 4:39 PM, Andrew Waterman wrote: > > > On Wed, Oct 2, 2024 at 5:56 AM Jeff Law wrote: > > >> > > >> > > >> > > >> On 9/5/24 12:52 PM, Palmer Dabbelt wrote: > > >>> W

Re: [patch,testsuite,applied] Fix gcc.dg/signbit-6.c for int != 32-bit targets

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 5:01 PM Georg-Johann Lay wrote: > > This test failed on int != 32-bit targets due to > a[0] = b[0] = INT_MIN instead of using INT32_MIN. OK. Richard. > Johann > > -- > > testsuite/52641 - Fix gcc.dg/signbit-6.c for int != 32-bit targets. > > PR testsuite
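Why the choice of macro matters can be shown with a short sketch: `INT32_MIN` has one fixed value, while `INT_MIN` follows the width of `int` on the target. The helper names below are hypothetical and are not part of the test being fixed.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// INT32_MIN is -2^31 on every target that provides int32_t.
static_assert(INT32_MIN == -2147483647 - 1, "INT32_MIN is always -2^31");

// Hypothetical helper: true when plain int is exactly 32 bits wide.
inline bool int_is_32_bit() {
  return sizeof(int) * CHAR_BIT == 32;
}

// Hypothetical helper: true when INT_MIN and INT32_MIN have the same value,
// which cannot hold on targets with a 16-bit or 64-bit int.
inline bool int_min_is_int32_min() {
  return static_cast<long long>(INT_MIN) == static_cast<long long>(INT32_MIN);
}

inline void int_width_demo() {
  // On common targets, the two properties go together.
  assert(int_is_32_bit() == int_min_is_int32_min());
}
```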

Re: [PATCH] testsuite: Make check-function-bodies work with LTO

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 3:48 PM Richard Sandiford wrote: > > This patch tries to make check-function-bodies automatically > choose between reading the regular assembly file and reading the > LTO assembly file. There should only ever be one right answer, > since check-function-bodies doesn't make s

Re: [PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-10-03 Thread Richard Biener
On Thu, 3 Oct 2024, Thomas Schwinge wrote: > Hi! > > On 2024-09-06T11:30:06+0200, Richard Biener wrote: > > On Thu, 5 Sep 2024, Richard Biener wrote: > >> The following enables single-lane loop SLP discovery for non-grouped stores > >> and adjusts vectorizab

[PATCH] Restore aarch64 bootstrap

2024-10-03 Thread Richard Biener
This zero-initializes vec_init to avoid a bogus maybe-uninitialized diagnostic. Built on x86_64-unknown-linux-gnu, pushed as obvious. * tree-vect-loop.cc (vectorizable_induction): Initialize vec_init. --- gcc/tree-vect-loop.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

Re: [PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-10-03 Thread Richard Biener
On Wed, 2 Oct 2024, Andrew Pinski wrote: > On Tue, Oct 1, 2024 at 5:04 AM Richard Biener wrote: > > > > The following adds SLP support for vectorizing single-lane inductions > > with variable length vectors. > > This introduces a bootstrap failure on aarch64 due

Re: [PATCH] testsuite: Unset torture_current_flags after use

2024-10-02 Thread Richard Biener
> On 02.10.2024 at 15:48, Richard Sandiford wrote: > > Before running a test with specific torture options, gcc-dg-runtest > sets the global variable torture_current_flags to the set of torture > options that will be used. However, it never unset the variable > afterwards, which meant that

[PATCH] Replace another missed iterative_hash_object

2024-10-02 Thread Richard Biener
I missed one that's actually hit quite a lot, hashing of the canonical type TYPE_HASH. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed as obvious after the previous approval. Richard. * pt.cc (iterative_hash_template_arg): Use iterative_hash_hashval_t to hash TYPE_HAS

Re: [PATCH]middle-end: support SLP early break

2024-10-02 Thread Richard Biener
mt = STMT_VINFO_STMT (stmt_info); >basic_block cond_bb = gimple_bb (stmt); > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index > 490061aea2f6d465d9589eb97bbd34a920d76b1c..53483303c4ac3482760fe722354f602e0243e5a2 > 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/t

Re: [RFC PATCH] Allow limited extended asm at toplevel

2024-10-02 Thread Richard Biener
> } > + if (allows_reg && toplev_p) > + { > + error_at (loc, "invalid constraint outside of a function"); > + operand = error_mark_node; > + } > } > else >

[PATCH 3/3] Record template specialization hash

2024-10-02 Thread Richard Biener
For a specific testcase a lot of compile-time is spent in re-hashing hashtable elements upon expansion. The following records the hash in the hash element. This speeds up compilation by 20%. There's probably module-related uses that need to be adjusted. Bootstrap failed (guess I was expecting t
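The general technique, storing the hash next to each element so that growing the table does not recompute it, can be sketched with a small open-addressing table. This is an assumption-laden illustration: `Entry`, `CachedHashTable` and the `hash_calls` counter are hypothetical names, and none of this is the pt.cc code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: the hash is computed once and kept with the key.
struct Entry {
  std::string key;
  std::size_t hash;
  bool used;
  Entry() : hash(0), used(false) {}
  Entry(std::string k, std::size_t h) : key(std::move(k)), hash(h), used(true) {}
};

// Minimal linear-probing table; it does not check for duplicate keys.
struct CachedHashTable {
  std::vector<Entry> slots;
  std::size_t count;
  std::size_t hash_calls;  // how often the (potentially expensive) hash ran

  CachedHashTable() : slots(4), count(0), hash_calls(0) {}

  std::size_t compute_hash(const std::string& k) {
    ++hash_calls;
    return std::hash<std::string>()(k);
  }

  // Place an entry using its stored hash; never calls compute_hash.
  static void place(std::vector<Entry>& into, Entry e) {
    std::size_t i = e.hash % into.size();
    while (into[i].used) i = (i + 1) % into.size();
    into[i] = std::move(e);
  }

  // Expansion reuses the recorded hashes instead of re-hashing the keys.
  void expand() {
    std::vector<Entry> bigger(slots.size() * 2);
    for (Entry& e : slots)
      if (e.used) place(bigger, std::move(e));
    slots.swap(bigger);
  }

  void insert(const std::string& k) {
    if ((count + 1) * 2 > slots.size()) expand();  // keep load factor <= 1/2
    place(slots, Entry(k, compute_hash(k)));
    ++count;
  }
};
```

After inserting N keys, `hash_calls` equals N no matter how many expansions happened; the trade-off is one extra word of storage per element.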

[PATCH 2/3] Release expanded template argument vector

2024-10-02 Thread Richard Biener
This reduces peak memory usage by 20% for a specific testcase. Bootstrapped and tested on x86_64-unknown-linux-gnu. It's very ugly so I'd appreciate suggestions on how to handle such situations better? gcc/cp/ * pt.cc (coerce_template_parms): Release expanded argument vector when

[PATCH 1/3] Speedup iterative_hash_template_arg

2024-10-02 Thread Richard Biener
Using iterative_hash_object is expensive compared to using iterative_hash_hashval_t which is fit for integer sized values. The following reduces the number of perf cycles spent in iterative_hash_template_arg and iterative_hash combined by 20%. Bootstrapped and tested on x86_64-unknown-linux-gnu.

[PATCH] Adjust gcc.dg/vect/vect-double-reduc-5.c

2024-10-02 Thread Richard Biener
The testcase XPASSes now and should do so everywhere I think. Pushed. * gcc.dg/vect/vect-double-reduc-5.c: Adjust. --- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c

[PATCH] Adjust gcc.dg/vect/slp-12a.c

2024-10-02 Thread Richard Biener
We can now SLP the loop. There's PR116583 tracking that this still fails for VLA vectors when load-lanes doesn't support a group of size 8. We can't express this right now so the testcase keeps FAILing for aarch64 with SVE (but passes now for riscv). Pushed. * gcc.dg/vect/slp-12a.c: Adj

[PATCH] Adjust expectation for gcc.dg/vect/slp-19c.c

2024-10-02 Thread Richard Biener
We can now vectorize the first loop with SLP when using V2SImode vectors since then we can handle the non-power-of-two interleaving. We can also SLP the second loop reliably now after adding induction support for VLA vectors. Pushed. * gcc.dg/vect/slp-19c.c: Adjust expectation. --- gcc/t

[PATCH] un-XFAIL gcc.dg/vect/vect-double-reduc-5.c

2024-10-02 Thread Richard Biener
The testcase now passes, we can handle double reductions with multiple types fine. Pushed. * gcc.dg/vect/vect-double-reduc-5.c: Un-XFAIL everywhere. --- gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/gcc/testsuite/g

[PATCH] testsuite/116596 - fix gcc.dg/vect/slp-11a.c

2024-10-02 Thread Richard Biener
The condition on "vectorizing stmts using SLP" needs to match that of "vectorized 1 loops", obviously. Pushed. PR testsuite/116596 * gcc.dg/vect/slp-11a.c: Fix. --- gcc/testsuite/gcc.dg/vect/slp-11a.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsui

[PATCH] testsuite/116660 - adjust testcases unexpectedly failing on 32bit sparc

2024-10-02 Thread Richard Biener
Both testcases miss some effective target requires. Pushed. PR testsuite/116660 * gcc.dg/vect/no-scevccp-outer-12.c: Add vect_pack_trunc. * gcc.dg/vect/vect-multitypes-6.c: Add vect_char_add, remove explicit 32bit sparc XFAIL. --- gcc/testsuite/gcc.dg/vect/no-scev

Re: [PATCH] backprop: Fix deleting of a phi node [PR116922]

2024-10-02 Thread Richard Biener
On Wed, Oct 2, 2024 at 5:13 AM Andrew Pinski wrote: > > The problem here is remove_unused_var is called on a name that is > defined by a phi node but it deletes it like removing a normal statement. > remove_phi_node should be called rather than gsi_remove for phinodes. > > Note there is a possibil

Re: [PATCH] phiopt: Fix VCE moving by rewriting it into cast [PR116098]

2024-10-02 Thread Richard Biener
On Wed, Oct 2, 2024 at 1:11 AM Andrew Pinski wrote: > > Phiopt match_and_simplify might move a well defined VCE assign statement > from being conditional to being unconditional; that VCE might no longer > be defined. It will need a rewrite into a cast instead. > > This adds the rewriting code

Re: [PATCH][Backport][GCC12] tree-optimization/116585 - SSA corruption with split_constant_offset

2024-10-01 Thread Richard Biener
On Tue, 1 Oct 2024, Qing Zhao wrote: > From: Richard Biener > > Hi, this is the backport of the fix for PR116585 to GCC12. > bootstrapped and regress tested on both X86 and aarch64. > > Okay for committing? OK. > thanks. > > Qing. > > ==

Re: [PATCH][Backport][GCC13] tree-optimization/116585 - SSA corruption with split_constant_offset

2024-10-01 Thread Richard Biener
On Tue, 1 Oct 2024, Qing Zhao wrote: > From: Richard Biener > > Hi, this is the backport of the fix for PR116585 to GCC13. > bootstrapped and regress tested on both X86 and aarch64. > > Okay for committing? OK. > thanks. > > Qing. > > === > split_

Re: [PATCH][Backport][GCC14] tree-optimization/116585 - SSA corruption with split_constant_offset

2024-10-01 Thread Richard Biener
On Tue, 1 Oct 2024, Qing Zhao wrote: > From: Richard Biener > > Hi, this is the backport of the fix for PR116585 to GCC14. > bootstrapped and regress tested on both X86 and aarch64. > > Okay for committing? OK.

[PATCH] Fix gcc.dg/pr116905.c

2024-10-01 Thread Richard Biener
I missed { dg-add-options float16 }. Pushed. * gcc.dg/pr116905.c: Add float16 options. --- gcc/testsuite/gcc.dg/pr116905.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.dg/pr116905.c b/gcc/testsuite/gcc.dg/pr116905.c index 0a2b96ac1c1..89de8525b25 100644 --- a/gcc

[PATCH 2/2] testsuite/116654 - adjust gcc.target/powerpc/p9-vec-length-full-8.c

2024-10-01 Thread Richard Biener
gcc.target/powerpc/p9-vec-length-full-8.c was expecting all loops to use -with-len fully masked vectorization to avoid epilogues because the loops needed peeling for gaps. With SLP we have improved things here and the loops using V2D[IF]mode no longer need peeling for gaps since the target can com

[PATCH 1/2] testsuite/116654 - adjust gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c

2024-10-01 Thread Richard Biener
As we now SLP non-grouped stores we have to adjust the expected count. Pushed. PR testsuite/116654 * gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Adjust. --- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --

Re: [PATCH] MATCH: Simplify `min(a, b) op max(a, b)` to `a op b` [PR109401]

2024-10-01 Thread Richard Biener
On Sun, Sep 29, 2024 at 5:28 PM Jeff Law wrote: > > > > On 9/25/24 2:30 AM, Eikansh Gupta wrote: > > This patch simplifies `min(a,b) op max(a,b)` to `a op b`. This optimization > > will work for all the binary commutative operations. So, the `op` here can > > be one of {plus, mult, bit_and, bit_xor,
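Why the fold is valid for commutative operations can be checked with a small standalone C sketch: {min(a,b), max(a,b)} is the same pair as {a, b}, possibly swapped, so a commutative `op` cannot tell the difference. The helper names below are hypothetical and not taken from the patch, and only the four operations visible in the quoted text are exercised.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers for illustration only.  */
static int min_i (int a, int b) { return a < b ? a : b; }
static int max_i (int a, int b) { return a > b ? a : b; }

/* Return 1 if min(a,b) OP max(a,b) == a OP b for plus, mult,
   bit_and and bit_xor.  Addition and multiplication are done in
   unsigned arithmetic so that wrap-around is well defined.  */
int
minmax_fold_holds (int a, int b)
{
  int lo = min_i (a, b), hi = max_i (a, b);
  unsigned ua = (unsigned) a, ub = (unsigned) b;
  unsigned ulo = (unsigned) lo, uhi = (unsigned) hi;
  return (ulo + uhi == ua + ub)
	 && (ulo * uhi == ua * ub)
	 && ((lo & hi) == (a & b))
	 && ((lo ^ hi) == (a ^ b));
}

/* The same rewrite applied to a non-commutative operation (minus);
   returns 0 whenever a > b, showing why commutativity is required.  */
int
minmax_fold_holds_for_minus (int a, int b)
{
  return (unsigned) min_i (a, b) - (unsigned) max_i (a, b)
	 == (unsigned) a - (unsigned) b;
}
```

For example, `minmax_fold_holds (5, 3)` is true, while `minmax_fold_holds_for_minus (5, 3)` is false because `3 - 5` differs from `5 - 3`.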

Re: [PATCH] MATCH: Simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` [PR112472]

2024-10-01 Thread Richard Biener
On Tue, Sep 24, 2024 at 10:58 AM Eikansh Gupta wrote: > > This patch simplifies `(trunc)copysign ((extend)x, CST)` to `copysign (x, > -1.0/1.0)` > depending on the sign of CST. Previously, it was simplified to `copysign (x, > CST)`. > It can be optimized as the sign of the CST matters, not the val

[PATCH] tree-optimization/116902 - vectorizer load hosting breaks UID order #2

2024-10-01 Thread Richard Biener
This is another case of load hoisting breaking UID order in the preheader, this time between two hoistings. The easiest way out is to do what we do for the main stmt - copy instead of move. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/116902 P

Re: [PATCH] Fix wrong code out of NRV + RSO + inlining (take 2)

2024-10-01 Thread Richard Biener
On Tue, Oct 1, 2024 at 12:01 PM Eric Botcazou wrote: > > Hi, > > the attached Ada testcase compiled with -O -flto exhibits a wrong code issue > when the 3 optimizations NRV + RSO + inlining are applied to the same call: if > the LHS of the call is marked write-only before inlining, then it will ke

[PATCH] tree-optimization/116654 - missed dr_explicit_realign[_optimized] with SLP

2024-10-01 Thread Richard Biener
With single-lane SLP we fail to use the Power realign loads, causing some testsuite FAILs. r14-2430-g4736ddd11874fe exempted SLP of non-grouped accesses because that could have been only splats where the scheme isn't used anyway, but now with single-lane SLP it can be contiguous accesses. Bootstra

Re: [PATCH] [PR86710][PR116826] match.pd: Fold logarithmic identities.

2024-10-01 Thread Richard Biener
gN(a), logN(a) + logN(b) -> logN(a*b), > and logN(a) - logN(b) -> logN(a/b). > > gcc/testsuite/ > * gcc.dg/tree-ssa/log_ident.c: New test. >
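Written out as ordinary math, the two identities visible in the quoted ChangeLog are the following. The condition a, b > 0 is added here as the usual domain of the real logarithm; it is not stated in the quoted text. In floating point the two sides can also differ by rounding.

```latex
\begin{align*}
\log_N a + \log_N b &= \log_N (a \cdot b), \\
\log_N a - \log_N b &= \log_N \left( \frac{a}{b} \right),
\qquad a, b > 0 .
\end{align*}
```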

Re: [PATCH] phi-opt: Improve factor heurstic with constants and conversions from bool [PR116890]

2024-10-01 Thread Richard Biener
On Mon, Sep 30, 2024 at 11:50 PM Andrew Pinski wrote: > > Take: > ``` > if (t_3(D) != 0) > goto ; > else > goto ; > > > _8 = c_4(D) != 0; > _9 = (int) _8; > > > # e_2 = PHI <_9(3), 0(2)> > ``` > > We should factor out the conversion here as that will allow a simplification t

Re: [PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-10-01 Thread Richard Biener
On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina wrote: > > Hi Victor, > > Thanks! This looks good to me with one minor comment: > > > -Original Message- > > From: Victor Do Nascimento > > Sent: Monday, September 30, 2024 2:34 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Tamar Christina ; ri

[PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-10-01 Thread Richard Biener
The following adds SLP support for vectorizing single-lane inductions with variable length vectors. Bootstrapped and tested on x86_64-unknown-linux-gnu. PR tree-optimization/116566 * tree-vect-loop.cc (vectorizable_induction): Handle single-lane SLP for VLA vectors. --- g

[PATCH] tree-optimization/116906 - unsafe PRE with never executed edges

2024-10-01 Thread Richard Biener
When we're computing ANTIC for PRE we treat edges to not yet visited blocks as having a maximum ANTIC solution to get at an optimistic solution in the iteration. That assumes the edges visited eventually execute. This is a wrong assumption that can lead to wrong code (and not only non-optimality)

[PATCH] tree-optimization/116905 - ICE with bogus range ops

2024-10-01 Thread Richard Biener
The following avoids querying ranges of vector entities. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. PR tree-optimization/116905 * tree-vect-stmts.cc (supportable_indirect_convert_operation): Fix guard for vect_get_range_info. * gcc.dg

Re: [PATCH v1 1/2] Match: Support form 2 for scalar signed integer SAT_SUB

2024-10-01 Thread Richard Biener
On Thu, Sep 26, 2024 at 2:25 PM wrote: > > From: Pan Li > > This patch would like to support the form 2 of the scalar signed > integer SAT_SUB. Aka below example: > > Form 2: > #define DEF_SAT_S_SUB_FMT_2(T, UT, MIN, MAX) \ > T __attribute__((noinline)) \ > sat_s_sub_##T##

[PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-09-30 Thread Richard Biener
The following adds SLP support for vectorizing single-lane inductions with variable length vectors. This is a WIP patch, local testing for SVE and riscv is fine but the CI might discover issues. PR tree-optimization/116566 * tree-vect-loop.cc (vectorizable_induction): Handle singl

[PATCH] tree-optimization/116879 - failure to recognize non-empty latch

2024-09-30 Thread Richard Biener
When we relaxed the vectorizer's constraint on loop structure, verifying the emptiness of the latch became too loose, as can be seen in the case for PR116879 where the latch effectively contains two basic-blocks, with one being an unmerged forwarder that's not empty. Bootstrapped and tested on x86_64-

RE: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-30 Thread Richard Biener
>if (check_bool_pattern (var, vinfo, bool_stmts)) > var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo); >else if (integer_type_for_mask (var, vinfo)) > return NULL; > else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE > -&&

[PATCH] tree-optimization/113197 - bougs assert in PTA

2024-09-30 Thread Richard Biener
PTA asserts that EAF_NO_DIRECT_READ is not set when flags are set consistently which doesn't make sense. The following removes the assert. Bootstrap & regtest running on x86_64-unknown-linux-gnu. Richard. PR tree-optimization/113197 * tree-ssa-structalias.cc (handle_call_arg): R
