The following handles SLP discovery of permuted masked loads which
was prohibited (because wrongly handled) for PR114375. In particular
with single-lane SLP at the moment all masked group loads appear
permuted and we fail to use masked load lanes as well. The following
addresses parts of the issu
The following avoids copying scalar stmts again for the re-lookup
of the slot to replace the NULL guard with node.
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
Richard.
* tree-vect-slp.cc (vect_cse_slp_nodes): Fix memory leak.
---
gcc/tree-vect-slp.cc | 2 +-
1 file ch
The following massages the GIMPLE matching way of handling scan
stores to work with single-lane SLP. I do not fully understand all
the cases that can happen and the stmt matching at vectorizable_store
time is less than ideal - but the following gets me all the testcases
to pass with and without fo
On Tue, Oct 8, 2024 at 11:14 AM Hongtao Liu wrote:
>
> On Tue, Oct 8, 2024 at 4:56 PM Richard Biener
> wrote:
> >
> > On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote:
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/fstac
On Tue, Oct 8, 2024 at 3:23 AM wrote:
>
> From: Pan Li
>
> This patch would like to support the form 3 and form 4 of the scalar signed
> integer SAT_SUB. Aka below example:
>
> Form 3:
> #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \
> T __attribute__((noinline))
On Tue, Oct 8, 2024 at 10:34 AM wrote:
>
> From: Pan Li
>
> This patch would like to support the form 1 of the scalar signed
> integer SAT_TRUNC. Aka below example:
>
> Form 1:
> #define DEF_SAT_S_TRUNC_FMT_1(NT, WT, NT_MIN, NT_MAX) \
> NT __attribute__((noinline)) \
On Tue, Oct 8, 2024 at 10:34 AM wrote:
>
> From: Pan Li
>
> When trying to match saturation-related patterns on a PHI node, we may have
> to try each pattern for all PHI nodes of the bb. Aka:
>
> for each PHI node in bb:
> gphi *phi = xxx;
> try_match_sat_add (, phi);
> try_match_sat_sub (, phi);
On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote:
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/fstack-protector-strong.c: Adjust
> scan-assembler-times.
> * gcc.dg/graphite/scop-6.c: Add
> -Wno-aggressive-loop-optimizations.
> * gcc.dg/graphite/scop-9.c: Ditto.
>
On Tue, Oct 8, 2024 at 10:36 AM liuhongt wrote:
>
> >So should we adjust very-cheap to allow niter peeling as proposed or
> >should we switch the default at -O2 to cheap?
> I prefer the former.
>
> Update in V2:
> Adjust testcases after relaxing O2 vectorization.
>
> Ok for trunk?
OK.
Thanks,
Richar
On Thu, 3 Oct 2024, Jennifer Schmitz wrote:
>
>
> > On 1 Oct 2024, at 14:27, Richard Biener wrote:
> >
> > On Tue, 1 Oct 2024, Jennifer Schmitz wrote:
> >
> >> Th
heck for
> mod optab support.
>
> gcc/testsuite/
> PR tree-optimization/116831
> * gcc.dg/torture/pr116831.c: New test.
>
--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
The following adds a pattern to elide a .REDUC_IOR operation when
the result is compared against zero with a cbranch. I've resorted
to using can_compare_p since that's what RTL expansion eventually
checks - while GIMPLE allowed whole vector equality compares for long
I'll notice vector lowering wo
On Tue, 8 Oct 2024, Jakub Jelinek wrote:
> On Mon, Oct 07, 2024 at 10:32:57AM +0200, Richard Biener wrote:
> > > They are implementation defined, -1, 0, 1, 2 is defined by libstdc++:
> > > using type = signed char;
> > > enum class _Ord : type { equivale
On Mon, Aug 5, 2024 at 7:05 AM Kugan Vivekanandarajah
wrote:
>
>
>
> > On 15 Jul 2024, at 5:18 pm, Jakub Jelinek wrote:
> >
> > On Mon, Jul 15, 2024 at 12:39:22AM +, Kugan Vivekanandarajah wrote:
> >> OMP safelen handling is
.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> > accelerator
> >
> > > -Original Message-
> > > From: Richard Biener
> > >
The following fixes checking for unsupported control flow in
vectorization to also cover the outer loop body.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/116990
* tree-vect-loop.cc (vect_analyze_loop_form): Check the current
loop body
The following makes sure to discover the scalar loop IV exit during
analysis as failure to do so (if DCE and friends are disabled this
can happen due to if-conversion doing DCE and FRE on the if-converted
loop) would ICE later.
I refrained from larger refactoring to be able to eventually backport.
On Sat, Oct 5, 2024 at 10:17 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> Time vars normally use times(2) to get the user/sys/wall time, which is
> always a system call. I don't think the system time is very useful because
> most overhead is in user time. If we only use the wall (or monoto
On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento
wrote:
>
> Given the categorization of math built-in functions as `ECF_CONST',
> when if-converting their uses, their calls are not masked and are thus
> called with an all-true predicate.
>
> This, however, is not appropriate where built-ins hav
On Fri, 4 Oct 2024, Richard Sandiford wrote:
> This series should fix the target-independent parts of PR116583.
> (We also need some target-specific patches, to be posted separately.)
>
> The explanations are in the individual commit messages, but I've
> attached a -b diff below in case my attemp
On Mon, 7 Oct 2024, Jakub Jelinek wrote:
> On Mon, Oct 07, 2024 at 08:59:56AM +0200, Richard Biener wrote:
> > The forwprop added optimization looks like it would match PHI-opt better,
> > but I'm fine with leaving it in forwprop. I do wonder whether instead
> > of addin
On Fri, Oct 4, 2024 at 3:07 PM Jonathan Wakely wrote:
>
> On Fri, 4 Oct 2024 at 13:53, Dmitry Ilvokhin wrote:
> >
> > On Fri, Oct 04, 2024 at 10:20:27AM +0100, Jonathan Wakely wrote:
> > > On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote:
> > > >
>
"TARGET_80387 && (TARGET_CMOVE || (TARGET_SAHF && TARGET_USE_SAHF))"
> > {
> > - ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]);
> > + ix86_expand_fp_spaceship (operands[0], operands[1], operands[2],
> > + operands[3]);
> > + DONE;
> > +})
> > +
> > +(define_expand "spaceship4"
> > + [(match_operand:SI 0 "register_operand")
> > + (match_operand:SWI 1 "nonimmediate_operand")
> > + (match_operand:SWI 2 "")
> > + (match_operand:SI 3 "const_int_operand")]
> > + ""
> > +{
> > + ix86_expand_int_spaceship (operands[0], operands[1], operands[2],
> > +operands[3]);
> >DONE;
> > })
> >
> > --- gcc/doc/md.texi.jj 2024-10-01 09:38:58.035961557 +0200
> > +++ gcc/doc/md.texi 2024-10-02 20:16:22.502329039 +0200
> > @@ -8568,11 +8568,15 @@ inclusive and operand 1 exclusive.
> > If this pattern is not defined, a call to the library function
> > @code{__clear_cache} is used.
> >
> > -@cindex @code{spaceship@var{m}3} instruction pattern
> > -@item @samp{spaceship@var{m}3}
> > +@cindex @code{spaceship@var{m}4} instruction pattern
> > +@item @samp{spaceship@var{m}4}
> > Initialize output operand 0 with mode of integer type to -1, 0, 1 or 2
> > if operand 1 with mode @var{m} compares less than operand 2, equal to
> > operand 2, greater than operand 2 or is unordered with operand 2.
> > +Operand 3 should be @code{const0_rtx} if the result is used in comparisons,
> > +@code{const1_rtx} if the result is used as integer value and the comparison
> > +is signed, @code{const2_rtx} if the result is used as integer value and
> > +the comparison is unsigned.
> > @var{m} should be a scalar floating point mode.
> >
> > This pattern is not allowed to @code{FAIL}.
> > --- gcc/testsuite/g++.target/i386/pr116896-1.C.jj 2024-10-03
> > 14:40:27.071813336 +0200
> > +++ gcc/testsuite/g++.target/i386/pr116896-1.C 2024-10-03
> > 15:52:06.243819660 +0200
> > @@ -0,0 +1,35 @@
> > +// PR middle-end/116896
> > +// { dg-do compile { target c++20 } }
> > +// { dg-options "-O2 -masm=att -fno-stack-protector" }
> > +// { dg-final { scan-assembler-times "\tjp\t" 1 } }
> > +// { dg-final { scan-assembler-not "\tj\[^mp\]\[a-z\]*\t" } }
> > +// { dg-final { scan-assembler-times "\tsbb\[bl\]\t\\\$0, " 3 } }
> > +// { dg-final { scan-assembler-times "\tseta\t" 3 } }
> > +// { dg-final { scan-assembler-times "\tsetg\t" 1 } }
> > +// { dg-final { scan-assembler-times "\tsetl\t" 1 } }
> > +
> > +#include <compare>
> > +
> > +[[gnu::noipa]] auto
> > +foo (float x, float y)
> > +{
> > + return x <=> y;
> > +}
> > +
> > +[[gnu::noipa, gnu::optimize ("fast-math")]] auto
> > +bar (float x, float y)
> > +{
> > + return x <=> y;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +baz (int x, int y)
> > +{
> > + return x <=> y;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +qux (unsigned x, unsigned y)
> > +{
> > + return x <=> y;
> > +}
> > --- gcc/testsuite/g++.target/i386/pr116896-2.C.jj 2024-10-03
> > 14:40:37.203674018 +0200
> > +++ gcc/testsuite/g++.target/i386/pr116896-2.C 2024-10-04
> > 10:55:07.468396073 +0200
> > @@ -0,0 +1,41 @@
> > +// PR middle-end/116896
> > +// { dg-do run { target c++20 } }
> > +// { dg-options "-O2" }
> > +
> > +#include "pr116896-1.C"
> > +
> > +[[gnu::noipa]] auto
> > +corge (int x)
> > +{
> > + return x <=> 0;
> > +}
> > +
> > +[[gnu::noipa]] auto
> > +garply (unsigned x)
> > +{
> > + return x <=> 0;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > + if (foo (-1.0f, 1.0f) != std::partial_ordering::less
> > + || foo (1.0f, -1.0f) != std::partial_ordering::greater
> > + || foo (1.0f, 1.0f) != std::partial_ordering::equivalent
> > + || foo (__builtin_nanf (""), 1.0f) !=
> > std::partial_ordering::unordered
> > + || bar (-2.0f, 2.0f) != std::partial_ordering::less
> > + || bar (2.0f, -2.0f) != std::partial_ordering::greater
> > + || bar (-5.0f, -5.0f) != std::partial_ordering::equivalent
> > + || baz (-42, 42) != std::strong_ordering::less
> > + || baz (42, -42) != std::strong_ordering::greater
> > + || baz (42, 42) != std::strong_ordering::equal
> > + || qux (40, 42) != std::strong_ordering::less
> > + || qux (42, 40) != std::strong_ordering::greater
> > + || qux (40, 40) != std::strong_ordering::equal
> > + || corge (-15) != std::strong_ordering::less
> > + || corge (15) != std::strong_ordering::greater
> > + || corge (0) != std::strong_ordering::equal
> > + || garply (15) != std::strong_ordering::greater
> > + || garply (0) != std::strong_ordering::equal)
> > +__builtin_abort ();
> > +}
> >
> > Jakub
> >
>
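The result encoding described in the md.texi hunk above (-1, 0, 1 or 2 for less, equal, greater or unordered) can be illustrated with a small scalar sketch. The helper name spaceship_value is made up for this illustration and is not a GCC or libstdc++ function; it only mirrors the documented values for a floating-point comparison.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of GCC): returns the value the documented
// spaceship pattern stores in operand 0 for a float comparison.
int spaceship_value(float a, float b)
{
  if (std::isnan(a) || std::isnan(b))
    return 2;    // unordered
  if (a < b)
    return -1;   // less
  if (a == b)
    return 0;    // equal
  return 1;      // greater
}
```

For integer modes the unordered case cannot occur, so only -1, 0 and 1 are possible there.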
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE
vectorization, in particular it enables SLP discovery.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
* tree-vect-slp.cc (no_arg_map): New.
(vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE.
(
> Am 05.10.2024 um 10:22 schrieb Eric Botcazou :
>
> Hi,
>
> this polishes a few rough edges that prevent -ftrivial-auto-var-init=zero from
> working in Ada:
>
> - build_common_builtin_nodes declares BUILT_IN_CLEAR_PADDING with 3 instead
> of 2 parameters, now gimple_fold_builtin_clear_padd
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE
vectorization, in particular it enables SLP discovery.
* tree-vect-slp.cc (no_arg_map): New.
(vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE.
(vect_build_slp_tree_1): Likewise.
* tree-vect-stmts.
The following fixes the order of decrementing the SLP mode and
the dumping.
Built on x86_64-unknown-linux-gnu, pushed.
* tree-vect-loop.cc (vect_analyze_loop_2): Decrement 'slp'
before dumping which stage we're starting.
---
gcc/tree-vect-loop.cc | 6 +++---
1 file changed, 3 inse
On Fri, Oct 4, 2024 at 12:04 PM Jakub Jelinek wrote:
>
> Hi!
>
> The PR notes that the new pch_save/pch_restore methods I've added
> recently invoke UB if either m_classification_history.address ()
> or m_push_list.address () is NULL (which can happen if those vectors
> are empty (and in the pch_s
On Fri, Oct 4, 2024 at 11:20 AM Jonathan Wakely wrote:
>
> On Fri, 4 Oct 2024 at 10:19, Jonathan Wakely wrote:
> >
> > On Fri, 4 Oct 2024 at 07:53, Richard Biener
> > wrote:
> > >
> > > On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
> > >
The following makes sure the emitted even/odd extraction scheme
follows one that ends up with actual trivial even/odd extract permutes.
When we choose a level 2 extract we generate { 0, 1, 4, 5, ... }
which for example the x86 backend doesn't recognize with just SSE
and QImode elements. So this no
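To make the index pattern concrete, here is a minimal sketch that generates the selector for an "even" extract at a given level. The function name and the generalisation to arbitrary levels are assumptions for illustration; only the level 2 result { 0, 1, 4, 5, ... } is taken from the message above, and level 1 is assumed to be the plain even extract { 0, 2, 4, ... }.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: at level L the vector is viewed as granules of
// 2^(L-1) elements and every other granule is selected, starting with
// the first one.
std::vector<unsigned> even_extract_indices(unsigned level, unsigned nelts)
{
  assert(level >= 1);
  unsigned granule = 1u << (level - 1);
  std::vector<unsigned> sel;
  for (unsigned i = 0; i < nelts; ++i)
    if ((i / granule) % 2 == 0)
      sel.push_back(i);
  return sel;
}
```

With 8 elements, level 1 yields { 0, 2, 4, 6 } and level 2 yields { 0, 1, 4, 5 }.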
When failing using forced SLP we do not print the non-SLP failure
mode, which reads slightly differently. Massage the expectation a bit.
Pushed.
* gcc.dg/vect/pr65947-8.c: Adjust.
---
gcc/testsuite/gcc.dg/vect/pr65947-8.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/
When making the testcase use aligned accesses I botched up the
copy&paste. Fixed.
Pushed.
PR tree-optimization/99856
* gcc.dg/vect/pr99856.c: Fix copy&paste errors.
---
gcc/testsuite/gcc.dg/vect/pr99856.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gc
On Thu, 3 Oct 2024, Jason Merrill wrote:
> On 10/2/24 7:53 AM, Richard Biener wrote:
> > For a specific testcase a lot of compile-time is spent in re-hashing
> > hashtable elements upon expansion. The following records the hash
> > in the hash element. This speeds
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote:
>
> While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
> was rewriting already defined code. In the previous attempt at fixing this,
> the review mentioned we should not be calling rewrite_to_defined_overflow
> in those
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote:
>
> After fixing loop-im to do the correct overflow rewriting
> for pointer types too, we end up with code like:
> ```
> _9 = (unsigned long) &g;
> _84 = _9 + 18446744073709551615;
> _11 = _42 + _84;
> _44 = (signed char *) _11;
> ...
>
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski wrote:
>
> The comment here is not wrong, just it would be better if mentioning
> the C++ front-end instead of just the nested function lowering.
OK
> gcc/ChangeLog:
>
> * cfgexpand.cc (add_scope_conflicts_1): Expand comment
> on when
On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely wrote:
>
> On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely wrote:
> >
> > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin wrote:
> > >
> > > Instead of looping over every byte of the tail, unroll loop manually
> > > using switch statement, then compilers
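The idea of replacing a per-byte tail loop with a switch can be sketched as follows. This is only an illustration under assumptions: the function names, the 7-byte limit and the little-endian packing are made up here and are not taken from the patch under discussion, which is not shown in the quoted text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Loop version: pack up to 7 tail bytes into a 64-bit value.
std::uint64_t load_tail_loop(const unsigned char *p, std::size_t n)
{
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  return v;
}

// Switch version: same result, unrolled by hand with deliberate
// fallthrough from the highest byte down to the lowest.
std::uint64_t load_tail_switch(const unsigned char *p, std::size_t n)
{
  assert(n < 8);
  std::uint64_t v = 0;
  switch (n)
    {
    case 7: v |= static_cast<std::uint64_t>(p[6]) << 48; // fall through
    case 6: v |= static_cast<std::uint64_t>(p[5]) << 40; // fall through
    case 5: v |= static_cast<std::uint64_t>(p[4]) << 32; // fall through
    case 4: v |= static_cast<std::uint64_t>(p[3]) << 24; // fall through
    case 3: v |= static_cast<std::uint64_t>(p[2]) << 16; // fall through
    case 2: v |= static_cast<std::uint64_t>(p[1]) << 8;  // fall through
    case 1: v |= static_cast<std::uint64_t>(p[0]);       // fall through
    default: break;
    }
  return v;
}
```

Both functions compute the same value; the switch form has no loop-carried counter, so each case is a fixed sequence of loads and shifts.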
On Thu, Oct 3, 2024 at 1:30 PM Georg-Johann Lay wrote:
>
> gcc.c-torture/execute/ieee/pr108540-1.c obviously requires that double
> is a 64-bit type, hence add pr108540-1.x as a corresponding filter.
>
> Ok for trunk?
>
> And is there a reason for why we are still putting test cases in
> these old pa
On Thu, Oct 3, 2024 at 3:15 AM Andrew Waterman wrote:
>
> On Wed, Oct 2, 2024 at 4:41 PM Jeff Law wrote:
> >
> >
> >
> > On 10/2/24 4:39 PM, Andrew Waterman wrote:
> > > On Wed, Oct 2, 2024 at 5:56 AM Jeff Law wrote:
> > >>
> > >>
> > >>
> > >> On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> > >>> W
On Wed, Oct 2, 2024 at 5:01 PM Georg-Johann Lay wrote:
>
> This test failed on int != 32-bit targets due to
> a[0] = b[0] = INT_MIN instead of using INT32_MIN.
OK.
Richard.
> Johann
>
> --
>
> testsuite/52641 - Fix gcc.dg/signbit-6.c for int != 32-bit targets.
>
> PR testsuite
On Wed, Oct 2, 2024 at 3:48 PM Richard Sandiford
wrote:
>
> This patch tries to make check-function-bodies automatically
> choose between reading the regular assembly file and reading the
> LTO assembly file. There should only ever be one right answer,
> since check-function-bodies doesn't make s
On Thu, 3 Oct 2024, Thomas Schwinge wrote:
> Hi!
>
> On 2024-09-06T11:30:06+0200, Richard Biener wrote:
> > On Thu, 5 Sep 2024, Richard Biener wrote:
> >> The following enables single-lane loop SLP discovery for non-grouped stores
> >> and adjusts vectorizab
This zero-initializes vec_init to avoid a bogus maybe-uninitialized
diagnostic.
Built on x86_64-unknown-linux-gnu, pushed as obvious.
* tree-vect-loop.cc (vectorizable_induction): Initialize
vec_init.
---
gcc/tree-vect-loop.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
On Wed, 2 Oct 2024, Andrew Pinski wrote:
> On Tue, Oct 1, 2024 at 5:04 AM Richard Biener wrote:
> >
> > The following adds SLP support for vectorizing single-lane inductions
> > with variable length vectors.
>
> This introduces a bootstrap failure on aarch64 due
> Am 02.10.2024 um 15:48 schrieb Richard Sandiford :
>
> Before running a test with specific torture options, gcc-dg-runtest
> sets the global variable torture_current_flags to the set of torture
> options that will be used. However, it never unset the variable
> afterwards, which meant that
I missed one that's actually hit quite a lot, hashing of the canonical
type TYPE_HASH.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed as obvious
after the previous approval.
Richard.
* pt.cc (iterative_hash_template_arg): Use iterative_hash_hashval_t
to hash TYPE_HAS
mt = STMT_VINFO_STMT (stmt_info);
>basic_block cond_bb = gimple_bb (stmt);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index
> 490061aea2f6d465d9589eb97bbd34a920d76b1c..53483303c4ac3482760fe722354f602e0243e5a2
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/t
> }
> + if (allows_reg && toplev_p)
> + {
> + error_at (loc, "invalid constraint outside of a function");
> + operand = error_mark_node;
> + }
> }
> else
>
For a specific testcase a lot of compile-time is spent in re-hashing
hashtable elements upon expansion. The following records the hash
in the hash element. This speeds up compilation by 20%.
There's probably module-related uses that need to be adjusted.
Bootstrap failed (guess I was expecting t
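As a generic illustration of recording the hash in the hash element (this is not the actual C++ front-end change, and all names below are made up), the sketch keeps the hash next to each key in a small open-addressing set, so that growing the table re-places elements without calling the hash function again.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One table slot; the hash is recorded when the key is inserted.
struct Slot
{
  bool used = false;
  std::size_t hash = 0;
  std::string key;
};

// Illustrative container only; this is not GCC's hash_table.
struct CachedHashSet
{
  std::vector<Slot> slots;
  std::size_t count = 0;
  std::size_t hash_calls = 0;   // how often the hash function actually ran

  CachedHashSet() : slots(4) {}

  std::size_t compute_hash(const std::string &k)
  {
    ++hash_calls;
    return std::hash<std::string>{}(k);
  }

  // Linear probing with an already known hash; never rehashes the key.
  static bool place(std::vector<Slot> &tab, std::size_t h,
                    const std::string &k)
  {
    for (std::size_t i = h % tab.size();; i = (i + 1) % tab.size())
      {
        if (!tab[i].used)
          {
            tab[i].used = true;
            tab[i].hash = h;
            tab[i].key = k;
            return true;
          }
        if (tab[i].hash == h && tab[i].key == k)
          return false;   // already present
      }
  }

  // Double the table; elements move using their recorded hash.
  void expand()
  {
    std::vector<Slot> bigger(slots.size() * 2);
    for (const Slot &s : slots)
      if (s.used)
        place(bigger, s.hash, s.key);
    slots.swap(bigger);
  }

  bool insert(const std::string &k)
  {
    if ((count + 1) * 2 > slots.size())
      expand();
    bool added = place(slots, compute_hash(k), k);
    if (added)
      ++count;
    return added;
  }
};
```

The counter hash_calls grows by exactly one per insert call, no matter how many expansions happen in between. The cost is one extra word per element.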
This reduces peak memory usage by 20% for a specific testcase.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
It's very ugly so I'd appreciate suggestions on how to handle such
situations better.
gcc/cp/
* pt.cc (coerce_template_parms): Release expanded argument
vector when
Using iterative_hash_object is expensive compared to using
iterative_hash_hashval_t, which is suited to integer-sized values.
The following reduces the number of perf cycles spent in
iterative_hash_template_arg and iterative_hash combined by 20%.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
The testcase XPASSes now and should do so everywhere I think.
Pushed.
* gcc.dg/vect/vect-double-reduc-5.c: Adjust.
---
gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
We can now SLP the loop. There's PR116583 tracking that this still
fails for VLA vectors when load-lanes doesn't support a group of
size 8. We can't express this right now so the testcase keeps
FAILing for aarch64 with SVE (but passes now for riscv).
Pushed.
* gcc.dg/vect/slp-12a.c: Adj
We can now vectorize the first loop with SLP when using V2SImode
vectors since then we can handle the non-power-of-two interleaving.
We can also SLP the second loop reliably now after adding induction
support for VLA vectors.
Pushed.
* gcc.dg/vect/slp-19c.c: Adjust expectation.
---
gcc/t
The testcase now passes, we can handle double reductions with multiple
types fine.
Pushed.
* gcc.dg/vect/vect-double-reduc-5.c: Un-XFAIL everywhere.
---
gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c | 5 +
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/gcc/testsuite/g
The condition on "vectorizing stmts using SLP" needs to match that
of "vectorized 1 loops", obviously.
Pushed.
PR testsuite/116596
* gcc.dg/vect/slp-11a.c: Fix.
---
gcc/testsuite/gcc.dg/vect/slp-11a.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/testsui
Both testcases are missing some effective-target requirements.
Pushed.
PR testsuite/116660
* gcc.dg/vect/no-scevccp-outer-12.c: Add vect_pack_trunc.
* gcc.dg/vect/vect-multitypes-6.c: Add vect_char_add, remove
explicit 32bit sparc XFAIL.
---
gcc/testsuite/gcc.dg/vect/no-scev
On Wed, Oct 2, 2024 at 5:13 AM Andrew Pinski wrote:
>
> The problem here is remove_unused_var is called on a name that is
> defined by a phi node but it deletes it like removing a normal statement.
> remove_phi_node should be called rather than gsi_remove for PHI nodes.
>
> Note there is a possibil
On Wed, Oct 2, 2024 at 1:11 AM Andrew Pinski wrote:
>
> Phiopt match_and_simplify might move a well defined VCE assign statement
> from being conditional to being unconditional; that VCE might no longer
> be defined. It will need a rewrite into a cast instead.
>
> This adds the rewriting code
On Tue, 1 Oct 2024, Qing Zhao wrote:
> From: Richard Biener
>
> Hi, this is the backport of the fix for PR116585 to GCC12.
> bootstrapped and regress tested on both X86 and aarch64.
>
> Okay for committing?
OK.
> thanks.
>
> Qing.
>
> ==
On Tue, 1 Oct 2024, Qing Zhao wrote:
> From: Richard Biener
>
> Hi, this is the backport of the fix for PR116585 to GCC13.
> bootstrapped and regress tested on both X86 and aarch64.
>
> Okay for committing?
OK.
> thanks.
>
> Qing.
>
> ===
> split_
On Tue, 1 Oct 2024, Qing Zhao wrote:
> From: Richard Biener
>
> Hi, this is the backport of the fix for PR116585 to GCC14.
> bootstrapped and regress tested on both X86 and aarch64.
>
> Okay for committing?
OK.
>
I missed { dg-add-options float16 }.
Pushed.
* gcc.dg/pr116905.c: Add float16 options.
---
gcc/testsuite/gcc.dg/pr116905.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/gcc/testsuite/gcc.dg/pr116905.c b/gcc/testsuite/gcc.dg/pr116905.c
index 0a2b96ac1c1..89de8525b25 100644
--- a/gcc
gcc.target/powerpc/p9-vec-length-full-8.c was expecting all loops to
use -with-len fully masked vectorization to avoid epilogues because
the loops needed peeling for gaps. With SLP we have improved things
here and the loops using V2D[IF]mode no longer need peeling for gaps
since the target can com
As we now SLP non-grouped stores we have to adjust the expected
count.
Pushed.
PR testsuite/116654
* gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Adjust.
---
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --
On Sun, Sep 29, 2024 at 5:28 PM Jeff Law wrote:
>
>
>
> On 9/25/24 2:30 AM, Eikansh Gupta wrote:
> > This patch simplifies `min(a,b) op max(a,b)` to `a op b`. This optimization
> > works for all binary commutative operations. So, the `op` here can
> > be one of {plus, mult, bit_and, bit_xor,
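The identity holds because {min(a,b), max(a,b)} is always the same pair as {a, b}, possibly swapped, and a commutative op does not care about operand order. A small brute-force check over the four operations named above (the function name is made up for this sketch):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical checker: verifies min(a,b) OP max(a,b) == a OP b for all
// pairs in [-range, range] and OP in {+, *, &, ^}.
bool min_max_identity_holds(int range)
{
  for (int a = -range; a <= range; ++a)
    for (int b = -range; b <= range; ++b)
      {
        int lo = std::min(a, b);
        int hi = std::max(a, b);
        if (lo + hi != a + b) return false;
        if (lo * hi != a * b) return false;
        if ((lo & hi) != (a & b)) return false;
        if ((lo ^ hi) != (a ^ b)) return false;
      }
  return true;
}
```

A non-commutative operation such as subtraction does not qualify: min(3,1) - max(3,1) is -2 while 3 - 1 is 2.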
On Tue, Sep 24, 2024 at 10:58 AM Eikansh Gupta
wrote:
>
> This patch simplifies `(trunc)copysign ((extend)x, CST)` to
> `copysign (x, -1.0/1.0)` depending on the sign of CST. Previously, it was
> simplified to `copysign (x, CST)`.
> It can be optimized as the sign of the CST matters, not the val
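A small sketch of why only the sign of the constant is needed, using float and double as the narrow and wide types. The function names are made up for this illustration, and the choice of types is an assumption; the quoted description does not name them.

```cpp
#include <cassert>
#include <cmath>

// Original shape: extend x, apply copysign with a constant, truncate back.
float copysign_via_double(float x, double cst)
{
  return static_cast<float>(std::copysign(static_cast<double>(x), cst));
}

// Simplified shape: stay in float and keep only the sign of the constant,
// so the constant never has to be representable in the narrow type.
float copysign_simplified(float x, double cst)
{
  return std::copysign(x, std::signbit(cst) ? -1.0f : 1.0f);
}
```

A constant such as -1e300 has no finite float representation, yet both functions agree because copysign only reads the sign bit of its second argument.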
This is another case of load hoisting breaking UID order in the
preheader, this time between two hoistings. The easiest way out is
to do what we do for the main stmt - copy instead of move.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/116902
P
On Tue, Oct 1, 2024 at 12:01 PM Eric Botcazou wrote:
>
> Hi,
>
> the attached Ada testcase compiled with -O -flto exhibits a wrong code issue
> when the 3 optimizations NRV + RSO + inlining are applied to the same call: if
> the LHS of the call is marked write-only before inlining, then it will ke
With single-lane SLP we fail to use the Power realign loads, causing
some testsuite FAILs. r14-2430-g4736ddd11874fe exempted SLP of
non-grouped accesses because that could have been only splats
where the scheme isn't used anyway, but now with single-lane SLP
it can be contiguous accesses.
Bootstra
gN(a), logN(a) + logN(b) -> logN(a*b),
> and logN(a) - logN(b) -> logN(a/b).
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/log_ident.c: New test.
>
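The identities quoted above, logN(a) + logN(b) -> logN(a*b) and logN(a) - logN(b) -> logN(a/b), hold mathematically for positive a and b. In floating point the two sides may differ in the last bits. A minimal numeric check with a tolerance, using natural log as one instance of logN (the helper names are made up for this sketch):

```cpp
#include <cassert>
#include <cmath>

// True when x and y agree to within a small relative tolerance.
bool close_enough(double x, double y)
{
  double scale = std::fmax(1.0, std::fmax(std::fabs(x), std::fabs(y)));
  return std::fabs(x - y) <= 1e-12 * scale;
}

// log(a) + log(b) versus log(a * b), for positive a and b.
bool log_sum_matches(double a, double b)
{
  return close_enough(std::log(a) + std::log(b), std::log(a * b));
}

// log(a) - log(b) versus log(a / b), for positive a and b.
bool log_diff_matches(double a, double b)
{
  return close_enough(std::log(a) - std::log(b), std::log(a / b));
}
```

The rewritten form needs one log call instead of two, but a*b or a/b can overflow or underflow where the separate logs would not.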
On Mon, Sep 30, 2024 at 11:50 PM Andrew Pinski wrote:
>
> Take:
> ```
> if (t_3(D) != 0)
> goto ;
> else
> goto ;
>
>
> _8 = c_4(D) != 0;
> _9 = (int) _8;
>
>
> # e_2 = PHI <_9(3), 0(2)>
> ```
>
> We should factor out the conversion here as that will allow a simplification t
On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina wrote:
>
> Hi Victor,
>
> Thanks! This looks good to me with one minor comment:
>
> > -Original Message-
> > From: Victor Do Nascimento
> > Sent: Monday, September 30, 2024 2:34 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Tamar Christina ; ri
The following adds SLP support for vectorizing single-lane inductions
with variable length vectors.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
PR tree-optimization/116566
* tree-vect-loop.cc (vectorizable_induction): Handle single-lane
SLP for VLA vectors.
---
g
When we're computing ANTIC for PRE we treat edges to not yet visited
blocks as having a maximum ANTIC solution to get at an optimistic
solution in the iteration. That assumes the edges visited eventually
execute. This is a wrong assumption that can lead to wrong code
(and not only non-optimality)
The following avoids querying ranges of vector entities.
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
Richard.
PR tree-optimization/116905
* tree-vect-stmts.cc (supportable_indirect_convert_operation):
Fix guard for vect_get_range_info.
* gcc.dg
On Thu, Sep 26, 2024 at 2:25 PM wrote:
>
> From: Pan Li
>
> This patch would like to support the form 2 of the scalar signed
> integer SAT_SUB. Aka below example:
>
> Form 2:
> #define DEF_SAT_S_SUB_FMT_2(T, UT, MIN, MAX) \
> T __attribute__((noinline)) \
> sat_s_sub_##T##
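For reference, the semantics of a signed saturating subtraction can be written with a wider intermediate type. This is only a sketch of what SAT_SUB computes; it is not the "form 2" source pattern from the patch, whose macro body is cut off above, and the function name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics for int8_t: compute the exact difference in int,
// then clamp it to the representable range.
std::int8_t sat_s_sub_i8(std::int8_t x, std::int8_t y)
{
  int diff = static_cast<int>(x) - static_cast<int>(y);  // cannot overflow int
  if (diff > INT8_MAX)
    return INT8_MAX;
  if (diff < INT8_MIN)
    return INT8_MIN;
  return static_cast<std::int8_t>(diff);
}
```

For example, 100 - (-100) saturates to 127 and -100 - 100 saturates to -128.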
The following adds SLP support for vectorizing single-lane inductions
with variable length vectors.
This is a WIP patch, local testing for SVE and riscv is fine but the
CI might discover issues.
PR tree-optimization/116566
* tree-vect-loop.cc (vectorizable_induction): Handle singl
When we relaxed the vectorizer's constraint on loop structure, verifying
the emptiness of the latch became too loose, as can be seen in the case
for PR116879 where the latch effectively contains two basic blocks,
with one being an unmerged forwarder that's not empty.
Bootstrapped and tested on x86_64-
>if (check_bool_pattern (var, vinfo, bool_stmts))
> var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>else if (integer_type_for_mask (var, vinfo))
> return NULL;
> else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> -&&
PTA asserts that EAF_NO_DIRECT_READ is not set when flags are
set consistently which doesn't make sense. The following removes
the assert.
Bootstrap & regtest running on x86_64-unknown-linux-gnu.
Richard.
PR tree-optimization/113197
* tree-ssa-structalias.cc (handle_call_arg): R
On Sun, Sep 29, 2024 at 5:13 PM Florian Weimer wrote:
>
> Sometimes this is a user error, sometimes it is more of an ICE.
> In either case, more information about the conflict is helpful.
>
> I used this to get a better idea about what is going on with
> PR116887. The original diagnostics look
On Sun, Sep 29, 2024 at 8:01 PM Jeff Law wrote:
>
>
>
> On 9/13/24 5:06 AM, Mariam Arutunian wrote:
> > Symbolically execute potential CRC loops and check whether the loop
> > actually calculates CRC (uses LFSR matching).
> > Calculated CRC and created LFSR are compared on each iteration of the
>
The following fixes the case when vectorizing a load hoists an invariant
load and dependent stmts, thereby breaking UID order of said stmts.
While we duplicate the load we just move the dependences.
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
Richard.
PR tree-optimizat
On Sat, 28 Sep 2024, Jakub Jelinek wrote:
> Hi!
>
> C++ has
> https://eel.is/c++draft/dcl.init#general-6.2
> https://eel.is/c++draft/dcl.init#general-6.3
> which says that during zero-initialization padding bits of structures
> and unions are zero initialized, and in
> https://eel.is/c++draft/dcl
On Fri, Sep 27, 2024 at 6:39 PM Artemiy Volkov
wrote:
>
> On 9/27/2024 1:24 PM, Richard Biener wrote:
> > On Mon, 23 Sep 2024, Artemiy Volkov wrote:
> >
> >> Implement a match.pd transformation inverting the sign of X in
> >> C1 - X cmp C2, where C1 and C2 are
On Fri, 27 Sep 2024, Jakub Jelinek wrote:
> On Fri, Sep 27, 2024 at 12:14:47PM +0200, Richard Biener wrote:
> > I can investigate a bit when there's a testcase showing the issue.
>
> The testcase is pr78687.C with Marek's cp-gimplify.cc patch.
OK, I can reproduce. The
The following moves my entry to where it belongs alphabetically
(it wasn't moved when s/Guenther/Biener/).
Pushed as obvious.
* doc/contrib.texi (Richard Biener): Move entry.
---
gcc/doc/contrib.texi | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/gc
When there are volatile-qualified stores we do not have to treat the
destination as pointing to ANYTHING. It's only when reading from
it that we want to treat the resulting pointers as pointing to ANYTHING.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
Richard.
PR tree-optimization
-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>=
> -21" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>=
> -30" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simp
On Fri, 27 Sep 2024, Jakub Jelinek wrote:
> On Fri, Sep 27, 2024 at 08:16:43AM +0200, Richard Biener wrote:
> > > __attribute__((noinline))
> > > struct ref_proxy f ()
> > > {
> > >struct ref_proxy ptr;
> > >struct ref_proxy D.10036;
l < 20; // f() > -21u
> +}
> +
> +int32_t i3b(void)
> +{
> + uint32_t l = 30 + (uint32_t)f();
> + return l >= 30; // f() <= -31u
> +}
> +
> +int32_t i3c(void)
> +{
> + uint32_t l = 40 + (uint32_t)f();
> + return l > 39; // f() < -39u
rwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 19.*\n.*<=
> 29" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+
> 4294967196.*\n.*<= 100" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+
> 4294967095.*\n.*<= 99" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 999.*\n.*>
> 1999" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 2000.*\n.*>
> 3000" 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+
> 4294957295.*\n.*> " 1 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+
> 4294947296.*\n.*> 1" 1 "forwprop1" } } */
>
2; // return 0
> +}
> +
> +int32_t i1g(void)
> +{
> + int32_t l = 2;
> + l = INT32_MAX/2 + 30 - (int32_t)f();
> + return l <= INT32_MIN/2 - 30; // return 1
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 5
> "forwprop
. I will take care to include the tag in the git commit message.
OK.
Thanks,
Richard.
> Thanks,
> Filip Kastl
>
On Fri, Sep 27, 2024 at 9:52 AM wrote:
>
> From: Pan Li
>
> We iterate over all phi nodes of the bb to try to match the SAT_* pattern
> for scalar integers. We also remove the phi node when the relevant
> pattern matches.
>
> Unfortunately the iterator may have no idea the phi node is removed
> and continu
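The general hazard, removing an element while an iterator still points at it, can be shown by analogy with std::list. This sketch does not use GCC's PHI iterator API; the function name and the "even value" predicate are stand-ins for "the pattern matched".

```cpp
#include <cassert>
#include <list>

// Analogy only: erase matching elements while walking the list, taking
// the next valid iterator from erase() instead of incrementing a stale one.
int remove_matched(std::list<int> &nodes)
{
  int removed = 0;
  for (auto it = nodes.begin(); it != nodes.end(); )
    {
      if (*it % 2 == 0)           // stand-in for "pattern matched"
        {
          it = nodes.erase(it);   // returns the element after the erased one
          ++removed;
        }
      else
        ++it;
    }
  return removed;
}
```

The key point is that the loop advances exactly once per element: either through the return value of erase or through the explicit increment, never both.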
On Sun, Sep 22, 2024 at 5:49 AM -thor wrote:
>
> From: thor
>
> This is the second revision of:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662849.html
>
> I've incorporated the feedback given both by Richard and David - I didn't
> find any memory leaks when testing in valg
On Sun, Sep 22, 2024 at 5:49 AM -thor wrote:
>
> From: thor
>
> This patch allows one to dump a tree as HTML from within gdb by invoking,
> i.e,
> htlml-tree tree
I have managed to get a browser window launched with the following
incremental patch
(xdg-open should be a better default than
On Fri, Sep 27, 2024 at 6:27 AM Pietro Monteiro
wrote:
>
> The prefetch instruction that is emitted by __builtin_prefetch is re-ordered
> on GCC, but not on clang[0]. GCC's behavior is surprising because when using
> the builtin you want the instruction to be placed at the exact point where
> y
!= 2147395600)
> > + abort ();
> > +
> > + return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> > detected" 46 "vect" } } */
> > +/* { dg-final { scan-assembler "\[ \t\]udot\tz\[0-9\]+.s, z\[0-9\]+.h,
> > z\[0-9\]+.h" } } */
> > +/* { dg-final { scan-assembler "\[ \t\]sdot\tz\[0-9\]+.s, z\[0-9\]+.h,
> > z\[0-9\]+.h" } } */
> > diff --git a/gcc/testsuite/lib/target-supports.exp
> > b/gcc/testsuite/lib/target-supports.exp
> > index 11ba77ca404..ebbc2fb8015 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -4258,6 +4258,15 @@ proc check_effective_target_vect_int { } {
> > }}]
> > }
> >
> > +# Return 1 if the target supports two-way dot products on inputs of hi mode
> > +# producing si outputs, 0 otherwise.
> > +
> > +proc check_effective_target_vect_dotprod_hisi { } {
> > +return [check_cached_effective_target_indexed aarch64_sme2 {
> > + expr { [check_effective_target_aarch64_sme2]
> > +}}]
> > +}
> > +
> > # Return 1 if the target supports vectorization of early breaks,
> > # 0 otherwise.
> > #
>
>