[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-12-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

--- Comment #7 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:941df4bc7ac83574374a3623ac11b8685c2282cb

commit r16-5898-g941df4bc7ac83574374a3623ac11b8685c2282cb
Author: Richard Biener 
Date:   Fri Dec 5 09:04:02 2025 +0100

Make gcc.dg/vect/vect-simd-clone-24.c more robust

When -march=cascadelake is added we get 256-bit vectorization by
default, but there's no OMP SIMD ABI for this case with inbranch.
So add -mprefer-vector-width=512 to the testcase.

PR tree-optimization/122776
* gcc.dg/vect/vect-simd-clone-24.c: Add -mprefer-vector-width=512.

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-12-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |16.0
         Resolution|---                         |FIXED

--- Comment #6 from Richard Biener ---
Fixed in GCC 16.

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-12-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

--- Comment #5 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:cea34ac07e3bd74086f854dd43afb29b4940d8de

commit r16-5888-gcea34ac07e3bd74086f854dd43afb29b4940d8de
Author: Richard Biener 
Date:   Sun Nov 23 14:01:03 2025 +0100

Select both inbranch and notinbranch clone during SIMD call analysis

The following records both a possibly notinbranch and an inbranch
SIMD clone during analysis so that we can properly handle the
late decision on loop masking.  Recording of linear-clause data
from analysis is extended to cover linear-clause arguments from
both clones.

This also fixes AVX512 masked loop code generation in line with
the previous fixes.

PR tree-optimization/122776
* tree-vectorizer.h (vect_simd_clone_data::clone,
vect_simd_clone_data::clone_inbranch): New fields for
the two selected clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Record
both a possibly notinbranch and an inbranch clone.  Delay
the choice between the two to code generation based on
LOOP_VINFO_FULLY_MASKED_P.

* gcc.dg/vect/vect-simd-clone-24.c: New testcase.
* gcc.dg/gomp/pr110485.c: Adjust.

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-11-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #4 from Richard Biener ---
I'll see if this works out next week.

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-11-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

--- Comment #3 from Richard Biener ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Richard Biener from comment #1)
> > The following testcase fails to use full masking or a masked epilog, and
> > not only because of this check.  I'm not sure how common it is to have
> > both inbranch and notinbranch variants; I did not check whether there is
> > syntax to get both created for a function definition.
> > 
> > #pragma omp declare simd simdlen(16) inbranch
> > int __attribute__((const)) baz (int x);
> > #pragma omp declare simd simdlen(16) notinbranch
> > int __attribute__((const)) baz (int x);
> 
> Note, just
> #pragma omp declare simd simdlen(16)
> int __attribute__((const)) baz (int x);
> should create both inbranch and notinbranch clones, or you can use
> #pragma omp declare simd simdlen(16) inbranch
> #pragma omp declare simd simdlen(16) notinbranch
> int __attribute__((const)) baz (int x);
> too.
> 
> I believe the #c0 mentioned hunk is from Andre, and if
> LOOP_VINFO_FULLY_MASKED_P isn't computed until late, I think we need some
> alternative for that.

I think we need to compute both an inbranch and a notinbranch "optimal" SIMD
clone.  We only know after vectorizable_simd_clone_call analysis whether we
will actually use a masked loop.  If we don't, we can code-generate for the
notinbranch variant if that's available.
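
The approach described above (record candidate clones during analysis, choose
once loop masking is known) can be sketched as a toy model in C.  This is not
GCC code: `toy_clone_data`, `analyze_call` and `select_clone` are hypothetical
names, the two field names merely mirror the `clone` and `clone_inbranch`
fields named in the r16-5888 commit message, and the fallback rules are an
assumption about the intent rather than the vectorizer's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
enum clone_kind { CLONE_NONE, CLONE_NOTINBRANCH, CLONE_INBRANCH };

struct toy_clone_data
{
  enum clone_kind clone;          /* Best clone for an unmasked loop,
                                     possibly notinbranch.  */
  enum clone_kind clone_inbranch; /* Best inbranch (masked) clone.  */
};

/* "Analysis": whether the loop will be fully masked is not known yet,
   so remember a candidate for each outcome.  */
static struct toy_clone_data
analyze_call (bool have_notinbranch, bool have_inbranch)
{
  struct toy_clone_data d;
  /* Assumption: an unmasked loop prefers notinbranch but could fall
     back to the inbranch clone called with an all-true mask.  */
  d.clone = have_notinbranch ? CLONE_NOTINBRANCH
            : have_inbranch ? CLONE_INBRANCH : CLONE_NONE;
  d.clone_inbranch = have_inbranch ? CLONE_INBRANCH : CLONE_NONE;
  return d;
}

/* "Code generation": the masking decision is final, so pick now.  */
static enum clone_kind
select_clone (const struct toy_clone_data *d, bool fully_masked)
{
  assert (d != NULL);
  return fully_masked ? d->clone_inbranch : d->clone;
}
```

With both clones recorded, `select_clone` yields the notinbranch clone for an
unmasked loop and the inbranch clone for a fully masked one.  A `CLONE_NONE`
result for the masked case stands for "masking cannot be used with this call",
which a real implementation would have to rule out during analysis.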

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-11-21 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
(In reply to Richard Biener from comment #1)
> The following testcase fails to use full masking or a masked epilog, and
> not only because of this check.  I'm not sure how common it is to have
> both inbranch and notinbranch variants; I did not check whether there is
> syntax to get both created for a function definition.
> 
> #pragma omp declare simd simdlen(16) inbranch
> int __attribute__((const)) baz (int x);
> #pragma omp declare simd simdlen(16) notinbranch
> int __attribute__((const)) baz (int x);

Note, just
#pragma omp declare simd simdlen(16)
int __attribute__((const)) baz (int x);
should create both inbranch and notinbranch clones, or you can use
#pragma omp declare simd simdlen(16) inbranch
#pragma omp declare simd simdlen(16) notinbranch
int __attribute__((const)) baz (int x);
too.

I believe the #c0 mentioned hunk is from Andre, and if
LOOP_VINFO_FULLY_MASKED_P isn't computed until late, I think we need some
alternative for that.
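
The single-pragma form mentioned above can be tried together with the loop
from comment #1.  A minimal sketch, assuming a made-up definition of `baz`
(the thread only declares it) and placing the pragma directly on that
definition.  Compile with `-fopenmp-simd` to have GCC honor the pragma;
otherwise it is ignored and the code runs as plain scalar C.

```c
#include <assert.h>

#define N 1024

/* No inbranch/notinbranch clause: per the comment above, this should
   create both the masked and the unmasked SIMD clones.
   The body is hypothetical; any side-effect-free function of x works.  */
#pragma omp declare simd simdlen(16)
int __attribute__((const)) baz (int x)
{
  return (x & 1) ? x + 1 : 0;
}

int a[N];

/* Loop from comment #1: the inner call to baz is conditional, so a
   vectorized version needs either a masked (inbranch) clone or some
   other way to handle lanes where the condition is false.  */
void foo (int n, int * __restrict b)
{
  assert (n >= 0 && n <= N);
  for (int i = 0; i < n; ++i)
    if (baz (a[i]))
      b[i] = baz (b[i]);
}
```

Whether the call is actually vectorized, and with which clone, depends on the
target options and the GCC version; the vectorizer dump
(`-fdump-tree-vect-details`) is the place to check.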

[Bug tree-optimization/122776] vectorizable_simd_clone_call looks at LOOP_VINFO_FULLY_MASKED_P during analysis

2025-11-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122776

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-11-21
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
The following testcase fails to use full masking or a masked epilog, and not
only because of this check.  I'm not sure how common it is to have both
inbranch and notinbranch variants; I did not check whether there is syntax to
get both created for a function definition.

#pragma omp declare simd simdlen(16) inbranch
int __attribute__((const)) baz (int x);
#pragma omp declare simd simdlen(16) notinbranch
int __attribute__((const)) baz (int x);

int a[1024];

void foo (int n, int * __restrict b)
{
  for (int i = 0; i < n; ++i)
    if (baz (a[i]))
      b[i] = baz (b[i]);
}