[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2026-01-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #30 from Andrew Pinski  ---
> Similarly to before it's a bit non-deterministic for some reason.

Yes it feels that way too. See PR 123773 which I filed for the same ICE.
I noticed that the C++ testcase I added only fails with -g even.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-12-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #29 from Richard Biener  ---
(In reply to Tamar Christina from comment #28)
> The ICE is back after re-landing of the original commit.

Re-landing?  What's the rev this appears at?

> Though only affecting wrf now.
> 
> module_cu_g3.fppized.f90: In function 'g3drv.constprop':
> module_cu_g3.fppized.f90:9:4: internal compiler error: in prepare_vec_mask,
> at tree-vect-stmts.cc:1605
> 9 |SUBROUTINE G3DRV(&
>   |^
> 0x1d53ff3 internal_error(char const*, ...)
>   
> /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:787
> 0x840e37 fancy_abort(char const*, int, char const*)
>   /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostics/context.cc:1805
> 0x10437eb prepare_vec_mask(_loop_vec_info*, tree_node*, tree_node*,
> tree_node*, gimple_stmt_iterator*)
>   /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:1605
> 0x1063ceb vectorizable_store
>   /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:9261
> 0x1067fa3 vect_transform_stmt(vec_info*, _stmt_vec_info*,
> gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
>   /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:13183
> 0x109c75b vect_schedule_slp_node
> 
> Reducer:
> 
> !GCC$ builtin (exp) attributes simd (notinbranch)
>   MODULE MODULE_CU_BMJ
>   INTEGER:: JTB
>   CONTAINS
>   SUBROUTINE BMJDRVRQVCUTEN
> REAL, DIMENSION(JTB) :: THEOLD,TOLDY2T
>   DO KTH=1,KTHM
> TH=TH+DTH
> DENOM=TH
> IF (DENOM>EPS) THEN
>QS=EXP(0/DENOM)
> ELSE
>QS=0.
> ENDIF
> THEOLD(KTH)=EXP(ELOCP*QS)
>   ENDDO
>   CALL SPLINE
>   END
>   END
> 
> Compiled with -Ofast -mcpu=neoverse-v2
> 
> Similarly to before it's a bit non-deterministic for some reason.

Unfortunately to a point that a cross does not reproduce it for me.
It also seems to be only tangentially related given it hits in
vectorizable_store and not vectorizable_simd_clone_call so please
track this in a new bug.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-12-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-12-01 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Tamar Christina  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #28 from Tamar Christina  ---
The ICE is back after re-landing of the original commit.

Though only affecting wrf now.

module_cu_g3.fppized.f90: In function 'g3drv.constprop':
module_cu_g3.fppized.f90:9:4: internal compiler error: in prepare_vec_mask, at
tree-vect-stmts.cc:1605
9 |SUBROUTINE G3DRV(&
  |^
0x1d53ff3 internal_error(char const*, ...)
   
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:787
0x840e37 fancy_abort(char const*, int, char const*)
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostics/context.cc:1805
0x10437eb prepare_vec_mask(_loop_vec_info*, tree_node*, tree_node*, tree_node*,
gimple_stmt_iterator*)
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:1605
0x1063ceb vectorizable_store
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:9261
0x1067fa3 vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:13183
0x109c75b vect_schedule_slp_node

Reducer:

!GCC$ builtin (exp) attributes simd (notinbranch)
  MODULE MODULE_CU_BMJ
  INTEGER:: JTB
  CONTAINS
  SUBROUTINE BMJDRVRQVCUTEN
REAL, DIMENSION(JTB) :: THEOLD,TOLDY2T
  DO KTH=1,KTHM
TH=TH+DTH
DENOM=TH
IF (DENOM>EPS) THEN
   QS=EXP(0/DENOM)
ELSE
   QS=0.
ENDIF
THEOLD(KTH)=EXP(ELOCP*QS)
  ENDDO
  CALL SPLINE
  END
  END

Compiled with -Ofast -mcpu=neoverse-v2

Similarly to before it's a bit non-deterministic for some reason.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-21 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #27 from Richard Biener  ---
Should be fixed now.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #26 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:9dff0355c75d70d2a7583b626426599ee054406a

commit r16-5491-g9dff0355c75d70d2a7583b626426599ee054406a
Author: Richard Biener 
Date:   Fri Nov 21 12:14:46 2025 +0100

Fix OMP SIMD clone mask register and query

The following removes the confusion around num_mask_args that was
added to properly "guess" the number of mask elements in a AVX512
mask that's just represented as int.  The actual mistake lies in
the mixup of 'ncopies' which is used to track the number of
OMP SIMD calls to be emitted rather than the number of input
vectors.  So this reverts the earlier r16-5374-g5c2fdfc24e343c,
uses the proper 'ncopies' for loop mask record/query and adjusts
the guessing of the SIMD arg mask elements.

PR tree-optimization/122762
PR tree-optimization/122736
PR tree-optimization/122790
* cgraph.h (cgraph_simd_clone_arg::linear_step): Document
use for SIMD_CLONE_ARG_TYPE_MASK.
* omp-simd-clone.cc (simd_clone_adjust_argument_types):
Record the number of mask arguments in linear_step if
mask_mode is not VOIDmode.
* tree-vect-stmts.cc (vectorizable_simd_clone_call):
Remove num_mask_args computation, use a proper ncopies
to query/register loop masks, use linear_step for the
number of mask arguments when determining the number of
mask elements in a mask argument.

* gcc.dg/vect/vect-simd-clone-23.c: New testcase.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-21 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #25 from rguenther at suse dot de  ---
On Thu, 20 Nov 2025, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
> 
> --- Comment #24 from Richard Sandiford  ---
> (In reply to Richard Biener from comment #21)
> > (In reply to Richard Sandiford from comment #17)
> > > (In reply to Richard Biener from comment #11)
> > > > (In reply to Richard Sandiford from comment #10)
> > [...]
> > > > > As for why it does that, see the commit message for
> > > > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the 
> > > > > check.
> > > >
> > > > Hm, but that talks about argument passing / return ABIs?
> > > > useless_type_conversion_p applies to value types, at most its check
> > > > for FUNCTION_TYPE compatibility would need to use this check?
> > > 
> > > Wouldn't that be bad for composability though?  It would seem odd for "T1
> > > f()" vs "T2 g()" to impose stricter rules than T1 vs T2.
> > 
> > Sure, but useless_type_conversion_p is generally the wrong place to address
> > ABI compatibility issues.  What breaks when not having this check in
> > useless_type_conversion_p?  The testcases added, if they break, would
> > point to places where we use this function wrongly.  It's by far
> > "complete" to check whether two FUNCTION_TYPE have the same ABI.
> 
> I think the FUNCTION_TYPE thing is a bit of a red herring.  The main reason 
> for
> the check is to preserve the types of vector values.  That's because
> code-generation of calls depends on the gimple type of the value being passed,
> rather than the argument type recorded in the FUNCTION_TYPE.  For example, see
> calls.cc:
> 
> FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
>   {
> tree argtype = TREE_TYPE (arg);
> 
> if (targetm.calls.split_complex_arg…)
>   …
> else
>   args[j].tree_value = arg;
> 
> as used in:
> 
>   for (argpos = 0; argpos < num_actuals; i--, argpos++)
> {
>   tree type = TREE_TYPE (args[i].tree_value);
>   …
>   function_arg_info arg (type, argpos < n_named_args);
>   if (pass_by_reference (args_so_far_pnt, arg))
> …
> }
> 
> where the way that something is passed is keyed off the type of the gimple
> value.

Hmm, IMO that's broken by design.

> The FUNCTION_TYPE's TYPE_ARG_TYPES is only used to distinguish
> prototyped from unprototyped functions and to determine which arguments are
> named and which are varargs.
> 
> And I think it has to be that way.  The different passing of SVE vs. non-SVE
> types extends to varargs.  If gimple didn't preserve the difference, then we
> wouldn't know which convention to use.

OK, for varargs I'd agree, but then this should ideally be nailed down
before or at gimplification, after that the actual value type shouldn't
come into play.

That this all makes two identical value types incompatible on GIMPLE
for SVE is quite unfortunate.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #24 from Richard Sandiford  ---
(In reply to Richard Biener from comment #21)
> (In reply to Richard Sandiford from comment #17)
> > (In reply to Richard Biener from comment #11)
> > > (In reply to Richard Sandiford from comment #10)
> [...]
> > > > As for why it does that, see the commit message for
> > > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.
> > >
> > > Hm, but that talks about argument passing / return ABIs?
> > > useless_type_conversion_p applies to value types, at most its check
> > > for FUNCTION_TYPE compatibility would need to use this check?
> > 
> > Wouldn't that be bad for composability though?  It would seem odd for "T1
> > f()" vs "T2 g()" to impose stricter rules than T1 vs T2.
> 
> Sure, but useless_type_conversion_p is generally the wrong place to address
> ABI compatibility issues.  What breaks when not having this check in
> useless_type_conversion_p?  The testcases added, if they break, would
> point to places where we use this function wrongly.  It's by far
> "complete" to check whether two FUNCTION_TYPE have the same ABI.

I think the FUNCTION_TYPE thing is a bit of a red herring.  The main reason for
the check is to preserve the types of vector values.  That's because
code-generation of calls depends on the gimple type of the value being passed,
rather than the argument type recorded in the FUNCTION_TYPE.  For example, see
calls.cc:

FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
  {
tree argtype = TREE_TYPE (arg);

if (targetm.calls.split_complex_arg…)
  …
else
  args[j].tree_value = arg;

as used in:

  for (argpos = 0; argpos < num_actuals; i--, argpos++)
{
  tree type = TREE_TYPE (args[i].tree_value);
  …
  function_arg_info arg (type, argpos < n_named_args);
  if (pass_by_reference (args_so_far_pnt, arg))
…
}

where the way that something is passed is keyed off the type of the gimple
value.  The FUNCTION_TYPE's TYPE_ARG_TYPES is only used to distinguish
prototyped from unprototyped functions and to determine which arguments are
named and which are varargs.

And I think it has to be that way.  The different passing of SVE vs. non-SVE
types extends to varargs.  If gimple didn't preserve the difference, then we
wouldn't know which convention to use.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #23 from Jakub Jelinek  ---
And if you want to see it in vectorization action, perhaps
#pragma omp declare simd simdlen(64) inbranch
char foo (char x);
#pragma omp declare simd simdlen(64) inbranch
short qux (short x);
#pragma omp declare simd simdlen(64) inbranch
int bar (int x);
#pragma omp declare simd simdlen(32) inbranch
long long baz (long long x);

char a[1024], b[1024];

void
xyzzy ()
{
  for (int i = 0; i < 1024; ++i)
if (a[i])
  b[i] = foo (b[i]);
}

short c[1024], d[1024];

void
fred ()
{
  for (int i = 0; i < 1024; ++i)
if (c[i])
  d[i] = qux (d[i]);
}

int e[1024], f[1024];

void
garply ()
{
  for (int i = 0; i < 1024; ++i)
if (e[i])
  f[i] = bar (f[i]);
}

long long g[1024], h[1024];

void
corge ()
{
  for (int i = 0; i < 1024; ++i)
if (g[i])
  h[i] = baz (h[i]);
}
-O3 -mavx512f I'd say.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #22 from Jakub Jelinek  ---
Perhaps something similar to what is used e.g. for AVX512F gathers (those also
have integer masks).
Though, the ZMM simd clones are more complicated, because when the
characteristic data type is large, the mask can hold just a few bits, fewer
than the precision of the type.
Consider
#pragma omp declare simd simdlen(64) inbranch
char foo (char x) { return x; }
#pragma omp declare simd simdlen(64) inbranch
short qux (short x) { return x; }
#pragma omp declare simd simdlen(64) inbranch
int bar (int x) { return x; }
#pragma omp declare simd simdlen(32) inbranch
long long baz (long long x) { return x; }
The _ZGVe* clones will have:
for foo just a single 64-bit integer mask argument
for qux 2 32-bit integer mask arguments, each holding 32 bits
for bar 4 32-bit integer mask arguments, each holding just 16 bits
for baz 4 32-bit integer mask arguments, each holding 16 bits as well (for
simdlen(64)
baz we punt, too large simdlen, because for _ZGVb it would need 32 arguments
for x and that is more than the ABI specifies).

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #21 from Richard Biener  ---
(In reply to Richard Sandiford from comment #17)
> (In reply to Richard Biener from comment #11)
> > (In reply to Richard Sandiford from comment #10)
[...]
> > > As for why it does that, see the commit message for
> > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.
> >
> > Hm, but that talks about argument passing / return ABIs?
> > useless_type_conversion_p applies to value types, at most its check
> > for FUNCTION_TYPE compatibility would need to use this check?
> 
> Wouldn't that be bad for composability though?  It would seem odd for "T1
> f()" vs "T2 g()" to impose stricter rules than T1 vs T2.

Sure, but useless_type_conversion_p is generally the wrong place to address
ABI compatibility issues.  What breaks when not having this check in
useless_type_conversion_p?  The testcases added, if they break, would
point to places where we use this function wrongly.  It's by far
"complete" to check whether two FUNCTION_TYPE have the same ABI.

> > @Richard Sandiford: Potentially silly question as you have probably
> > thought of this, but I was reading through your patch 'Add a
> > compatible_vector_types_p target hook' and I was wondering whether
> > it wouldn't make more sense to implement the more profitable ABI
> > by describing it as such. That is, get rid of this special 'SVE'
> > type thing and mark all previous functions that used this type as
> > 'vector_abi' (I think that's what we call it?) such that the types
> > are all the same/equivalent, but we only differ in behaviour at
> > ABI boundary where it matters?
> 
> At least for SVE, “vector ABI” just controls the call-clobbered register
> state.  Whether something is “vector ABI” is determined by how its arguments
> and return values are passed, rather than the other way around.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Biener  changed:

   What|Removed |Added

 Blocks||122736

--- Comment #20 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #19)
> (In reply to Richard Biener from comment #18)
> > so here we have a simdclone with 'unsigned int' vector_type for its
> > SIMD_CLONE_ARG_TYPE_MASK.  That's unexpected.
> 
> That is a normal expected case, that just means that the mask argument is
> passed
> as bits inside of unsigned int (ncopies unsigned int arguments in
> particular, each one
> holding simdlen / ncopies bits).  See the above omp-simd-clone.cc code which
> sets it up that way.

Hmm.  OK.

So then I suppose the confusion is with what data vector type to associate the
SIMD_CLONE_ARG_TYPE_MASK argument.  As it seems that vector_type is not
useable as-is, at least for AVX512, we need some vector type to record/query
the loop mask for.  On the use side we fail to AND the loop mask with any
.COND_CALL (...), that's wrong-code, but for a regular call masked loop we do

  tree masktype = bestn->simdclone->args[mask_i].vector_type;
  if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
/* Guess the number of lanes represented by masktype.  */
callee_nelements = exact_div (bestn->simdclone->simdlen,
  bestn->simdclone->nargs - nargs);
  else
callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
  o = vector_unroll_factor (nunits, callee_nelements);
  for (m = j * o; m < (j + 1) * o; m++)
{
  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
{
  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
 ncopies, masktype, j);

for an integer masktype this call will ICE.  Originally on the mask
recording side we used 'vectype', so possibly that would be correct here,
too.  This is possibly the base_type in the simdclone code you quote.

Unfortunately this doesn't allow me to revert r16-5374-g5c2fdfc24e343c
without re-introducing PR122736.  *sigh*

But it would fix the gcc.dg/vect/vect-simd-clone-20.c ICE.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 748b3bcb0ab..627d940c023 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4885,7 +4885,7 @@ vectorizable_simd_clone_call (vec_info *vinfo,
stmt_vec_info stmt_info,
{
  vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
  mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-ncopies, masktype, j);
+ncopies, vectype, j);
}
  else
mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122736
[Bug 122736] loop masking with SIMD clones ICEs in vect_verify_full_masking

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #19 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #18)
> so here we have a simdclone with 'unsigned int' vector_type for its
> SIMD_CLONE_ARG_TYPE_MASK.  That's unexpected.

That is a normal expected case, that just means that the mask argument is
passed
as bits inside of unsigned int (ncopies unsigned int arguments in particular,
each one
holding simdlen / ncopies bits).  See the above omp-simd-clone.cc code which
sets it up that way.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #18 from Richard Biener  ---
(In reply to Richard Biener from comment #13)
> I can reproduce a similar (ASLR dependent) issue with
> 
> make check-gcc
> RUNTESTFLAGS="--target_board=unix/-march=znver4/--param=vect-partial-vector-
> usage=2 vect.exp=vect-simd-clone-20.c"
> 
> on x86_64:
> 
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gcc.dg/vect/vect-simd-
> clone-20.c:24:1: internal compiler error: tree check: expected vector_type,
> have integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4368
> 0x37e157b internal_error(char const*, ...)
>
> /space/rguenther/src/gcc-autopar_devel/gcc/diagnostic-global-context.cc:787
> 0x1e96386 tree_check_failed(tree_node const*, char const*, int, char const*,
> ...)
> /space/rguenther/src/gcc-autopar_devel/gcc/tree.cc:9189
> 0xf74d9f tree_check(tree_node const*, char const*, int, char const*,
> tree_code)
> /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4034
> 0x101780b TYPE_VECTOR_SUBPARTS(tree_node const*)
> /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4368
> 0x1e00e83 vect_get_loop_mask(_loop_vec_info*, gimple_stmt_iterator*,
> vec_loop_masks*, unsigned int, tree_node*, unsigned int)
> /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-loop.cc:10590
> 0x1daec1d vectorizable_simd_clone_call
> /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-stmts.cc:4887

so here we have a simdclone with 'unsigned int' vector_type for its
SIMD_CLONE_ARG_TYPE_MASK.  That's unexpected.

One issue is that we do not inspect all simdclone args when not wrapped
in a .MASK_CALL, but for loop masking we'd still need to do that.  The
logic also seems to expect a 1:1 relationship between scalar call number
of args and simdclone number of args (not allowing for larger simdlen
with multiple args for an original scalar arg).

I wonder whether it's worth special-casing that we are going to always
be able to code generate the all-true mask when masking is not necessary
but we chose an in-branch simdclone.  Otherwise we could selectively
disable loop masking when the ARG_TYPE_MASK vectype isn't a vector type.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #17 from Richard Sandiford  ---
(In reply to Richard Biener from comment #11)
> (In reply to Richard Sandiford from comment #10)
> > +Andre since this is about SIMD clones.
> > 
> > (In reply to Richard Biener from comment #8)
> > > simd_clone_adjust_sve_vector_type does
> > > 
> > > 30276 /* ??? This creates anonymous "SVE type" attributes for all 
> > > types,
> > > 30277even those that correspond to  types.  This 
> > > affects
> > > type
> > > 30278compatibility in C/C++, but not in gimple.  (Gimple type
> > > equivalence
> > > 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.)
> > > 30280
> > > (gdb) 
> > > 30281Thus a C/C++ definition of the implementation function will
> > > have a
> > > 30282different function type from the declaration that this code
> > > creates.
> > > 30283However, it doesn't seem worth trying to fix that until we 
> > > have
> > > a
> > > 30284way of handling implementations that operate on unpacked 
> > > types.
> > > */
> > > 30285 type = build_distinct_type_copy (type);
> > > 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
> > > NULL);
> > > 
> > > but at least the comment about GIMPLE type compatibility is wrong.
> > 
> > Why do you say that it's wrong?  AIUI (going only from the description) the
> > ICE is caused by the fact that gimple (useless_type_conversion_p) does check
> > TARGET_COMPATIBLE_VECTOR_TYPES_P.
> 
> The comment says it does _not_ affect gimple type compatibility.  Then of
> course follows to say that's decided with that hook, but that hook checks
> presence of the SVE type attribute.

Ah, I see.  I thought you meant that the bit in brackets was wrong.

What the wider comment means is that the SVE type attribute created here is
anonymous, whereas the SVE type attributes created for the “real” built-in SVE
ACLE types are not anonymous.  This would make the types created here
incompatible in C/C++ terms with the built-in SVE types.

For example, in C/C++ terms, a scalable vector of int32_ts created by this code
would not be compatible with the built-in svint32_t type.  But the two types
would be compatible in gimple terms, because gimple only cares about the
presence or absence of the SVE attribute, not what the names are.  This is why
creating a “proper” SVE type attribute wasn't worth the effort.

In other words, the wider comment is describing compatibility rules between two
SVE types, whereas in this PR the issue is compatiblity between SVE and non-SVE
types.  SVE and non-SVE types are incompatible in both C/C++ and gimple.

> > As for why it does that, see the commit message for
> > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.
>
> Hm, but that talks about argument passing / return ABIs?
> useless_type_conversion_p applies to value types, at most its check
> for FUNCTION_TYPE compatibility would need to use this check?

Wouldn't that be bad for composability though?  It would seem odd for "T1 f()"
vs "T2 g()" to impose stricter rules than T1 vs T2.

> @Richard Sandiford: Potentially silly question as you have probably
> thought of this, but I was reading through your patch 'Add a
> compatible_vector_types_p target hook' and I was wondering whether
> it wouldn't make more sense to implement the more profitable ABI
> by describing it as such. That is, get rid of this special 'SVE'
> type thing and mark all previous functions that used this type as
> 'vector_abi' (I think that's what we call it?) such that the types
> are all the same/equivalent, but we only differ in behaviour at
> ABI boundary where it matters?

At least for SVE, “vector ABI” just controls the call-clobbered register state.
 Whether something is “vector ABI” is determined by how its arguments and
return values are passed, rather than the other way around.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #16 from Jakub Jelinek  ---
Here is the omp-simd-clone.cc code to create the mask types and arguments,
which should hopefully show up everything that needs to be matched on the
vectorizer side:
  tree base_type = simd_clone_compute_base_data_type (sc->origin, sc);
  tree mask_type;
  if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
veclen = sc->vecsize_int;
  else
veclen = sc->vecsize_float;
  if (known_eq (veclen, 0U))
veclen = sc->simdlen;
  else
veclen = exact_div (veclen,
GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
  if (multiple_p (veclen, sc->simdlen))
veclen = sc->simdlen;
  if (sc->mask_mode != VOIDmode)
mask_type
  = lang_hooks.types.type_for_mode (sc->mask_mode, 1);
  else if (POINTER_TYPE_P (base_type))
mask_type = build_vector_type (pointer_sized_int_node, veclen);
  else
mask_type = build_vector_type (base_type, veclen);

  k = vector_unroll_factor (sc->simdlen, veclen);

  /* We have previously allocated one extra entry for the mask.  Use
 it and fill it.  */
  sc->nargs++;
  if (sc->mask_mode != VOIDmode)
base_type = boolean_type_node;
  if (node->definition)
{
  sc->args[i].orig_arg
= build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL, base_type);
  if (sc->mask_mode == VOIDmode)
sc->args[i].simd_array
  = create_tmp_simd_array ("mask", base_type, sc->simdlen);
  else if (k > 1)
sc->args[i].simd_array
  = create_tmp_simd_array ("mask", mask_type, k);
  else
sc->args[i].simd_array = NULL_TREE;
}
  sc->args[i].orig_type = base_type;
  sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
  sc->args[i].vector_type = mask_type;

The target hook can set sc->mask_mode to non-VOIDmode for the x86 ZMM like
cases, i.e. scalar bitmask argument (or scalar bitmasks),
VOIDmode for a normal vector of the characteristic data type.  The first line
computes the characteristic data type.
Then veclen is either sc->simdlen, or sc->simdlen is some integral multiple of
veclen, k is what will be ncopies on the vectorizer side.
The vector_type then is some scalar or vector type handling veclen masks and
there will be k arguments of this kind in the end,
but just one SIMD_CLONE_ARG_TYPE_MASK covering them all.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #15 from avieira at gcc dot gnu.org ---
@Richard Sandiford: Potentially silly question as you have probably thought of
this, but I was reading through your patch 'Add a compatible_vector_types_p
target hook' and I was wondering whether it wouldn't make more sense to
implement the more profitable ABI by describing it as such. That is, get rid of
this special 'SVE' type thing and mark all previous functions that used this
type as 'vector_abi' (I think that's what we call it?) such that the types are
all the same/equivalent, but we only differ in behaviour at ABI boundary where
it matters?

@Jakub: I have to admit I have cleared my cache of this piece of work but I
assume that the reason num_args_mask is there is because we don't have modes
that represent 'ncopies' of masks. So I'm not sure how that'd work? Use BLK and
set the right size in tree ? Anyway it's been too long.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #14 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #12)
> (In reply to Jakub Jelinek from comment #9)
> > I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK
> > argument, so no idea why the num_mask_args code has been added.
> > At least simd_clone_adjust_argument_types adds it either 0 times or once
> > (for ->inbranch case).
> > Based on ncopied it can (and often does) be turned into multiple arguments
> > in various ABIs, at least for larger vectorization factors.
> > Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in
> > that case (and I think OpenMP standard support for the variable length ABIs
> > has only landed very recently).
> 
> Hm, seems I added this in r14-4629-g3179ad72f67f31.  For some reason I
> thought that there might be multiple mask arguments.  Like when simdlen is
> very large,
> 256 for example, and so we'd have multiple args as you say.

In the end there can be many mask arguments, but I believe there should be just
one SIMD_CLONE_ARG_TYPE_MASK, that is basically a made up scalar argument which
can then map to multiple vectorized arguments, like a simple int etc. can as
well, say for AVX512F versions for simdlen 16 int can be passed in a single
V16SImode argument,
but for simdlen 32 it needs 2 V16SImode arguments.  Similarly in the same case
mask
would be for simdlen 16 a single integer with 16 bits in it, etc.

"For masked vector functions, the additional “mask” parameters are required.
For XMM, YMM1, and YMM2 ISA, each element of “mask” parameters has the data
type of the characteristic data type.
The number of mask parameters is the same as number of parameters required to
pass the vector of characteristic data type for the given vector length. For
example, with XMM targets, if characteristic data type is int and VLEN is 8,
two MI128 mask parameters are passed (see table 2). The value of a mask
parameter must be either bit patterns of all ones or all zeros for each
element.
For the MIC and ZMM ISA, mask arguments are passed in scalar (GPR) registers.
The mask parameters are collection of 1-bit masks in unsigned integers. The
total number of mask bits is equal to VLEN. The number of mask parameters is
equal to the number of parameters for the vector of characteristic data type.
The mask bits occupy the least significant bits of unsigned integer. For
example, if the characteristic data type is double and VLEN is 16, there are 16
mask bits stored in two unsigned integers. For ZMM ISA, if the characteristic
data type is char and VL is 64, there is one 64-bit unsigned integer parameter
for mask.
Mask parameters are passed after all other parameters in the same order of
parameters that they apply to.
For each element of the vector, if the corresponding mask value is zero, the
return value associated to that element is zero."

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #13 from Richard Biener  ---
I can reproduce a similar (ASLR dependent) issue with

make check-gcc
RUNTESTFLAGS="--target_board=unix/-march=znver4/--param=vect-partial-vector-usage=2
vect.exp=vect-simd-clone-20.c"

on x86_64:

/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c:24:1:
internal compiler error: tree check: expected vector_type, have integer_type in
TYPE_VECTOR_SUBPARTS, at tree.h:4368
0x37e157b internal_error(char const*, ...)
   
/space/rguenther/src/gcc-autopar_devel/gcc/diagnostic-global-context.cc:787
0x1e96386 tree_check_failed(tree_node const*, char const*, int, char const*,
...)
/space/rguenther/src/gcc-autopar_devel/gcc/tree.cc:9189
0xf74d9f tree_check(tree_node const*, char const*, int, char const*, tree_code)
/space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4034
0x101780b TYPE_VECTOR_SUBPARTS(tree_node const*)
/space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4368
0x1e00e83 vect_get_loop_mask(_loop_vec_info*, gimple_stmt_iterator*,
vec_loop_masks*, unsigned int, tree_node*, unsigned int)
/space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-loop.cc:10590
0x1daec1d vectorizable_simd_clone_call
/space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-stmts.cc:4887

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #12 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #9)
> I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK
> argument, so no idea why the num_mask_args code has been added.
> At least simd_clone_adjust_argument_types adds it either 0 times or once
> (for ->inbranch case).
> Based on ncopied it can (and often does) be turned into multiple arguments
> in various ABIs, at least for larger vectorization factors.
> Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in
> that case (and I think OpenMP standard support for the variable length ABIs
> has only landed very recently).

Hm, seems I added this in r14-4629-g3179ad72f67f31.  For some reason I thought
that there might be multiple mask arguments.  Like when simdlen is very large,
256 for example, and so we'd have multiple args as you say.

The issue in the PR122736 case is the mismatch between the vector type used for
registering the loop mask use and the later query of the loop mask.  As the
latter is more obviously correct (mismatched types would ICE), I have adjusted
the former.  That place was last changed with r15-7095-g1dd79f44dfb64b by
Tamar.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #11 from Richard Biener  ---
(In reply to Richard Sandiford from comment #10)
> +Andre since this is about SIMD clones.
> 
> (In reply to Richard Biener from comment #8)
> > simd_clone_adjust_sve_vector_type does
> > 
> > 30276 /* ??? This creates anonymous "SVE type" attributes for all types,
> > 30277even those that correspond to  types.  This affects
> > type
> > 30278compatibility in C/C++, but not in gimple.  (Gimple type
> > equivalence
> > 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.)
> > 30280
> > (gdb) 
> > 30281Thus a C/C++ definition of the implementation function will
> > have a
> > 30282different function type from the declaration that this code
> > creates.
> > 30283However, it doesn't seem worth trying to fix that until we have
> > a
> > 30284way of handling implementations that operate on unpacked types.
> > */
> > 30285 type = build_distinct_type_copy (type);
> > 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
> > NULL);
> > 
> > but at least the comment about GIMPLE type compatibility is wrong.
> 
> Why do you say that it's wrong?  AIUI (going only from the description) the
> ICE is caused by the fact that gimple (useless_type_conversion_p) does check
> TARGET_COMPATIBLE_VECTOR_TYPES_P.

The comment says it does _not_ affect gimple type compatibility.  Then of
course follows to say that's decided with that hook, but that hook checks
presence of the SVE type attribute.

> 
> As for why it does that, see the commit message for
> g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.

Hm, but that talks about argument passing / return ABIs? 
useless_type_conversion_p applies to value types, at most its check
for FUNCTION_TYPE compatibility would need to use this check?

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Sandiford  changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #10 from Richard Sandiford  ---
+Andre since this is about SIMD clones.

(In reply to Richard Biener from comment #8)
> simd_clone_adjust_sve_vector_type does
> 
> 30276 /* ??? This creates anonymous "SVE type" attributes for all types,
> 30277even those that correspond to  types.  This affects
> type
> 30278compatibility in C/C++, but not in gimple.  (Gimple type
> equivalence
> 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.)
> 30280
> (gdb) 
> 30281Thus a C/C++ definition of the implementation function will
> have a
> 30282different function type from the declaration that this code
> creates.
> 30283However, it doesn't seem worth trying to fix that until we have
> a
> 30284way of handling implementations that operate on unpacked types.
> */
> 30285 type = build_distinct_type_copy (type);
> 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
> NULL);
> 
> but at least the comment about GIMPLE type compatibility is wrong.

Why do you say that it's wrong?  AIUI (going only from the description) the ICE
is caused by the fact that gimple (useless_type_conversion_p) does check
TARGET_COMPATIBLE_VECTOR_TYPES_P.

As for why it does that, see the commit message for
g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

--- Comment #9 from Jakub Jelinek  ---
I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK argument,
so no idea why the num_mask_args code has been added.
At least simd_clone_adjust_argument_types adds it either 0 times or once (for
->inbranch case).
Based on ncopied it can (and often does) be turned into multiple arguments in
various ABIs, at least for larger vectorization factors.
Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in that
case (and I think OpenMP standard support for the variable length ABIs has only
landed very recently).

[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622

2025-11-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762

Richard Biener  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
  Component|tree-optimization   |target

--- Comment #8 from Richard Biener  ---
simd_clone_adjust_sve_vector_type does

30276 /* ??? This creates anonymous "SVE type" attributes for all types,
30277even those that correspond to  types.  This affects
type
30278compatibility in C/C++, but not in gimple.  (Gimple type
equivalence
30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.)
30280
(gdb) 
30281Thus a C/C++ definition of the implementation function will have a
30282different function type from the declaration that this code
creates.
30283However, it doesn't seem worth trying to fix that until we have a
30284way of handling implementations that operate on unpacked types. 
*/
30285 type = build_distinct_type_copy (type);
30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
NULL);

but at least the comment about GIMPLE type compatibility is wrong.

I believe this is really a target issue.  I can workaround this in the
vectorizer by re-building the simdclone argument vector (boolean) type
like with

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 748b3bcb0ab..7a062cc0634 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4499,7 +4499,9 @@ vectorizable_simd_clone_call (vec_info *vinfo,
stmt_vec_info stmt_info,
(exact_div (bestn->simdclone->simdlen, num_mask_args),
 TYPE_MODE (bestn->simdclone->args[i].vector_type));
  else
-   arg_vectype = bestn->simdclone->args[i].vector_type;
+   arg_vectype = build_truth_vector_type_for_mode
+   (TYPE_VECTOR_SUBPARTS
(bestn->simdclone->args[i].vector_type),
+TYPE_MODE (bestn->simdclone->args[i].vector_type));
  vect_record_loop_mask (loop_vinfo,
 &LOOP_VINFO_MASKS (loop_vinfo),
 ncopies * num_mask_args, arg_vectype,

but I have no idea why the ICE is even dependent on ASLR, thus why we get
sometimes the mixup of SVE builtin annotated types and sometimes not.

I also have not traced down the reason for the num_mask_args thing (splitting
a mask into multiple arguments?) or why we got away with using 'vectype'
(the vector type of the result) instead of the vector type of the arguments.

The original reason for the change was that we ran into this with
V4DF / V8DF mixup for simd clone arguments vs. result.  The code seems
to handle concat/extract of pieces OK-ish in the end.

So maybe the change in the rev. was wrong, but it's now more in-line with
the assertion earlier when selecting 'bestn' with respect to
num_mask_args.

Test coverage is quite bad here.