[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #30 from Andrew Pinski --- > Similarly to before it's a bit non-deterministic for some reason. Yes it feels that way too. See PR 123773 which I filed for the same ICE. I noticed that the C++ testcase I added only fails with -g even.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #29 from Richard Biener --- (In reply to Tamar Christina from comment #28) > The ICE is back after re-landing of the original commit. Re-landing? What's the rev this appears at? > Though only affecting wrf now. > > module_cu_g3.fppized.f90: In function 'g3drv.constprop': > module_cu_g3.fppized.f90:9:4: internal compiler error: in prepare_vec_mask, > at tree-vect-stmts.cc:1605 > 9 |SUBROUTINE G3DRV(& > |^ > 0x1d53ff3 internal_error(char const*, ...) > > /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:787 > 0x840e37 fancy_abort(char const*, int, char const*) > /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostics/context.cc:1805 > 0x10437eb prepare_vec_mask(_loop_vec_info*, tree_node*, tree_node*, > tree_node*, gimple_stmt_iterator*) > /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:1605 > 0x1063ceb vectorizable_store > /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:9261 > 0x1067fa3 vect_transform_stmt(vec_info*, _stmt_vec_info*, > gimple_stmt_iterator*, _slp_tree*, _slp_instance*) > /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:13183 > 0x109c75b vect_schedule_slp_node > > Reducer: > > !GCC$ builtin (exp) attributes simd (notinbranch) > MODULE MODULE_CU_BMJ > INTEGER:: JTB > CONTAINS > SUBROUTINE BMJDRVRQVCUTEN > REAL, DIMENSION(JTB) :: THEOLD,TOLDY2T > DO KTH=1,KTHM > TH=TH+DTH > DENOM=TH > IF (DENOM>EPS) THEN >QS=EXP(0/DENOM) > ELSE >QS=0. > ENDIF > THEOLD(KTH)=EXP(ELOCP*QS) > ENDDO > CALL SPLINE > END > END > > Compiled with -Ofast -mcpu=neoverse-v2 > > Similarly to before it's a bit non-deterministic for some reason. Unfortunately to a point that a cross does not reproduce it for me. It also seems to be only tangentially related given it hits in vectorizable_store and not vectorizable_simd_clone_call so please track this in a new bug.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Tamar Christina changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #28 from Tamar Christina --- The ICE is back after re-landing of the original commit. Though only affecting wrf now. module_cu_g3.fppized.f90: In function 'g3drv.constprop': module_cu_g3.fppized.f90:9:4: internal compiler error: in prepare_vec_mask, at tree-vect-stmts.cc:1605 9 |SUBROUTINE G3DRV(& |^ 0x1d53ff3 internal_error(char const*, ...) /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:787 0x840e37 fancy_abort(char const*, int, char const*) /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostics/context.cc:1805 0x10437eb prepare_vec_mask(_loop_vec_info*, tree_node*, tree_node*, tree_node*, gimple_stmt_iterator*) /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:1605 0x1063ceb vectorizable_store /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:9261 0x1067fa3 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*) /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-stmts.cc:13183 0x109c75b vect_schedule_slp_node Reducer: !GCC$ builtin (exp) attributes simd (notinbranch) MODULE MODULE_CU_BMJ INTEGER:: JTB CONTAINS SUBROUTINE BMJDRVRQVCUTEN REAL, DIMENSION(JTB) :: THEOLD,TOLDY2T DO KTH=1,KTHM TH=TH+DTH DENOM=TH IF (DENOM>EPS) THEN QS=EXP(0/DENOM) ELSE QS=0. ENDIF THEOLD(KTH)=EXP(ELOCP*QS) ENDDO CALL SPLINE END END Compiled with -Ofast -mcpu=neoverse-v2 Similarly to before it's a bit non-deterministic for some reason.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #27 from Richard Biener --- Should be fixed now.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #26 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:9dff0355c75d70d2a7583b626426599ee054406a commit r16-5491-g9dff0355c75d70d2a7583b626426599ee054406a Author: Richard Biener Date: Fri Nov 21 12:14:46 2025 +0100 Fix OMP SIMD clone mask register and query The following removes the confusion around num_mask_args that was added to properly "guess" the number of mask elements in a AVX512 mask that's just represented as int. The actual mistake lies in the mixup of 'ncopies' which is used to track the number of OMP SIMD calls to be emitted rather than the number of input vectors. So this reverts the earlier r16-5374-g5c2fdfc24e343c, uses the proper 'ncopies' for loop mask record/query and adjusts the guessing of the SIMD arg mask elements. PR tree-optimization/122762 PR tree-optimization/122736 PR tree-optimization/122790 * cgraph.h (cgraph_simd_clone_arg::linear_step): Document use for SIMD_CLONE_ARG_TYPE_MASK. * omp-simd-clone.cc (simd_clone_adjust_argument_types): Record the number of mask arguments in linear_step if mask_mode is not VOIDmode. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Remove num_mask_args computation, use a proper ncopies to query/register loop masks, use linear_step for the number of mask arguments when determining the number of mask elements in a mask argument. * gcc.dg/vect/vect-simd-clone-23.c: New testcase.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
--- Comment #25 from rguenther at suse dot de ---
On Thu, 20 Nov 2025, rsandifo at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
>
> --- Comment #24 from Richard Sandiford ---
> (In reply to Richard Biener from comment #21)
> > (In reply to Richard Sandiford from comment #17)
> > > (In reply to Richard Biener from comment #11)
> > > > (In reply to Richard Sandiford from comment #10)
> > [...]
> > > > > As for why it does that, see the commit message for
> > > > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the
> > > > > check.
> > > >
> > > > Hm, but that talks about argument passing / return ABIs?
> > > > useless_type_conversion_p applies to value types, at most its check
> > > > for FUNCTION_TYPE compatibility would need to use this check?
> > >
> > > Wouldn't that be bad for composability though? It would seem odd for "T1
> > > f()" vs "T2 g()" to impose stricter rules than T1 vs T2.
> >
> > Sure, but useless_type_conversion_p is generally the wrong place to address
> > ABI compatibility issues. What breaks when not having this check in
> > useless_type_conversion_p? The testcases added, if they break, would
> > point to places where we use this function wrongly. It's by far
> > "complete" to check whether two FUNCTION_TYPE have the same ABI.
>
> I think the FUNCTION_TYPE thing is a bit of a red herring. The main reason
> for
> the check is to preserve the types of vector values. That's because
> code-generation of calls depends on the gimple type of the value being passed,
> rather than the argument type recorded in the FUNCTION_TYPE. For example, see
> calls.cc:
>
> FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> {
> tree argtype = TREE_TYPE (arg);
>
> if (targetm.calls.split_complex_arg…)
> …
> else
> args[j].tree_value = arg;
>
> as used in:
>
> for (argpos = 0; argpos < num_actuals; i--, argpos++)
> {
> tree type = TREE_TYPE (args[i].tree_value);
> …
> function_arg_info arg (type, argpos < n_named_args);
> if (pass_by_reference (args_so_far_pnt, arg))
> …
> }
>
> where the way that something is passed is keyed off the type of the gimple
> value.
Hmm, IMO that's broken by design.
> The FUNCTION_TYPE's TYPE_ARG_TYPES is only used to distinguish
> prototyped from unprototyped functions and to determine which arguments are
> named and which are varargs.
>
> And I think it has to be that way. The different passing of SVE vs. non-SVE
> types extends to varargs. If gimple didn't preserve the difference, then we
> wouldn't know which convention to use.
OK, for varargs I'd agree, but then this should ideally be nailed down
before or at gimplification, after that the actual value type shouldn't
come into play.
That this all makes two identical value types incompatible on GIMPLE
for SVE is quite unfortunate.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
--- Comment #24 from Richard Sandiford ---
(In reply to Richard Biener from comment #21)
> (In reply to Richard Sandiford from comment #17)
> > (In reply to Richard Biener from comment #11)
> > > (In reply to Richard Sandiford from comment #10)
> [...]
> > > > As for why it does that, see the commit message for
> > > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.
> > >
> > > Hm, but that talks about argument passing / return ABIs?
> > > useless_type_conversion_p applies to value types, at most its check
> > > for FUNCTION_TYPE compatibility would need to use this check?
> >
> > Wouldn't that be bad for composability though? It would seem odd for "T1
> > f()" vs "T2 g()" to impose stricter rules than T1 vs T2.
>
> Sure, but useless_type_conversion_p is generally the wrong place to address
> ABI compatibility issues. What breaks when not having this check in
> useless_type_conversion_p? The testcases added, if they break, would
> point to places where we use this function wrongly. It's by far
> "complete" to check whether two FUNCTION_TYPE have the same ABI.
I think the FUNCTION_TYPE thing is a bit of a red herring. The main reason for
the check is to preserve the types of vector values. That's because
code-generation of calls depends on the gimple type of the value being passed,
rather than the argument type recorded in the FUNCTION_TYPE. For example, see
calls.cc:
FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
{
tree argtype = TREE_TYPE (arg);
if (targetm.calls.split_complex_arg…)
…
else
args[j].tree_value = arg;
as used in:
for (argpos = 0; argpos < num_actuals; i--, argpos++)
{
tree type = TREE_TYPE (args[i].tree_value);
…
function_arg_info arg (type, argpos < n_named_args);
if (pass_by_reference (args_so_far_pnt, arg))
…
}
where the way that something is passed is keyed off the type of the gimple
value. The FUNCTION_TYPE's TYPE_ARG_TYPES is only used to distinguish
prototyped from unprototyped functions and to determine which arguments are
named and which are varargs.
And I think it has to be that way. The different passing of SVE vs. non-SVE
types extends to varargs. If gimple didn't preserve the difference, then we
wouldn't know which convention to use.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
--- Comment #23 from Jakub Jelinek ---
And if you want to see it in vectorization action, perhaps
#pragma omp declare simd simdlen(64) inbranch
char foo (char x);
#pragma omp declare simd simdlen(64) inbranch
short qux (short x);
#pragma omp declare simd simdlen(64) inbranch
int bar (int x);
#pragma omp declare simd simdlen(32) inbranch
long long baz (long long x);
char a[1024], b[1024];
void
xyzzy ()
{
for (int i = 0; i < 1024; ++i)
if (a[i])
b[i] = foo (b[i]);
}
short c[1024], d[1024];
void
fred ()
{
for (int i = 0; i < 1024; ++i)
if (c[i])
d[i] = qux (d[i]);
}
int e[1024], f[1024];
void
garply ()
{
for (int i = 0; i < 1024; ++i)
if (e[i])
f[i] = bar (f[i]);
}
long long g[1024], h[1024];
void
corge ()
{
for (int i = 0; i < 1024; ++i)
if (g[i])
h[i] = baz (h[i]);
}
-O3 -mavx512f I'd say.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
--- Comment #22 from Jakub Jelinek ---
Perhaps something similar to what is used e.g. for AVX512F gathers (those also
have integer masks).
Though, the ZMM simd clones are more complicated, because when the
characteristic data type is large, the mask can hold just a few bits, fewer
than the precision of the type.
Consider
#pragma omp declare simd simdlen(64) inbranch
char foo (char x) { return x; }
#pragma omp declare simd simdlen(64) inbranch
short qux (short x) { return x; }
#pragma omp declare simd simdlen(64) inbranch
int bar (int x) { return x; }
#pragma omp declare simd simdlen(32) inbranch
long long baz (long long x) { return x; }
The _ZGVe* clones will have:
for foo just a single 64-bit integer mask argument
for qux 2 32-bit integer mask arguments, each holding 32 bits
for bar 4 32-bit integer mask arguments, each holding just 16 bits
for baz 4 32-bit integer mask arguments, each holding 16 bits as well (for
simdlen(64)
baz we punt, too large simdlen, because for _ZGVb it would need 32 arguments
for x and that is more than the ABI specifies).
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #21 from Richard Biener --- (In reply to Richard Sandiford from comment #17) > (In reply to Richard Biener from comment #11) > > (In reply to Richard Sandiford from comment #10) [...] > > > As for why it does that, see the commit message for > > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check. > > > > Hm, but that talks about argument passing / return ABIs? > > useless_type_conversion_p applies to value types, at most its check > > for FUNCTION_TYPE compatibility would need to use this check? > > Wouldn't that be bad for composability though? It would seem odd for "T1 > f()" vs "T2 g()" to impose stricter rules than T1 vs T2. Sure, but useless_type_conversion_p is generally the wrong place to address ABI compatibility issues. What breaks when not having this check in useless_type_conversion_p? The testcases added, if they break, would point to places where we use this function wrongly. It's by far "complete" to check whether two FUNCTION_TYPE have the same ABI. > > @Richard Sandiford: Potentially silly question as you have probably > > thought of this, but I was reading through your patch 'Add a > > compatible_vector_types_p target hook' and I was wondering whether > > it wouldn't make more sense to implement the more profitable ABI > > by describing it as such. That is, get rid of this special 'SVE' > > type thing and mark all previous functions that used this type as > > 'vector_abi' (I think that's what we call it?) such that the types > > are all the same/equivalent, but we only differ in behaviour at > > ABI boundary where it matters? > > At least for SVE, “vector ABI” just controls the call-clobbered register > state. Whether something is “vector ABI” is determined by how its arguments > and return values are passed, rather than the other way around.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
Richard Biener changed:
What|Removed |Added
Blocks||122736
--- Comment #20 from Richard Biener ---
(In reply to Jakub Jelinek from comment #19)
> (In reply to Richard Biener from comment #18)
> > so here we have a simdclone with 'unsigned int' vector_type for its
> > SIMD_CLONE_ARG_TYPE_MASK. That's unexpected.
>
> That is a normal expected case, that just means that the mask argument is
> passed
> as bits inside of unsigned int (ncopies unsigned int arguments in
> particular, each one
> holding simdlen / ncopies bits). See the above omp-simd-clone.cc code which
> sets it up that way.
Hmm. OK.
So then I suppose the confusion is with what data vector type to associate the
SIMD_CLONE_ARG_TYPE_MASK argument. As it seems that vector_type is not
useable as-is, at least for AVX512, we need some vector type to record/query
the loop mask for. On the use side we fail to AND the loop mask with any
.COND_CALL (...), that's wrong-code, but for a regular call masked loop we do
tree masktype = bestn->simdclone->args[mask_i].vector_type;
if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
/* Guess the number of lanes represented by masktype. */
callee_nelements = exact_div (bestn->simdclone->simdlen,
bestn->simdclone->nargs - nargs);
else
callee_nelements = TYPE_VECTOR_SUBPARTS (masktype);
o = vector_unroll_factor (nunits, callee_nelements);
for (m = j * o; m < (j + 1) * o; m++)
{
if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
{
vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
ncopies, masktype, j);
for an integer masktype this call will ICE. Originally on the mask
recording side we used 'vectype', so possibly that would be correct here,
too. This is possibly the base_type in the simdclone code you quote.
Unfortunately this doesn't allow me to revert r16-5374-g5c2fdfc24e343c
without re-introducing PR122736. *sigh*
But it would fix the gcc.dg/vect/vect-simd-clone-20.c ICE.
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 748b3bcb0ab..627d940c023 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4885,7 +4885,7 @@ vectorizable_simd_clone_call (vec_info *vinfo,
stmt_vec_info stmt_info,
{
vec_loop_masks *loop_masks = &LOOP_VINFO_MASKS (loop_vinfo);
mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
-ncopies, masktype, j);
+ncopies, vectype, j);
}
else
mask = vect_build_all_ones_mask (vinfo, stmt_info, masktype);
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122736
[Bug 122736] loop masking with SIMD clones ICEs in vect_verify_full_masking
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #19 from Jakub Jelinek --- (In reply to Richard Biener from comment #18) > so here we have a simdclone with 'unsigned int' vector_type for its > SIMD_CLONE_ARG_TYPE_MASK. That's unexpected. That is a normal expected case, that just means that the mask argument is passed as bits inside of unsigned int (ncopies unsigned int arguments in particular, each one holding simdlen / ncopies bits). See the above omp-simd-clone.cc code which sets it up that way.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #18 from Richard Biener --- (In reply to Richard Biener from comment #13) > I can reproduce a similar (ASLR dependent) issue with > > make check-gcc > RUNTESTFLAGS="--target_board=unix/-march=znver4/--param=vect-partial-vector- > usage=2 vect.exp=vect-simd-clone-20.c" > > on x86_64: > > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gcc.dg/vect/vect-simd- > clone-20.c:24:1: internal compiler error: tree check: expected vector_type, > have integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4368 > 0x37e157b internal_error(char const*, ...) > > /space/rguenther/src/gcc-autopar_devel/gcc/diagnostic-global-context.cc:787 > 0x1e96386 tree_check_failed(tree_node const*, char const*, int, char const*, > ...) > /space/rguenther/src/gcc-autopar_devel/gcc/tree.cc:9189 > 0xf74d9f tree_check(tree_node const*, char const*, int, char const*, > tree_code) > /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4034 > 0x101780b TYPE_VECTOR_SUBPARTS(tree_node const*) > /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4368 > 0x1e00e83 vect_get_loop_mask(_loop_vec_info*, gimple_stmt_iterator*, > vec_loop_masks*, unsigned int, tree_node*, unsigned int) > /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-loop.cc:10590 > 0x1daec1d vectorizable_simd_clone_call > /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-stmts.cc:4887 so here we have a simdclone with 'unsigned int' vector_type for its SIMD_CLONE_ARG_TYPE_MASK. That's unexpected. One issue is that we do not inspect all simdclone args when not wrapped in a .MASK_CALL, but for loop masking we'd still need to do that. The logic also seems to expect a 1:1 relationship between scalar call number of args and simdclone number of args (not allowing for larger simdlen with multiple args for an original scalar arg). I wonder whether it's worth special-casing that we are going to always be able to code generate the all-true mask when masking is not necessary but we chose an in-branch simdclone. Otherwise we could selectively disable loop masking when the ARG_TYPE_MASK vectype isn't a vector type.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #17 from Richard Sandiford --- (In reply to Richard Biener from comment #11) > (In reply to Richard Sandiford from comment #10) > > +Andre since this is about SIMD clones. > > > > (In reply to Richard Biener from comment #8) > > > simd_clone_adjust_sve_vector_type does > > > > > > 30276 /* ??? This creates anonymous "SVE type" attributes for all > > > types, > > > 30277even those that correspond to types. This > > > affects > > > type > > > 30278compatibility in C/C++, but not in gimple. (Gimple type > > > equivalence > > > 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.) > > > 30280 > > > (gdb) > > > 30281Thus a C/C++ definition of the implementation function will > > > have a > > > 30282different function type from the declaration that this code > > > creates. > > > 30283However, it doesn't seem worth trying to fix that until we > > > have > > > a > > > 30284way of handling implementations that operate on unpacked > > > types. > > > */ > > > 30285 type = build_distinct_type_copy (type); > > > 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, > > > NULL); > > > > > > but at least the comment about GIMPLE type compatibility is wrong. > > > > Why do you say that it's wrong? AIUI (going only from the description) the > > ICE is caused by the fact that gimple (useless_type_conversion_p) does check > > TARGET_COMPATIBLE_VECTOR_TYPES_P. > > The comment says it does _not_ affect gimple type compatibility. Then of > course follows to say that's decided with that hook, but that hook checks > presence of the SVE type attribute. Ah, I see. I thought you meant that the bit in brackets was wrong. What the wider comment means is that the SVE type attribute created here is anonymous, whereas the SVE type attributes created for the “real” built-in SVE ACLE types are not anonymous. This would make the types created here incompatible in C/C++ terms with the built-in SVE types. For example, in C/C++ terms, a scalable vector of int32_ts created by this code would not be compatible with the built-in svint32_t type. But the two types would be compatible in gimple terms, because gimple only cares about the presence or absence of the SVE attribute, not what the names are. This is why creating a “proper” SVE type attribute wasn't worth the effort. In other words, the wider comment is describing compatibility rules between two SVE types, whereas in this PR the issue is compatiblity between SVE and non-SVE types. SVE and non-SVE types are incompatible in both C/C++ and gimple. > > As for why it does that, see the commit message for > > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check. > > Hm, but that talks about argument passing / return ABIs? > useless_type_conversion_p applies to value types, at most its check > for FUNCTION_TYPE compatibility would need to use this check? Wouldn't that be bad for composability though? It would seem odd for "T1 f()" vs "T2 g()" to impose stricter rules than T1 vs T2. > @Richard Sandiford: Potentially silly question as you have probably > thought of this, but I was reading through your patch 'Add a > compatible_vector_types_p target hook' and I was wondering whether > it wouldn't make more sense to implement the more profitable ABI > by describing it as such. That is, get rid of this special 'SVE' > type thing and mark all previous functions that used this type as > 'vector_abi' (I think that's what we call it?) such that the types > are all the same/equivalent, but we only differ in behaviour at > ABI boundary where it matters? At least for SVE, “vector ABI” just controls the call-clobbered register state. Whether something is “vector ABI” is determined by how its arguments and return values are passed, rather than the other way around.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762
--- Comment #16 from Jakub Jelinek ---
Here is the omp-simd-clone.cc code to create the mask types and arguments,
which should hopefully show up everything that needs to be matched on the
vectorizer side:
tree base_type = simd_clone_compute_base_data_type (sc->origin, sc);
tree mask_type;
if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
veclen = sc->vecsize_int;
else
veclen = sc->vecsize_float;
if (known_eq (veclen, 0U))
veclen = sc->simdlen;
else
veclen = exact_div (veclen,
GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
if (multiple_p (veclen, sc->simdlen))
veclen = sc->simdlen;
if (sc->mask_mode != VOIDmode)
mask_type
= lang_hooks.types.type_for_mode (sc->mask_mode, 1);
else if (POINTER_TYPE_P (base_type))
mask_type = build_vector_type (pointer_sized_int_node, veclen);
else
mask_type = build_vector_type (base_type, veclen);
k = vector_unroll_factor (sc->simdlen, veclen);
/* We have previously allocated one extra entry for the mask. Use
it and fill it. */
sc->nargs++;
if (sc->mask_mode != VOIDmode)
base_type = boolean_type_node;
if (node->definition)
{
sc->args[i].orig_arg
= build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL, base_type);
if (sc->mask_mode == VOIDmode)
sc->args[i].simd_array
= create_tmp_simd_array ("mask", base_type, sc->simdlen);
else if (k > 1)
sc->args[i].simd_array
= create_tmp_simd_array ("mask", mask_type, k);
else
sc->args[i].simd_array = NULL_TREE;
}
sc->args[i].orig_type = base_type;
sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
sc->args[i].vector_type = mask_type;
The target hook can set sc->mask_mode to non-VOIDmode for the x86 ZMM like
cases, i.e. scalar bitmask argument (or scalar bitmasks),
VOIDmode for a normal vector of the characteristic data type. The first line
computes the characteristic data type.
Then veclen is either sc->simdlen, or sc->simdlen is some integral multiple of
veclen, k is what will be ncopies on the vectorizer side.
The vector_type then is some scalar or vector type handling veclen masks and
there will be k arguments of this kind in the end,
but just one SIMD_CLONE_ARG_TYPE_MASK covering them all.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #15 from avieira at gcc dot gnu.org --- @Richard Sandiford: Potentially silly question as you have probably thought of this, but I was reading through your patch 'Add a compatible_vector_types_p target hook' and I was wondering whether it wouldn't make more sense to implement the more profitable ABI by describing it as such. That is, get rid of this special 'SVE' type thing and mark all previous functions that used this type as 'vector_abi' (I think that's what we call it?) such that the types are all the same/equivalent, but we only differ in behaviour at ABI boundary where it matters? @Jakub: I have to admit I have cleared my cache of this piece of work but I assume that the reason num_args_mask is there is because we don't have modes that represent 'ncopies' of masks. So I'm not sure how that'd work? Use BLK and set the right size in tree ? Anyway it's been too long.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #14 from Jakub Jelinek --- (In reply to Richard Biener from comment #12) > (In reply to Jakub Jelinek from comment #9) > > I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK > > argument, so no idea why the num_mask_args code has been added. > > At least simd_clone_adjust_argument_types adds it either 0 times or once > > (for ->inbranch case). > > Based on ncopied it can (and often does) be turned into multiple arguments > > in various ABIs, at least for larger vectorization factors. > > Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in > > that case (and I think OpenMP standard support for the variable length ABIs > > has only landed very recently). > > Hm, seems I added this in r14-4629-g3179ad72f67f31. For some reason I > thought that there might be multiple mask arguments. Like when simdlen is > very large, > 256 for example, and so we'd have multiple args as you say. In the end there can be many mask arguments, but I believe there should be just one SIMD_CLONE_ARG_TYPE_MASK, that is basically a made up scalar argument which can then map to multiple vectorized arguments, like a simple int etc. can as well, say for AVX512F versions for simdlen 16 int can be passed in a single V16SImode argument, but for simdlen 32 it needs 2 V16SImode arguments. Similarly in the same case mask would be for simdlen 16 a single integer with 16 bits in it, etc. "For masked vector functions, the additional “mask” parameters are required. For XMM, YMM1, and YMM2 ISA, each element of “mask” parameters has the data type of the characteristic data type. The number of mask parameters is the same as number of parameters required to pass the vector of characteristic data type for the given vector length. For example, with XMM targets, if characteristic data type is int and VLEN is 8, two MI128 mask parameters are passed (see table 2). The value of a mask parameter must be either bit patterns of all ones or all zeros for each element. For the MIC and ZMM ISA, mask arguments are passed in scalar (GPR) registers. The mask parameters are collection of 1-bit masks in unsigned integers. The total number of mask bits is equal to VLEN. The number of mask parameters is equal to the number of parameters for the vector of characteristic data type. The mask bits occupy the least significant bits of unsigned integer. For example, if the characteristic data type is double and VLEN is 16, there are 16 mask bits stored in two unsigned integers. For ZMM ISA, if the characteristic data type is char and VL is 64, there is one 64-bit unsigned integer parameter for mask. Mask parameters are passed after all other parameters in the same order of parameters that they apply to. For each element of the vector, if the corresponding mask value is zero, the return value associated to that element is zero."
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #13 from Richard Biener --- I can reproduce a similar (ASLR dependent) issue with make check-gcc RUNTESTFLAGS="--target_board=unix/-march=znver4/--param=vect-partial-vector-usage=2 vect.exp=vect-simd-clone-20.c" on x86_64: /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c:24:1: internal compiler error: tree check: expected vector_type, have integer_type in TYPE_VECTOR_SUBPARTS, at tree.h:4368 0x37e157b internal_error(char const*, ...) /space/rguenther/src/gcc-autopar_devel/gcc/diagnostic-global-context.cc:787 0x1e96386 tree_check_failed(tree_node const*, char const*, int, char const*, ...) /space/rguenther/src/gcc-autopar_devel/gcc/tree.cc:9189 0xf74d9f tree_check(tree_node const*, char const*, int, char const*, tree_code) /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4034 0x101780b TYPE_VECTOR_SUBPARTS(tree_node const*) /space/rguenther/src/gcc-autopar_devel/gcc/tree.h:4368 0x1e00e83 vect_get_loop_mask(_loop_vec_info*, gimple_stmt_iterator*, vec_loop_masks*, unsigned int, tree_node*, unsigned int) /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-loop.cc:10590 0x1daec1d vectorizable_simd_clone_call /space/rguenther/src/gcc-autopar_devel/gcc/tree-vect-stmts.cc:4887
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #12 from Richard Biener --- (In reply to Jakub Jelinek from comment #9) > I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK > argument, so no idea why the num_mask_args code has been added. > At least simd_clone_adjust_argument_types adds it either 0 times or once > (for ->inbranch case). > Based on ncopied it can (and often does) be turned into multiple arguments > in various ABIs, at least for larger vectorization factors. > Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in > that case (and I think OpenMP standard support for the variable length ABIs > has only landed very recently). Hm, seems I added this in r14-4629-g3179ad72f67f31. For some reason I thought that there might be multiple mask arguments. Like when simdlen is very large, 256 for example, and so we'd have multiple args as you say. The issue in the PR122736 case is the mismatch between the vector type used for registering the loop mask use and the later query of the loop mask. As the latter is more obviously correct (mismatched types would ICE), I have adjusted the former. That place was last changed with r15-7095-g1dd79f44dfb64b by Tamar.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #11 from Richard Biener --- (In reply to Richard Sandiford from comment #10) > +Andre since this is about SIMD clones. > > (In reply to Richard Biener from comment #8) > > simd_clone_adjust_sve_vector_type does > > > > 30276 /* ??? This creates anonymous "SVE type" attributes for all types, > > 30277even those that correspond to types. This affects > > type > > 30278compatibility in C/C++, but not in gimple. (Gimple type > > equivalence > > 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.) > > 30280 > > (gdb) > > 30281Thus a C/C++ definition of the implementation function will > > have a > > 30282different function type from the declaration that this code > > creates. > > 30283However, it doesn't seem worth trying to fix that until we have > > a > > 30284way of handling implementations that operate on unpacked types. > > */ > > 30285 type = build_distinct_type_copy (type); > > 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, > > NULL); > > > > but at least the comment about GIMPLE type compatibility is wrong. > > Why do you say that it's wrong? AIUI (going only from the description) the > ICE is caused by the fact that gimple (useless_type_conversion_p) does check > TARGET_COMPATIBLE_VECTOR_TYPES_P. The comment says it does _not_ affect gimple type compatibility. Then of course follows to say that's decided with that hook, but that hook checks presence of the SVE type attribute. > > As for why it does that, see the commit message for > g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check. Hm, but that talks about argument passing / return ABIs? useless_type_conversion_p applies to value types, at most its check for FUNCTION_TYPE compatibility would need to use this check?
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Richard Sandiford changed: What|Removed |Added CC||avieira at gcc dot gnu.org --- Comment #10 from Richard Sandiford --- +Andre since this is about SIMD clones. (In reply to Richard Biener from comment #8) > simd_clone_adjust_sve_vector_type does > > 30276 /* ??? This creates anonymous "SVE type" attributes for all types, > 30277even those that correspond to types. This affects > type > 30278compatibility in C/C++, but not in gimple. (Gimple type > equivalence > 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.) > 30280 > (gdb) > 30281Thus a C/C++ definition of the implementation function will > have a > 30282different function type from the declaration that this code > creates. > 30283However, it doesn't seem worth trying to fix that until we have > a > 30284way of handling implementations that operate on unpacked types. > */ > 30285 type = build_distinct_type_copy (type); > 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, > NULL); > > but at least the comment about GIMPLE type compatibility is wrong. Why do you say that it's wrong? AIUI (going only from the description) the ICE is caused by the fact that gimple (useless_type_conversion_p) does check TARGET_COMPATIBLE_VECTOR_TYPES_P. As for why it does that, see the commit message for g:482b2b43e5101921ad94e51e052a18b353f8a3f5, which introduced the check.
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 --- Comment #9 from Jakub Jelinek --- I believe there should be always at most one SIMD_CLONE_ARG_TYPE_MASK argument, so no idea why the num_mask_args code has been added. At least simd_clone_adjust_argument_types adds it either 0 times or once (for ->inbranch case). Based on ncopied it can (and often does) be turned into multiple arguments in various ABIs, at least for larger vectorization factors. Regarding SVE, I'm afraid I have no idea what the actual vector ABI is in that case (and I think OpenMP standard support for the variable length ABIs has only landed very recently).
[Bug target/122762] [16 Regression] various SPECCPU 2017 benchmark ICEs with in prepare_vec_mask, at tree-vect-stmts.cc:1622
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122762 Richard Biener changed: What|Removed |Added CC||jakub at gcc dot gnu.org Component|tree-optimization |target --- Comment #8 from Richard Biener --- simd_clone_adjust_sve_vector_type does 30276 /* ??? This creates anonymous "SVE type" attributes for all types, 30277even those that correspond to types. This affects type 30278compatibility in C/C++, but not in gimple. (Gimple type equivalence 30279is instead decided by TARGET_COMPATIBLE_VECTOR_TYPES_P.) 30280 (gdb) 30281Thus a C/C++ definition of the implementation function will have a 30282different function type from the declaration that this code creates. 30283However, it doesn't seem worth trying to fix that until we have a 30284way of handling implementations that operate on unpacked types. */ 30285 type = build_distinct_type_copy (type); 30286 aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL, NULL); but at least the comment about GIMPLE type compatibility is wrong. I believe this is really a target issue. I can workaround this in the vectorizer by re-building the simdclone argument vector (boolean) type like with diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 748b3bcb0ab..7a062cc0634 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -4499,7 +4499,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, (exact_div (bestn->simdclone->simdlen, num_mask_args), TYPE_MODE (bestn->simdclone->args[i].vector_type)); else - arg_vectype = bestn->simdclone->args[i].vector_type; + arg_vectype = build_truth_vector_type_for_mode + (TYPE_VECTOR_SUBPARTS (bestn->simdclone->args[i].vector_type), +TYPE_MODE (bestn->simdclone->args[i].vector_type)); vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo), ncopies * num_mask_args, arg_vectype, but I have no idea why the ICE is even dependent on ASLR, thus why we get sometimes the mixup of SVE builtin annotated types and sometimes not. I also have not traced down the reason for the num_mask_args thing (splitting a mask into multiple arguments?) or why we got away with using 'vectype' (the vector type of the result) instead of the vector type of the arguments. The original reason for the change was that we ran into this with V4DF / V8DF mixup for simd clone arguments vs. result. The code seems to handle concat/extract of pieces OK-ish in the end. So maybe the change in the rev. was wrong, but it's now more in-line with the assertion earlier when selecting 'bestn' with respect to num_mask_args. Test coverage is quite bad here.
