[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
--- Comment #11 from David Binderman ---
(In reply to Richard Biener from comment #8)
> Of course it requires a high level of obfuscation to get
> this past unrolling, VRP and other opts that would do this.
I find csmith very good indeed at finding optimiser bugs.
I can do about 35,000 runs an hour here.
Perhaps gcc optimizer testing can be extended to include a reasonable
number of csmith runs.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
Richard Biener changed:
What|Removed |Added
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #10 from Richard Biener ---
Fixed.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
--- Comment #9 from GCC Commits ---
The master branch has been updated by Richard Biener:
https://gcc.gnu.org/g:c5dab6fb402c93a92f6aa808c43956dfb9328190
commit r16-3162-gc5dab6fb402c93a92f6aa808c43956dfb9328190
Author: Richard Biener
Date: Tue Aug 12 09:51:54 2025 +0200
    tree-optimization/121509 - failure to detect unvectorizable loop
    With the hybrid stmt detection no longer working as a gate-keeper
    to detect unhandled stmts we have to, and can, detect those earlier.
    The appropriate place is vect_mark_stmts_to_be_vectorized where
    for trivially relevant PHIs we can stop analyzing when the PHI
    wasn't classified as a known def during vect_analyze_scalar_cycles.
    PR tree-optimization/121509
    * tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Fail
    early when we detect a relevant but not handled PHI.
    * gcc.dg/vect/pr121509.c: New testcase.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
Richard Biener changed:
What|Removed |Added
Keywords|needs-reduction |missed-optimization
--- Comment #8 from Richard Biener ---
This is also a missed optimization in final value replacement which fails to
replace the PHI. Of course it requires a high level of obfuscation to get this
past unrolling, VRP and other opts that would do this.
Testcase:
long g_73[2] = {6L,6L};
int __GIMPLE (ssa,startwith("loop")) __attribute__((noipa))
foo ()
{
  signed char g;
  int l;
  int _1;
  unsigned char _3;
  unsigned char _4;
  __BB(2):
  goto __BB3;
  __BB(3,loop_header(1)):
  l_5 = __PHI (__BB2: _Literal (int) -511973466, __BB3: 1);
  g_6 = __PHI (__BB2: _Literal (signed char) 0, __BB3: g_12);
  _1 = (int) g_6;
  g_73[_1] = 0l;
  _3 = (unsigned char) g_6;
  _4 = _3 + _Literal (unsigned char) 1;
  g_12 = (signed char) _4;
  if (g_12 > _Literal (signed char) 1)
    goto __BB4;
  else
    goto __BB3;
  __BB(4):
  l_14 = __PHI (__BB3: l_5);
  return l_14;
}
int main()
{
  if (foo () != 1)
    __builtin_abort ();
  return 0;
}
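For readers less used to the GIMPLE front end, the loop in the testcase above can be paraphrased in plain C roughly as follows. This is only an illustrative sketch, not the testcase that was committed: the variable name `seen` is hypothetical, the `noipa` attribute is omitted, and a compiler is free to optimize this form quite differently from the hand-written GIMPLE.

```c
/* Plain-C paraphrase of the GIMPLE testcase (illustrative sketch only).  */
long g_73[2] = {6L, 6L};

int
foo (void)
{
  signed char g = 0;     /* g_6 on loop entry */
  int l = -511973466;    /* l_5 on loop entry */
  int seen;              /* hypothetical name: value of l_5 in the current iteration */

  do
    {
      seen = l;
      g_73[(int) g] = 0L;
      /* Increment via unsigned char, mirroring _3, _4 and g_12.  */
      g = (signed char) (unsigned char) ((unsigned char) g + 1);
      l = 1;             /* back-edge argument of the l_5 PHI */
    }
  while (g <= 1);

  return seen;           /* l_14: header PHI value used after the loop */
}
```

The loop body runs twice (g = 0, then g = 1), so the PHI value that is live after the loop is the back-edge value 1, which is what `main` in the testcase checks for.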
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
Richard Biener changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed||2025-08-12
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #7 from Richard Biener ---
Broken:
> ./xgcc -B. t.c -O2 -march=znver3 -w -fno-tree-slp-vectorize -fopt-info-vec
runData/keep/in.23654.c:402:26: optimized: loop vectorized using 16 byte vectors and unroll factor 2
runData/keep/in.23654.c:661:40: optimized: loop vectorized using 32 byte vectors and unroll factor 32
> ./a.out
checksum = C6D41636
OK (revision reverted):
> ./xgcc -B. t.c -O2 -march=znver3 -w -fno-tree-slp-vectorize -fopt-info-vec
runData/keep/in.23654.c:661:40: optimized: loop vectorized using 32 byte vectors and unroll factor 32
> ./a.out
checksum = DFCC84BC
The revision can cause a loop to be vectorized when it was previously detected
to miss some required scalar stmts covered by SLP.
runData/keep/in.23654.c:402:26: note: Analyze phi: l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: missed: intermediate value used outside loop.
runData/keep/in.23654.c:402:26: missed: Unknown def-use cycle pattern.
...
runData/keep/in.23654.c:402:26: note: === vect_detect_hybrid_slp ===
runData/keep/in.23654.c:402:26: note: Processing hybrid candidate : l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: note: Found loop_vect sink: l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: note: Loop contains SLP and non-SLP stmts
runData/keep/in.23654.c:402:26: missed: needs non-SLP handling
[local count: 1204977]:
  # prephitmp_1189 = PHI <_440(47), prephitmp_1187(60)>
[local count: 10644573]:
  # g_87.167_632 = PHI <_544(229), 0(61)>
  # l_2131_631 = PHI <1(229), -511973466(61)>
  # ivtmp_717 = PHI
  _536 = (int) g_87.167_632;
  g_73[_536] = 0;
  g_87.166_542 = (unsigned char) g_87.167_632;
  _543 = g_87.166_542 + 1;
  _544 = (signed char) _543;
  ivtmp_716 = ivtmp_717 - 1;
  if (ivtmp_716 != 0)
    goto ; [94.50%]
  else
    goto ; [5.50%]
[local count: 10059121]:
  goto ; [100.00%]
[local count: 1204977]:
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
--- Comment #6 from David Binderman ---
Created attachment 62104
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62104&action=edit
C source code
Second test case:
foundBugs $ rm -f ./a.out && $CC -w -O2 bug1114B.c && ./a.out
checksum = B60777B8
foundBugs $ rm -f ./a.out && $CC -w -O2 -march=native bug1114B.c && ./a.out
checksum = 56A00110
foundBugs $
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
Sam James changed:
What|Removed |Added
CC||sjames at gcc dot gnu.org
--- Comment #5 from Sam James ---
(In reply to David Binderman from comment #4)
> Created attachment 62101 [details]
> C source code
>
> After more than two hours of reduction, I attach
> the partially reduced code.
I don't think this reduction is right. Valgrind finds something at -O0 and
there's at least one issue with out of bounds in a loop. I can have a look.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
--- Comment #4 from David Binderman ---
Created attachment 62101
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62101&action=edit
C source code
After more than two hours of reduction, I attach
the partially reduced code.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
--- Comment #3 from David Binderman ---
trunk $ git bisect bad df86ac52fccb2dec
df86ac52fccb2deccb53fb79f71db1fd700476bc is the first bad commit
commit df86ac52fccb2deccb53fb79f71db1fd700476bc (HEAD)
Author: Richard Biener
Date: Tue Aug 5 13:20:07 2025 +0200
    Remove hybrid SLP detection
Reduction still running after an hour.
[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509
Andrew Pinski changed:
What|Removed |Added
Summary|runtime differences with -march=znver3 |[16 Regression] runtime differences with -march=znver3
Target Milestone|--- |16.0
Keywords||wrong-code
Component|c |middle-end
