[Bug middle-end/121509] [16 Regression] runtime differences with -march=znver3

2025-08-12 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

--- Comment #11 from David Binderman  ---
(In reply to Richard Biener from comment #8)
> Of course it requires a high level of obfuscation to get
> this past unrolling, VRP and other opts that would do this.

I find csmith very good indeed at finding optimiser bugs.
I can do about 35,000 runs an hour here.

Perhaps gcc optimizer testing can be extended to include
a reasonable number of csmith runs.

2025-08-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Richard Biener  ---
Fixed.

2025-08-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:c5dab6fb402c93a92f6aa808c43956dfb9328190

commit r16-3162-gc5dab6fb402c93a92f6aa808c43956dfb9328190
Author: Richard Biener 
Date:   Tue Aug 12 09:51:54 2025 +0200

tree-optimization/121509 - failure to detect unvectorizable loop

With the hybrid stmt detection no longer working as a gate-keeper
to detect unhandled stmts we have to, and can, detect those earlier.
The appropriate place is vect_mark_stmts_to_be_vectorized where
for trivially relevant PHIs we can stop analyzing when the PHI
wasn't classified as a known def during vect_analyze_scalar_cycles.

PR tree-optimization/121509
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized):
Fail early when we detect a relevant but not handled PHI.

* gcc.dg/vect/pr121509.c: New testcase.
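
The early-failure idea in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: the types and the function below (toy_def_kind, toy_phi, toy_mark_relevant) are hypothetical stand-ins and are not GCC's data structures or the actual code in tree-vect-stmts.cc. It assumes a PHI counts as trivially relevant when its value is used outside the loop, as in the testcase for this PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT GCC's internal types.  */
enum toy_def_kind { TOY_UNKNOWN_DEF, TOY_INDUCTION_DEF, TOY_REDUCTION_DEF };

struct toy_phi {
    enum toy_def_kind kind;  /* result of an earlier cycle analysis */
    bool used_outside_loop;  /* assumed to make the PHI trivially relevant */
};

/* Returns false (give up on vectorizing the loop) as soon as a
   relevant PHI was not classified as a known def; true otherwise.  */
bool toy_mark_relevant(const struct toy_phi *phis, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (phis[i].used_outside_loop && phis[i].kind == TOY_UNKNOWN_DEF)
            return false;
    }
    return true;
}
```

In this model, a PHI like the one for l in the testcase (unknown def-use cycle, value used after the loop) makes the analysis stop before any later stage can accept the loop.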

2025-08-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-reduction             |missed-optimization

--- Comment #8 from Richard Biener  ---
This is also a missed optimization in final value replacement which fails to
replace the PHI.  Of course it requires a high level of obfuscation to get this
past unrolling, VRP and other opts that would do this.

Testcase:

long g_73[2] = {6L,6L};
int __GIMPLE (ssa,startwith("loop")) __attribute__((noipa))
foo ()
{
  signed char g;
  int l;
  int _1;
  unsigned char _3;
  unsigned char _4;

  __BB(2):
  goto __BB3;

  __BB(3,loop_header(1)):
  l_5 = __PHI (__BB2: _Literal (int) -511973466, __BB3: 1);
  g_6 = __PHI (__BB2: _Literal (signed char) 0, __BB3: g_12);
  _1 = (int) g_6;
  g_73[_1] = 0l;
  _3 = (unsigned char) g_6;
  _4 = _3 + _Literal (unsigned char) 1;
  g_12 = (signed char) _4;
  if (g_12 > _Literal (signed char) 1)
    goto __BB4;
  else
    goto __BB3;

  __BB(4):
  l_14 = __PHI (__BB3: l_5);
  return l_14;
}

int main()
{
  if (foo () != 1)
    __builtin_abort ();
  return 0;
}
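
For readers not used to the GIMPLE front end syntax, the loop in foo corresponds roughly to the plain C below. This is a hand translation (an assumption, not part of the report), and it is not expected to reproduce the miscompile, since the usual optimizations would likely simplify it first. It only shows why the expected return value is 1: the loop runs twice, and the value of l observed in the last iteration is the one that came in over the back edge.

```c
long g_73[2] = {6L, 6L};

/* Hand-written C approximation of the GIMPLE testcase.  */
__attribute__((noinline))
int foo(void)
{
    int l = -511973466;   /* PHI value on loop entry */
    signed char g = 0;
    int result;

    do {
        result = l;       /* l_5 as seen in this iteration */
        g_73[g] = 0L;
        g = (signed char)((unsigned char)g + 1);
        l = 1;            /* PHI value on the back edge */
    } while (g <= 1);     /* exit once g > 1 */

    return result;        /* l_14 = l_5 from the last iteration */
}
```

The first iteration sees l = -511973466 and g = 0, the second sees l = 1 and g = 1, and the loop then exits with g = 2, so foo returns 1 and both elements of g_73 are zeroed.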

2025-08-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2025-08-12
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
Broken:

> ./xgcc -B. t.c -O2 -march=znver3 -w -fno-tree-slp-vectorize -fopt-info-vec
runData/keep/in.23654.c:402:26: optimized: loop vectorized using 16 byte vectors and unroll factor 2
runData/keep/in.23654.c:661:40: optimized: loop vectorized using 32 byte vectors and unroll factor 32
> ./a.out 
checksum = C6D41636

OK (revision reverted):

> ./xgcc -B. t.c -O2 -march=znver3 -w -fno-tree-slp-vectorize -fopt-info-vec
runData/keep/in.23654.c:661:40: optimized: loop vectorized using 32 byte vectors and unroll factor 32
> ./a.out 
checksum = DFCC84BC

The revision can cause a loop to be vectorized that was previously rejected
because it contained required scalar stmts not covered by SLP.

runData/keep/in.23654.c:402:26: note:   Analyze phi: l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: missed:   intermediate value used outside loop.
runData/keep/in.23654.c:402:26: missed:   Unknown def-use cycle pattern. 
...
runData/keep/in.23654.c:402:26: note:   === vect_detect_hybrid_slp ===
runData/keep/in.23654.c:402:26: note:   Processing hybrid candidate : l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: note:   Found loop_vect sink: l_2131_631 = PHI <1(229), -511973466(61)>
runData/keep/in.23654.c:402:26: note:   Loop contains SLP and non-SLP stmts
runData/keep/in.23654.c:402:26: missed:  needs non-SLP handling

   [local count: 1204977]:
  # prephitmp_1189 = PHI <_440(47), prephitmp_1187(60)>

   [local count: 10644573]:
  # g_87.167_632 = PHI <_544(229), 0(61)>
  # l_2131_631 = PHI <1(229), -511973466(61)> 
  # ivtmp_717 = PHI 
  _536 = (int) g_87.167_632;
  g_73[_536] = 0;
  g_87.166_542 = (unsigned char) g_87.167_632;
  _543 = g_87.166_542 + 1;
  _544 = (signed char) _543;
  ivtmp_716 = ivtmp_717 - 1;
  if (ivtmp_716 != 0)
    goto ; [94.50%]
  else
    goto ; [5.50%]

   [local count: 10059121]:
  goto ; [100.00%]

   [local count: 1204977]:

2025-08-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

--- Comment #6 from David Binderman  ---
Created attachment 62104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62104&action=edit
C source code

Second test case:

foundBugs $ rm -f ./a.out && $CC -w -O2 bug1114B.c && ./a.out
checksum = B60777B8
foundBugs $ rm -f ./a.out && $CC -w -O2 -march=native bug1114B.c && ./a.out
checksum = 56A00110
foundBugs $
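
The check above is a form of differential testing: the same source is built with two sets of flags and the printed checksums are compared. A minimal sketch of just the comparison step follows, assuming the `checksum = XXXXXXXX` output line format shown above. The function names parse_checksum and checksums_differ are hypothetical and are not part of csmith or GCC.

```c
#include <stdio.h>

/* Hypothetical helper: extract the hex value from a line such as
   "checksum = B60777B8".  Returns 1 on success, 0 otherwise.  */
int parse_checksum(const char *line, unsigned long *value)
{
    return sscanf(line, "checksum = %lx", value) == 1;
}

/* Returns 1 if both lines parse and the checksums differ (a candidate
   wrong-code bug), 0 if they match, -1 if either line is malformed.  */
int checksums_differ(const char *a, const char *b)
{
    unsigned long va, vb;

    if (!parse_checksum(a, &va) || !parse_checksum(b, &vb))
        return -1;
    return va != vb;
}
```

A differing checksum only marks a candidate: the program must also be free of undefined behaviour for the difference to indicate a compiler bug.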

2025-08-11 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

Sam James  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sjames at gcc dot gnu.org

--- Comment #5 from Sam James  ---
(In reply to David Binderman from comment #4)
> Created attachment 62101 [details]
> C source code
> 
> After more than two hours of reduction, I attach 
> the partially reduced code.

I don't think this reduction is right. Valgrind finds something at -O0 and
there's at least one out-of-bounds access in a loop.

I can have a look.

2025-08-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

--- Comment #4 from David Binderman  ---
Created attachment 62101
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62101&action=edit
C source code

After more than two hours of reduction, I attach 
the partially reduced code.

2025-08-11 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

--- Comment #3 from David Binderman  ---
trunk $ git bisect bad df86ac52fccb2dec
df86ac52fccb2deccb53fb79f71db1fd700476bc is the first bad commit
commit df86ac52fccb2deccb53fb79f71db1fd700476bc (HEAD)
Author: Richard Biener 
Date:   Tue Aug 5 13:20:07 2025 +0200

Remove hybrid SLP detection

Reduction still running after an hour.

2025-08-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121509

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|runtime differences with    |[16 Regression] runtime
                   |-march=znver3               |differences with
                   |                            |-march=znver3
   Target Milestone|---                         |16.0
           Keywords|                            |wrong-code
          Component|c                           |middle-end