[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-12-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

--- Comment #4 from Richard Biener  ---
I think treating associatable operations in some systematic way would be good. 
If the whole chain is handled the order usually does not matter, but if only
parts of the chain can be combined and recognized this results in missed
optimizations otherwise.

Better handling of a larger toplevel operation, recognizing a sub-part of
it is a bswap and re-materializing the remaining pieces would be another
way of viewing this I guess.

But I guess we were just lucky before the reassoc change and bswap should
be free to re-associate to make a bswap match.

[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-12-01 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

--- Comment #3 from Jakub Jelinek  ---
This was quite ugly already before that change.
What the whole operation does is in bswap terminology 2 1 3 4 swap (i.e. lowest
byte is the 2nd of val, then 1st, then 3rd, then 4th), which is something bswap
pass doesn't handle.
So it tried to bswap just the earlier stmts and there it before r16-2601 saw
something
which constructs 2 1 0 0 (i.e. 2nd byte of val, 1st byte of val, zero, zero)
and decides to lower that to (__builtin_bswap32 (val) & 0x) r<< 16,
wonder if that
wouldn't be better written as simply (unsigned) (((unsigned short) val) r <<
8), i.e. just bswap16 extended, that is also 3 gimple ops (cast, lrotate, cast)
without a need for a fancy constant.

Anyway, the way bswap pass works, it just analyzes everything and either it is
something it decides to optimize and then it does, or not and then it punts. 
And it walks a bb backwards, trying to optimize each stmt.
Now, with random reassociation of large | or ^ or + (for + guess only unsigned
ones) operands this sometimes works and sometimes doesn't.
Say if I want __builtin_bswap32 (x) | __builtin_bswap32 (y) and write it as
((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x & 0xff) >> 8) | ((x &
0xff00) >> 24))
| ((y & 0xff) << 24) | ((y & 0xff00) << 8) | ((y & 0xff) >> 8) | ((y &
0xff00) >> 24))
and it isn't reassociated from that form, it is recognized, while if it is
((x & 0xff) << 24) | ((y & 0xff) << 24) | ((x & 0xff00) << 8) | ((y & 0xff00)
<< 8)
| ((x & 0xff) >> 8) | ((y & 0xff) >> 8) | ((x & 0xff00) >> 24)) |
((y & 0xff00) >> 24))
then it won't match.
Shall we try to randomly reassociate toplevel |^ or unsigned +?
I.e. if we see BIT_IOR_EXPR, BIT_XOR_EXPR or unsigned PLUS_EXPR, in similar way
how
reassoc collects all operands collect all operands (perhaps with some limit),
run 
find_bswap_or_nop_1 on each leaf operand, sort them based on having a
source_stmt non-NULL, precision and vuse and then in each group try to
gradually perform_symbolic_merge + verify_symbolic_number_p if some of their
combination makes sense and for each such combination note if
find_bswap_or_nop_finalize or for bswap pass only also the
if (!is_bswap_or_nop_p (tmp_n, cmpxchg, cmpnop, mask, bswap)) stuff after it
moved into a helper function makes something recognizable and then pick the
largest collection of arguments that can be turned into something recognizable?
 Plus some flag marking on stmts we've already walked that way, so that it
doesn't have terrible compile time complexity.

[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-11-24 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-08-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

Andrew Pinski  changed:

   What|Removed |Added

 CC||ubizjak at gmail dot com

--- Comment #2 from Andrew Pinski  ---
*** Bug 121517 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-07-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2025-07-29
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
find_bswap_or_nop_1 finds

_1 = val_7(D) & 4294901760;

and

$19 = {n = 0x4030102, type = , 
  base_addr = , offset = , bytepos = {coeffs = {0}}, 
  src = , alias_set = , vuse = ,
range = 4, 
  n_ops = 3}

which I think "just" lacks some extra processing to figure the HImode rotate
and the IOR with the remaining bits in the base.  But the code is already
a bit messy.  It's the bswap pass performing the transform in both cases
at the moment.

  bswapdst_4 = __builtin_bswap32 (val_7(D));
  bswapmaskdst_11 = bswapdst_4 & 4294901760;
  _10 = bswapmaskdst_11 r<< 16;
  _8 = _1 | _10;


Both testcases are actually the same btw, just associating bit-and and shift.

unsigned int bswap8 (unsigned int val)
{
  return (val & 0x) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

void ext(int x);
void foo(int x)
{
ext((x&~0x)|((x>>8)&0xff)|((x&0xff)<<8));
}

without the causing rev. we get the same result for the outer operation
but then the tail has the HImode rotate associated together.  But this
is visible in the whole operation as well.

[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c

2025-07-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |16.0