[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298 --- Comment #4 from Richard Biener --- I think treating associatable operations in some systematic way would be good. If the whole chain is handled the order usually does not matter, but if only parts of the chain can be combined and recognized this results in missed optimizations otherwise. Better handling of a larger toplevel operation, recognizing a sub-part of it is a bswap and re-materializing the remaining pieces would be another way of viewing this I guess. But I guess we were just lucky before the reassoc change and bswap should be free to re-associate to make a bswap match.
[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298 --- Comment #3 from Jakub Jelinek --- This was quite ugly already before that change. What the whole operation does is in bswap terminology 2 1 3 4 swap (i.e. lowest byte is the 2nd of val, then 1st, then 3rd, then 4th), which is something bswap pass doesn't handle. So it tried to bswap just the earlier stmts and there it before r16-2601 saw something which constructs 2 1 0 0 (i.e. 2nd byte of val, 1st byte of val, zero, zero) and decides to lower that to (__builtin_bswap32 (val) & 0x) r<< 16, wonder if that wouldn't be better written as simply (unsigned) (((unsigned short) val) r << 8), i.e. just bswap16 extended, that is also 3 gimple ops (cast, lrotate, cast) without a need for a fancy constant. Anyway, the way bswap pass works, it just analyzes everything and either it is something it decides to optimize and then it does, or not and then it punts. And it walks a bb backwards, trying to optimize each stmt. Now, with random reassociation of large | or ^ or + (for + guess only unsigned ones) operands this sometimes works and sometimes doesn't. Say if I want __builtin_bswap32 (x) | __builtin_bswap32 (y) and write it as ((x & 0xff) << 24) | ((x & 0xff00) << 8) | ((x & 0xff) >> 8) | ((x & 0xff00) >> 24)) | ((y & 0xff) << 24) | ((y & 0xff00) << 8) | ((y & 0xff) >> 8) | ((y & 0xff00) >> 24)) and it isn't reassociated from that form, it is recognized, while if it is ((x & 0xff) << 24) | ((y & 0xff) << 24) | ((x & 0xff00) << 8) | ((y & 0xff00) << 8) | ((x & 0xff) >> 8) | ((y & 0xff) >> 8) | ((x & 0xff00) >> 24)) | ((y & 0xff00) >> 24)) then it won't match. Shall we try to randomly reassociate toplevel |^ or unsigned +? I.e. if we see BIT_IOR_EXPR, BIT_XOR_EXPR or unsigned PLUS_EXPR, in similar way how reassoc collects all operands collect all operands (perhaps with some limit), run find_bswap_or_nop_1 on each leaf operand, sort them based on having a source_stmt non-NULL, precision and vuse and then in each group try to gradually perform_symbolic_merge + verify_symbolic_number_p if some of their combination makes sense and for each such combination note if find_bswap_or_nop_finalize or for bswap pass only also the if (!is_bswap_or_nop_p (tmp_n, cmpxchg, cmpnop, mask, bswap)) stuff after it moved into a helper function makes something recognizable and then pick the largest collection of arguments that can be turned into something recognizable? Plus some flag marking on stmts we've already walked that way, so that it doesn't have terrible compile time complexity.
[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P2
[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298 Andrew Pinski changed: What|Removed |Added CC||ubizjak at gmail dot com --- Comment #2 from Andrew Pinski --- *** Bug 121517 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298
Richard Biener changed:
What|Removed |Added
Last reconfirmed||2025-07-29
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Biener ---
find_bswap_or_nop_1 finds
_1 = val_7(D) & 4294901760;
and
$19 = {n = 0x4030102, type = ,
base_addr = , offset = , bytepos = {coeffs = {0}},
src = , alias_set = , vuse = ,
range = 4,
n_ops = 3}
which I think "just" lacks some extra processing to figure the HImode rotate
and the IOR with the remaining bits in the base. But the code is already
a bit messy. It's the bswap pass performing the transform in both cases
at the moment.
bswapdst_4 = __builtin_bswap32 (val_7(D));
bswapmaskdst_11 = bswapdst_4 & 4294901760;
_10 = bswapmaskdst_11 r<< 16;
_8 = _1 | _10;
Both testcases are actually the same btw, just associating bit-and and shift.
unsigned int bswap8 (unsigned int val)
{
return (val & 0x) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}
void ext(int x);
void foo(int x)
{
ext((x&~0x)|((x>>8)&0xff)|((x&0xff)<<8));
}
without the causing rev. we get the same result for the outer operation
but then the tail has the HImode rotate associated together. But this
is visible in the whole operation as well.
[Bug tree-optimization/121298] [16 Regression] Missed bswap and rotate detection from r16-2601-ge8a51144c02e1c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121298 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Target Milestone|--- |16.0
