[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #42 from Xi Ruoyao --- The LoongArch bitreversehi2 somehow evades (maybe happens to evade?) the extra extension. It performs a bitreverse operation in full length GPR (i.e. 64-bit bitrev.d is used instead of 32-bit bitrev.w) first, then shifts the result, and finally sets SRP_UNSIGNED.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #41 from Jakub Jelinek --- Though bitreverse itself needs all bits, it is just the sequence of bitreverse + right shift with the needed shift count that doesn't. Though it is true that it is the case on all arches, so if it could be done in ext_dce or somewhere else generically, then it doesn't need to be peephole2 handled on various arches. Note, bswap actually has a similar behavior, except that it doesn't exist for QImode and for HImode it is usually just a rotate, so most targets don't expand it as bswapsi >> 16. So only relevant for targets which only have bswapdi and not bswapsi...
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #40 from Drea Pinski ---
(In reply to Roger Sayle from comment #36)
> I'm guessing that either expanding via a paradoxical SUBREG, or a currently
> missing RTL simplification (or a backend define_insn_and_split or peephole2)
> should be able to remove the (I believe) unnecessary zero_extend.
I would have thought maybe ext_dce might help here. Maybe it needs to be taught
about bitreverse ...
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #39 from Jakub Jelinek ---
Untested
--- gcc/config/aarch64/aarch64.md.jj 2026-05-23 01:23:05.065826858 +0200
+++ gcc/config/aarch64/aarch64.md 2026-05-23 11:59:11.578144774 +0200
@@ -5790,6 +5790,48 @@ (define_expand "bitreverse2"
(bitreverse:GPI (match_operand:GPI 1 "register_operand")))]
)
+;; Peephole2s to get rid of useless zero extension from
+;; __builtin_bitreverse{8,16}.
+(define_peephole2
+ [(set (match_operand:SI 0 "register_operand")
+ (zero_extend:SI (match_operand:SHORT 1 "register_operand")))
+ (set (match_operand:SI 2 "register_operand")
+ (bitreverse:SI (match_dup 0)))
+ (set (match_operand:SI 3 "register_operand")
+ (lshiftrt:SI (match_dup 2) (match_operand:SI 4 "const_int_operand")))]
+ "INTVAL (operands[4]) == 32 - GET_MODE_BITSIZE (<MODE>mode)
+ && (REGNO (operands[0]) == REGNO (operands[2])
+ || peep2_reg_dead_p (2, operands[0]))
+ && (REGNO (operands[3]) == REGNO (operands[2])
+ || peep2_reg_dead_p (3, operands[2]))"
+ [(set (match_dup 2) (bitreverse:SI (match_dup 1)))
+ (set (match_dup 3) (lshiftrt:SI (match_dup 2) (match_dup 4)))]
+ {
+operands[1] = gen_lowpart (SImode, operands[1]);
+ }
+)
+
+(define_peephole2
+ [(set (match_operand:SI 0 "register_operand")
+ (zero_extend:SI (match_operand:SHORT 1 "register_operand")))
+ (set (match_operand:SI 2 "register_operand")
+ (bitreverse:SI (match_dup 0)))
+ (set (match_operand:DI 3 "register_operand")
+ (zero_extend:DI (lshiftrt:SI (match_dup 2)
+     (match_operand:SI 4 "const_int_operand"))))]
+ "INTVAL (operands[4]) == 32 - GET_MODE_BITSIZE (<MODE>mode)
+ && (REGNO (operands[0]) == REGNO (operands[2])
+ || peep2_reg_dead_p (2, operands[0]))
+ && (REGNO (operands[3]) == REGNO (operands[2])
+ || peep2_reg_dead_p (3, operands[2]))"
+ [(set (match_dup 2) (bitreverse:SI (match_dup 1)))
+ (set (match_dup 3) (zero_extend:DI (lshiftrt:SI (match_dup 2)
+     (match_dup 4))))]
+ {
+operands[1] = gen_lowpart (SImode, operands[1]);
+ }
+)
+
(define_expand "ffs2"
[(match_operand:GPI 0 "register_operand")
(match_operand:GPI 1 "register_operand")]
removes the zero extensions from
unsigned char foo (unsigned char x) { return __builtin_bitreverse8 (x); }
unsigned short bar (unsigned short x) { return __builtin_bitreverse16 (x); }
unsigned char a;
void baz (unsigned char x) { a = __builtin_bitreverse8 (x); }
unsigned short b;
void qux (unsigned short x) { b = __builtin_bitreverse16 (x); }
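The reason the zero extension is redundant can be checked in plain C. The following is a minimal sketch, not part of the patch: `bitrev32` is a portable stand-in for the 32-bit rbit instruction, and the `br8_*`/`br16_*` helpers are hypothetical names that model the instruction sequences with and without the leading `and`. The shift by 32 minus the narrow width discards exactly the bits that the upper part of the input register could have influenced.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the 32-bit rbit instruction (not the builtin). */
static uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* and w0, w0, 255; rbit w0, w0; lsr w0, w0, 24 */
static uint32_t br8_with_and(uint32_t reg)
{
    return bitrev32(reg & 0xffu) >> 24;
}

/* rbit w0, w0; lsr w0, w0, 24 */
static uint32_t br8_without_and(uint32_t reg)
{
    return bitrev32(reg) >> 24;
}

/* and w0, w0, 65535; rbit w0, w0; lsr w0, w0, 16 */
static uint32_t br16_with_and(uint32_t reg)
{
    return bitrev32(reg & 0xffffu) >> 16;
}

/* rbit w0, w0; lsr w0, w0, 16 */
static uint32_t br16_without_and(uint32_t reg)
{
    return bitrev32(reg) >> 16;
}

/* Compare both sequences with arbitrary garbage in the upper register bits. */
void check_zero_extend_is_redundant(void)
{
    static const uint32_t upper[] = {
        0x00000000u, 0xffff0000u, 0xdead0000u, 0x00010000u
    };
    for (unsigned u = 0; u < 4; u++)
        for (uint32_t lo = 0; lo <= 0xffffu; lo++) {
            uint32_t reg = upper[u] | lo;
            assert(br8_with_and(reg) == br8_without_and(reg));
            assert(br16_with_and(reg) == br16_without_and(reg));
        }
}
```

This only demonstrates the arithmetic; whether the peephole2 above is the right place to exploit it is the open question in this thread.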
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #38 from Jakub Jelinek --- define_peephole2 IMHO should be able to fix this up, but I have no idea what else could do that.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #37 from Disservin ---
(In reply to Roger Sayle from comment #36)
> Please forgive me asking an aarch64 specific question in a middle-end PR,
> but in the recent (excellent) bitreverse patches for aarch64 is the
> bitwise-AND actually required in:
>
> ** br8:
> ** and w0, w0, 255
> ** rbit w0, w0
> ** lsr w0, w0, 24
> ** ret
>
> and
>
> ** br16:
> ** and w0, w0, 65535
> ** rbit w0, w0
> ** lsr w0, w0, 16
> ** ret
>
> I'm guessing that either expanding via a paradoxical SUBREG, or a currently
> missing RTL simplification (or a backend define_insn_and_split or peephole2)
> should be able to remove the (I believe) unnecessary zero_extend.
It's not, see https://gcc.gnu.org/pipermail/gcc-patches/2026-May/717054.html
but I first wanted to land the rbit patch, since codegen-wise it's already a
big improvement over the generic fallback. I'd be happy if someone with more
knowledge could get rid of the zero extend.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Roger Sayle changed:
What|Removed |Added
CC||roger at nextmovesoftware dot com
--- Comment #36 from Roger Sayle ---
Please forgive me asking an aarch64 specific question in a middle-end PR,
but in the recent (excellent) bitreverse patches for aarch64 is the
bitwise-AND actually required in:

** br8:
** and w0, w0, 255
** rbit w0, w0
** lsr w0, w0, 24
** ret

and

** br16:
** and w0, w0, 65535
** rbit w0, w0
** lsr w0, w0, 16
** ret

I'm guessing that either expanding via a paradoxical SUBREG, or a currently
missing RTL simplification (or a backend define_insn_and_split or peephole2)
should be able to remove the (I believe) unnecessary zero_extend.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #35 from GCC Commits ---
The trunk branch has been updated by Andrew Pinski :
https://gcc.gnu.org/g:f0c0214fbcf10198b5dbc2d102a7c8f55f631f4c
commit r17-678-gf0c0214fbcf10198b5dbc2d102a7c8f55f631f4c
Author: Disservin
Date: Wed May 20 20:01:07 2026 +0200
aarch64: Add bitreverse expanders [PR50481]
Add missing AArch64 bitreverse expanders so __builtin_bitreverse* can lower
to existing rbit patterns.
PR target/50481
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/bitreverse.c: New test.
gcc/ChangeLog:
* config/aarch64/aarch64.md (bitreverse2, bitreverseqi2,
bitreversehi2): New expanders.
* config/aarch64/aarch64-simd.md (bitreverse2): New expander.
Signed-off-by: Disservin
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #34 from GCC Commits ---
The master branch has been updated by Xi Ruoyao :
https://gcc.gnu.org/g:82d9668ca6950419b1b00c0c4f8c9a87b6071051
commit r17-635-g82d9668ca6950419b1b00c0c4f8c9a87b6071051
Author: Xi Ruoyao
Date: Mon May 18 22:59:33 2026 +0800
LoongArch: Rename rbit to bitreverse2 [PR 50481]
r17-523 has added the __builtin_bitreverse{8,16,32,64} builtins and
established that the standard optab names for them are
bitreverse2. Rename the rbit expanders so they'll be used
for those builtins.
r17-567 has already removed the uses of rbit so the old names do
not need to be kept.
PR target/50481
gcc/
* config/loongarch/loongarch.md (@rbit2): Rename to
...
(@bitreverse2): ... this.
(rbithi2): Rename to ...
(bitreversehi2): ... this.
(rbitqi2): Rename to ...
(bitreverseqi2): ... this.
(rbitsi_extended): Rename to ...
(bitreversesi2_extended): ... this.
gcc/testsuite/
* gcc.target/loongarch/la64/bitreverse.c: New test.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #33 from GCC Commits ---
The master branch has been updated by Georg-Johann Lay :
https://gcc.gnu.org/g:bc43bf57757f34f349149cf93b81d8dcdbe411a2
commit r17-598-gbc43bf57757f34f349149cf93b81d8dcdbe411a2
Author: Georg-Johann Lay
Date: Tue May 19 13:33:00 2026 +0200
AVR: Add bitreverseqi2 insns.
Now that https://gcc.gnu.org/r17-591 has been applied, the middle-end
will express 8-bit bitreverse code in terms of a 16-bit bitreverse.
Therefore, add bitreverseqi2 insns.
PR target/50481
gcc/
* config/avr/avr.md (bitreverseqi2): New insn-and-split.
(*bitreverseqi2): New insn.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #32 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:07ce51abde6d6f92febab863a68c69266a808f9f
commit r17-591-g07ce51abde6d6f92febab863a68c69266a808f9f
Author: Jakub Jelinek
Date: Tue May 19 09:29:33 2026 +0200
optabs: Handle bitreverse using widening or two bitreverses of halves
[PR50481]
The following patch extends the widen_bswap and expand_doubleword_bswap
functions to handle also bitreverse, so that all the backends with
say just bitreversesi2 or bitreverse{s,d}i2 can handle also
bitreverse{q,h}i2 and bitreverse{d,t}i2 easily.
2026-05-19 Jakub Jelinek
PR target/50481
* optabs.cc (widen_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab. Rename to ...
(widen_bswap_or_bitreverse): ... this.
(expand_doubleword_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab. Rename to ...
(expand_doubleword_bswap_or_bitreverse): ... this.
(expand_bitreverse): Use widen_bswap_or_bitreverse and
expand_doubleword_bswap_or_bitreverse.
(expand_unop): Adjust widen_bswap and expand_doubleword_bswap
callers
to use new names and add an extra bswap_optab argument.
Reviewed-by: Jeffrey Law
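The two generic strategies named in this commit message (widening, and two bitreverses of halves) can be illustrated at the C level. This is a sketch of the arithmetic only, not the RTL expanders in optabs.cc; all function names below are hypothetical, and `bitrev32` stands in for a target that provides nothing but a 32-bit bitreverse.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a target whose only bitreverse pattern is 32 bits wide. */
static uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* Widening: reverse in the wider mode, then shift right by the
   difference between the wide and the narrow bit sizes. */
static uint8_t bitrev8_widened(uint8_t x)
{
    return (uint8_t)(bitrev32(x) >> (32 - 8));
}

static uint16_t bitrev16_widened(uint16_t x)
{
    return (uint16_t)(bitrev32(x) >> (32 - 16));
}

/* Double-word: reverse each half, then swap the halves. */
static uint64_t bitrev64_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)bitrev32(lo) << 32) | bitrev32(hi);
}

/* Bit-by-bit reference used as an oracle. */
static uint64_t bitrev_ref(uint64_t x, unsigned bits)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < bits; i++)
        if ((x >> i) & 1u)
            r |= (uint64_t)1 << (bits - 1 - i);
    return r;
}

void check_generic_expansions(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(bitrev8_widened((uint8_t)v) == bitrev_ref(v, 8));
    for (unsigned v = 0; v < 65536; v++)
        assert(bitrev16_widened((uint16_t)v) == bitrev_ref(v, 16));
    assert(bitrev64_halves(0x0123456789abcdefULL)
           == bitrev_ref(0x0123456789abcdefULL, 64));
}
```

The same structure works for bswap, which is presumably why the existing widen_bswap and expand_doubleword_bswap helpers could be generalized by passing the optab as an argument.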
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #31 from Jakub Jelinek ---
(In reply to Disservin from comment #30)
> > It is certainly larger (both x86_64 and ia32) and on x86_64 same number of
> > instructions:
>
> clang manages to shave some more instructions off compared to gcc's output
> https://godbolt.org/z/EeoonM4va
Still it is larger and slower.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #30 from Disservin ---
> It is certainly larger (both x86_64 and ia32) and on x86_64 same number of
> instructions:
clang manages to shave some more instructions off compared to gcc's output
https://godbolt.org/z/EeoonM4va
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #29 from Jakub Jelinek ---
(In reply to Alexander Kleinsorge from comment #13)
> for single bytes (uint8), there could be a faster way (x86 + x64).
> there are only logical ops and shifts, nothing else.
>
> static inline uint8 byte_rev(uint8 v) {
> const uint64 BREV64 = ~0x084c2a6e195d3b7fLLu; // verify this number (LUT like)
> uint8 a = (BREV64) >> ((v % 16u) * 4u); // from low
> uint8 b = (BREV64) >> ((v / 16u) * 4u); // from high
> return (a * 16u) | (b % 16u);
> }
Why do you think this is faster?
It is certainly larger (both x86_64 and ia32) and on x86_64 same number of
instructions:
   0: 48 ba 80 c4 a2 e6 91    movabs $0xf7b3d591e6a2c480,%rdx
   7: d5 b3 f7
   a: 89 f9                   mov    %edi,%ecx
   c: 40 c0 ef 04             shr    $0x4,%dil
  10: 83 e1 0f                and    $0xf,%ecx
  13: 48 89 d0                mov    %rdx,%rax
  16: c1 e1 02                shl    $0x2,%ecx
  19: 48 d3 e8                shr    %cl,%rax
  1c: 40 0f b6 cf             movzbl %dil,%ecx
  20: c1 e1 02                shl    $0x2,%ecx
  23: c1 e0 04                shl    $0x4,%eax
  26: 48 d3 ea                shr    %cl,%rdx
  29: 83 e2 0f                and    $0xf,%edx
  2c: 09 d0                   or     %edx,%eax
  2e: c3                      ret
vs.
   0: 40 c0 c7 04             rol    $0x4,%dil
   4: 89 fa                   mov    %edi,%edx
   6: 83 e7 33                and    $0x33,%edi
   9: c0 ea 02                shr    $0x2,%dl
   c: c1 e7 02                shl    $0x2,%edi
   f: 83 e2 33                and    $0x33,%edx
  12: 09 fa                   or     %edi,%edx
  14: 89 d0                   mov    %edx,%eax
  16: 83 e2 55                and    $0x55,%edx
  19: d0 e8                   shr    $1,%al
  1b: 01 d2                   add    %edx,%edx
  1d: 83 e0 55                and    $0x55,%eax
  20: 09 d0                   or     %edx,%eax
  22: c3                      ret
Both are 14 insns, but the #c13 is larger.
For ia32, it is 105 bytes vs. 39, 34 vs. 15 insns.
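For anyone who wants to reproduce the comparison, here is a hedged C sketch of both approaches. `byte_rev_lut` follows the code from comment #13 with `<stdint.h>` types substituted for `uint8`/`uint64`. `byte_rev_swap` is my reading of the second listing (rotate by 4, then swap bit pairs, then swap adjacent bits); it is an assumption that this matches what the generic expansion does. The check confirms that both agree on all 256 inputs, so the debate is purely about code size and speed.

```c
#include <assert.h>
#include <stdint.h>

/* Nibble lookup from comment #13: nibble i of the constant holds the
   bit-reversed value of i. ~0x084c2a6e195d3b7f == 0xf7b3d591e6a2c480. */
static uint8_t byte_rev_lut(uint8_t v)
{
    const uint64_t BREV64 = ~0x084c2a6e195d3b7fULL;
    uint8_t a = (uint8_t)(BREV64 >> ((v % 16u) * 4u)); /* from low nibble */
    uint8_t b = (uint8_t)(BREV64 >> ((v / 16u) * 4u)); /* from high nibble */
    return (uint8_t)((a * 16u) | (b % 16u));
}

/* Swap nibbles, then 2-bit pairs, then adjacent bits. */
static uint8_t byte_rev_swap(uint8_t v)
{
    v = (uint8_t)((v << 4) | (v >> 4));
    v = (uint8_t)(((v >> 2) & 0x33u) | ((v & 0x33u) << 2));
    v = (uint8_t)(((v >> 1) & 0x55u) | ((v & 0x55u) << 1));
    return v;
}

/* Number of byte values on which the two functions disagree. */
static unsigned byte_rev_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned v = 0; v < 256; v++)
        if (byte_rev_lut((uint8_t)v) != byte_rev_swap((uint8_t)v))
            bad++;
    return bad;
}

void check_byte_rev(void)
{
    assert(byte_rev_mismatches() == 0);
}
```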
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #28 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:c0fd049ae75349dc273de868fd3b3e8935115418
commit r17-563-gc0fd049ae75349dc273de868fd3b3e8935115418
Author: Jakub Jelinek
Date: Mon May 18 09:41:59 2026 +0200
i386: Implement bitreverse2 optab for GFNI [PR50481]
The following patch implements the bitreverse2 optab for -mgfni -msse2
(SSE2 because apparently -mgfni doesn't imply -msse nor -msse2).
This is done by using gf2p8affineqb insn with a special constant which
reverses bits in each byte, and for modes wider than QImode also by doing
a byteswap afterwards.
With -m64 it emits
.LC0:
        .byte 1, 2, 4, 8, 16, 32, 64, -128
        .byte 1, 2, 4, 8, 16, 32, 64, -128
and
        movd %edi, %xmm0
        gf2p8affineqb $0, .LC0(%rip), %xmm0
        movd %xmm0, %eax
for __builtin_bitreverse8,
        movd %edi, %xmm0
        gf2p8affineqb $0, .LC0(%rip), %xmm0
        movd %xmm0, %eax
        rolw $8, %ax
for __builtin_bitreverse16,
        movd %edi, %xmm0
        gf2p8affineqb $0, .LC0(%rip), %xmm0
        movd %xmm0, %eax
        bswap %eax
for __builtin_bitreverse32,
        movq %rdi, %xmm0
        gf2p8affineqb $0, .LC0(%rip), %xmm0
        movq %xmm0, %rax
        bswap %rax
for __builtin_bitreverse64, and
        movq %rdi, %xmm0
        pinsrq $1, %rsi, %xmm0
        gf2p8affineqb $0, .LC0(%rip), %xmm0
        movq %xmm0, %rax
        pextrq $1, %xmm0, %rdx
        bswap %rax
        bswap %rdx
        xchgq %rdx, %rax
for __builtin_bitreverse128 (only the xchgq is unnecessary and surprising,
some RA issue).
2026-05-18 Jakub Jelinek
PR target/50481
* config/i386/i386-protos.h (ix86_expand_gfni_bitreverse): Declare.
* config/i386/i386-expand.cc (ix86_expand_gfni_bitreverse): New
function.
* config/i386/i386.md (bitreverse2): New expander.
* gcc.target/i386/gfni-builtin-bitreverse-1.c: New test.
Reviewed-by: Hongtao Liu
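Why the constant 1, 2, 4, ..., -128 reverses the bits of each byte can be seen with a scalar model of one byte lane of gf2p8affineqb. The sketch below assumes my reading of the instruction's definition: result bit i is the parity of (matrix byte 7-i AND the source byte), XORed with bit i of the immediate. The function names are hypothetical, and this is an emulation for illustration, not what the patch emits.

```c
#include <assert.h>
#include <stdint.h>

/* Parity (XOR of all bits) of a byte. */
static unsigned parity8(uint8_t x)
{
    x ^= (uint8_t)(x >> 4);
    x ^= (uint8_t)(x >> 2);
    x ^= (uint8_t)(x >> 1);
    return x & 1u;
}

/* One byte lane of gf2p8affineqb, under the assumption stated above:
   matrix[] is the 8-byte matrix operand in memory order. */
static uint8_t gf2p8affine_byte(const uint8_t matrix[8], uint8_t x,
                                uint8_t imm8)
{
    uint8_t r = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned bit = parity8(matrix[7 - i] & x) ^ ((imm8 >> i) & 1u);
        r |= (uint8_t)(bit << i);
    }
    return r;
}

/* Same bytes as .LC0 in the commit message (-128 is 0x80 as a byte). */
static const uint8_t LC0[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };

static uint8_t bitrev8_gfni_model(uint8_t x)
{
    return gf2p8affine_byte(LC0, x, 0);
}

/* Wider modes: reverse the bits of each byte, then swap the bytes
   (the rolw $8 in the 16-bit sequence). */
static uint16_t bitrev16_gfni_model(uint16_t x)
{
    uint8_t lo = bitrev8_gfni_model((uint8_t)x);
    uint8_t hi = bitrev8_gfni_model((uint8_t)(x >> 8));
    return (uint16_t)((lo << 8) | hi);
}

void check_gfni_model(void)
{
    assert(bitrev8_gfni_model(0x01u) == 0x80u);
    assert(bitrev16_gfni_model(0x0001u) == 0x8000u);
}
```

With matrix byte k equal to 1 << k, result bit i selects source bit 7-i, which is the reversal.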
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #27 from Jakub Jelinek --- Well, guess for vectorization there is more work to do in the vectorizer and introduce an IFN for it. Anyway, I think we don't want to duplicate the QI/HI/double-word handling for arm, nvptx, loongarch, cris, maybe riscv, so handling it in generic code is better and I think we can just rename/tweak widen_bswap and expand_doubleword_bswap to handle both bswap and bitreverse by passing optab argument to those.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #26 from Drea Pinski ---
(In reply to Jakub Jelinek from comment #25)
> As for aarch64, the following works for me, but haven't tested it yet beyond
> gcc.dg/builtin-bitreverse-1.c test:
> 2026-05-16 Jakub Jelinek
>
> PR target/50481
> * config/aarch64/aarch64.md (bitreverse2): New define_expand.
>
> --- gcc/config/aarch64/aarch64.md.jj 2026-04-20 09:07:40.439840638 +0200
> +++ gcc/config/aarch64/aarch64.md 2026-05-16 21:32:02.545388156 +0200
> @@ -5786,6 +5786,24 @@ (define_insn "@aarch64_rbit"
>[(set_attr "type" "rbit")]
> )
>
> +(define_expand "bitreverse2"
> + [(set (match_operand:GPI 0 "register_operand")
> + (bitreverse:GPI (match_operand:GPI 1 "register_operand")))])
> +
> +(define_expand "bitreverse2"
> + [(match_operand:SHORT 0 "register_operand")
> + (match_operand:SHORT 1 "register_operand")]
> + ""
> + {
> +rtx rbitd = gen_reg_rtx (SImode);
> +emit_insn (gen_bitreversesi2 (rbitd, gen_lowpart (SImode,
> operands[1])));
> +rtx shiftd = gen_reg_rtx (SImode);
> +emit_insn (gen_ashrsi3 (shiftd, rbitd, GEN_INT (32 - )));
> +emit_move_insn (operands[0], gen_lowpart (mode, shiftd));
> +DONE;
> + }
> +)
> +
> (define_expand "ffs2"
>[(match_operand:GPI 0 "register_operand")
> (match_operand:GPI 1 "register_operand")]
>
>
> Guess bitreverseti2 could be handled too, dunno if in the backend or in
> generic code handle doubleword bitreverse using 2 word bitreverses and
> swapping the words.
> Given that we have widen_bswap and expand_doubleword_bswap I think we should
> just copy/adjust those in the generic code and even remove from the above
> aarch64 patch the second define_expand.
> Will handle it on Monday.
https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716917.html has part of that
already, including testcases. And it even includes a vector mode change.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Jakub Jelinek changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #25 from Jakub Jelinek ---
As for aarch64, the following works for me, but haven't tested it yet beyond
gcc.dg/builtin-bitreverse-1.c test:
2026-05-16 Jakub Jelinek
PR target/50481
* config/aarch64/aarch64.md (bitreverse2): New define_expand.
--- gcc/config/aarch64/aarch64.md.jj 2026-04-20 09:07:40.439840638 +0200
+++ gcc/config/aarch64/aarch64.md 2026-05-16 21:32:02.545388156 +0200
@@ -5786,6 +5786,24 @@ (define_insn "@aarch64_rbit"
[(set_attr "type" "rbit")]
)
+(define_expand "bitreverse2"
+ [(set (match_operand:GPI 0 "register_operand")
+ (bitreverse:GPI (match_operand:GPI 1 "register_operand")))])
+
+(define_expand "bitreverse2"
+ [(match_operand:SHORT 0 "register_operand")
+ (match_operand:SHORT 1 "register_operand")]
+ ""
+ {
+rtx rbitd = gen_reg_rtx (SImode);
+emit_insn (gen_bitreversesi2 (rbitd, gen_lowpart (SImode, operands[1])));
+rtx shiftd = gen_reg_rtx (SImode);
+emit_insn (gen_ashrsi3 (shiftd, rbitd, GEN_INT (32 - )));
+emit_move_insn (operands[0], gen_lowpart (mode, shiftd));
+DONE;
+ }
+)
+
(define_expand "ffs2"
[(match_operand:GPI 0 "register_operand")
(match_operand:GPI 1 "register_operand")]
Guess bitreverseti2 could be handled too, dunno if in the backend or in generic
code handle doubleword bitreverse using 2 word bitreverses and swapping the
words.
Given that we have widen_bswap and expand_doubleword_bswap I think we should
just copy/adjust those in the generic code and even remove from the above
aarch64 patch the second define_expand.
Will handle it on Monday.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #24 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:18d08c0e6115e6c34eaed73f242ccf210a455240
commit r17-551-g18d08c0e6115e6c34eaed73f242ccf210a455240
Author: Jakub Jelinek
Date: Sat May 16 10:51:39 2026 +0200
match.pd: Enable some __builtin_bswap* optimizations even for
__builtin_bitreverse* [PR50481]
Most of the bswap optimizations equally apply also to bitreverse builtins.
The following patch enables those.
2026-05-16 Jakub Jelinek
PR target/50481
* match.pd (BITREVERSE): New define_operator_list. Use it next to
BSWAP for a subset of bswap simplifications.
* gcc.dg/builtin-bitreverse-4.c: New test.
Reviewed-by: Andrew Pinski
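The commit message does not list which simplifications were enabled. As a hedged illustration of why bswap-style rules carry over, the sketch below checks a few identities that hold for any fixed permutation of bits, byte swap and bit reverse alike. `bitrev32` is a portable stand-in, and whether each identity corresponds to an actual match.pd rule is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit bit reversal, standing in for __builtin_bitreverse32. */
static uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

void check_bitreverse_identities(uint32_t x, uint32_t y)
{
    /* Reversing twice is the identity. */
    assert(bitrev32(bitrev32(x)) == x);
    /* Bitwise NOT commutes with the permutation. */
    assert(bitrev32((uint32_t)~x) == (uint32_t)~bitrev32(x));
    /* Bitwise binary operations can be hoisted through it. */
    assert((bitrev32(x) & bitrev32(y)) == bitrev32(x & y));
    assert((bitrev32(x) | bitrev32(y)) == bitrev32(x | y));
    assert((bitrev32(x) ^ bitrev32(y)) == bitrev32(x ^ y));
    /* Equality comparisons are preserved. */
    assert((bitrev32(x) == bitrev32(y)) == (x == y));
}
```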
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #23 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:54f8428e0342935d5f9c3282fbae1db63cf90ac1
commit r17-550-g54f8428e0342935d5f9c3282fbae1db63cf90ac1
Author: Jakub Jelinek
Date: Sat May 16 10:50:57 2026 +0200
Add __builtin_bitreverse128 [PR50481]
We already have __builtin_bswap{16,32,64,128}, the last one has been
added ~6 years ago. So, I think we should have also
__builtin_bitreverse128.
The following patch does that.
Note, we don't have __builtin_bswapg and I don't think we should, one can
only byteswap something which has number of bits divisible by CHAR_BIT.
For __builtin_bitreverseg that isn't a problem, but am not sure I want to
spend time handling it on say unsigned _BitInt(357). Perhaps only if there
is some real-world use-case.
2026-05-16 Jakub Jelinek
PR target/50481
* doc/extend.texi (__builtin_bitreverse32, __builtin_bitreverse64):
Tweak wording for consistency with __builtin_bswap*.
(__builtin_bitreverse128): Document.
* builtins.def (BUILT_IN_BITREVERSE128): New.
* builtins.cc (expand_builtin): Handle also BUILT_IN_BITREVERSE128.
(is_inexpensive_builtin): Likewise.
* fold-const-call.cc (fold_const_call_ss): Handle also
CFN_BUILT_IN_BITREVERSE128.
* fold-const.cc (tree_call_nonnegative_warnv_p): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Handle also
BUILT_IN_BITREVERSE128.
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Handle also
CFN_BUILT_IN_BITREVERSE128.
(cond_removal_in_builtin_zero_pattern): Likewise.
* gcc.dg/builtin-bitreverse-1.c: Add __builtin_bitreverse128 tests.
* gcc.dg/builtin-bitreverse-2.c: Likewise.
Reviewed-by: Andrew Pinski
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #22 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:7ca53f9d86ef5f4c6a49f414f3cdd88d2b8a0bad
commit r17-549-g7ca53f9d86ef5f4c6a49f414f3cdd88d2b8a0bad
Author: Jakub Jelinek
Date: Sat May 16 10:50:00 2026 +0200
tree-ssa-ccp: Fix up __builtin_bitreverse* handling [PR50481]
The committed __builtin_bitreverse* patch mishandled the
bitwise CCP handling, it is true that BUILT_IN_BITREVERSE* can be
handled there similarly to BUILT_IN_BSWAP*, but not exactly, for
the latter we need (and do) bswap the value and mask constants,
while for the former we obviously need to bitreverse them instead.
2026-05-16 Jakub Jelinek
PR target/50481
* tree-ssa-ccp.cc (evaluate_stmt): Fix up
BUILT_IN_BITREVERSE{8,16,32,64} handling.
* gcc.dg/builtin-bitreverse-3.c: New test.
Reviewed-by: Andrew Pinski
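The bug described above can be modelled with a toy known-bits lattice. This is a simplified sketch, not GCC's actual CCP code: `known_bits`, `kb_bitreverse32`, `kb_bswap32`, and `kb_contains` are hypothetical names, and a set bit in `mask` means that bit is unknown. Applying the bswap transfer function to a bitreverse call yields a lattice value that the real result does not satisfy.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
           | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical lattice element: mask bit set = unknown bit;
   value holds the known bits and is zero where unknown. */
typedef struct { uint32_t value, mask; } known_bits;

/* Correct transfer function: bit-reverse both value and mask. */
static known_bits kb_bitreverse32(known_bits in)
{
    known_bits out = { bitrev32(in.value), bitrev32(in.mask) };
    return out;
}

/* Transfer function for bswap; wrong when applied to a bitreverse. */
static known_bits kb_bswap32(known_bits in)
{
    known_bits out = { bswap32(in.value), bswap32(in.mask) };
    return out;
}

/* True if the concrete value x is consistent with kb. */
static int kb_contains(known_bits kb, uint32_t x)
{
    return (x & ~kb.mask) == kb.value;
}

void check_known_bits(void)
{
    /* Low byte known to be 0x01, upper 24 bits unknown. */
    known_bits in = { 0x00000001u, 0xffffff00u };
    uint32_t x = 0xabcdef01u;
    assert(kb_contains(in, x));
    assert(kb_contains(kb_bitreverse32(in), bitrev32(x)));
    assert(!kb_contains(kb_bswap32(in), bitrev32(x)));
}
```

In this example the masks happen to coincide (0x00ffffff either way), but the known value differs: 0x80000000 for the correct function and 0x01000000 for the bswap one.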
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #21 from Jakub Jelinek --- Created attachment 64467 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64467&action=edit gcc17-pr50481-3.patch Untested third incremental patch.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #20 from Jakub Jelinek --- Created attachment 64466 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64466&action=edit gcc17-pr50481-2.patch Untested second incremental patch.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #19 from Jakub Jelinek --- Created attachment 64465 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64465&action=edit gcc17-pr50481-1.patch Untested first incremental patch.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Disservin changed:
What|Removed |Added
CC||disservin.social at gmail dot com
--- Comment #18 from Disservin ---
> I have a patch to add __builtin_bitreverse128 support and hook some
> bitreverse match.pd optimizations.
Thanks. I'll try to prepare a patch for aarch64 soon, which lowers this to
rbit when possible. This shouldn't need much, since the insn already exists,
IIRC.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #17 from Jakub Jelinek --- I have a patch to add __builtin_bitreverse128 support and hook some bitreverse match.pd optimizations.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #16 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:c564a8be8a15389e8a5119e51d5929f0689044be
commit r17-523-gc564a8be8a15389e8a5119e51d5929f0689044be
Author: Jakub Jelinek
Date: Fri May 15 09:50:52 2026 +0200
Add __builtin_bitreverse{8,16,32,64} builtins [PR50481]
Future work could optimize this on specific targets:
- ARM: lower to RBIT
- x86 with GFNI: lower to vgf2p8affineqb
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-bit-reversal/
2026-05-15 Disservin
Jakub Jelinek
PR target/50481
* builtin-types.def (BT_FN_UINT8_UINT8): New.
* builtins.def (BUILT_IN_BITREVERSE8, BUILT_IN_BITREVERSE16,
BUILT_IN_BITREVERSE32, BUILT_IN_BITREVERSE64): New builtins.
* builtins.cc (expand_builtin, is_inexpensive_builtin): Handle
bitreverse builtins.
* fold-const-call.cc (fold_const_call_ss): Fold bitreverse
builtins.
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
bitreverse builtins.
* optabs.def (bitreverse_optab): New.
* optabs.cc (expand_bitreverse): New function.
(expand_unop): Use it for bitreverse_optab.
* tree-ssa-ccp.cc (evaluate_stmt): Handle bitreverse builtins.
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p,
cond_removal_in_builtin_zero_pattern): Likewise.
* doc/extend.texi: Document __builtin_bitreverse{8,16,32,64}.
* doc/md.texi (bitreverse2): Document.
* gcc.dg/builtin-bitreverse-1.c: New test.
* gcc.dg/builtin-bitreverse-2.c: New test.
Signed-off-by: Disservin
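A usage sketch for the new builtins follows. It assumes a compiler that provides `__builtin_bitreverse{8,16,32,64}` and falls back to a bit-by-bit loop elsewhere; the `reverse*` wrapper names and `bitrev_ref` are hypothetical. The `__has_builtin` guard is there so the example also builds with compilers that predate this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Bit-by-bit reference, used as the fallback and as an oracle. */
static uint64_t bitrev_ref(uint64_t x, unsigned bits)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < bits; i++)
        if ((x >> i) & 1u)
            r |= (uint64_t)1 << (bits - 1 - i);
    return r;
}

static uint8_t reverse8(uint8_t x)
{
#if __has_builtin(__builtin_bitreverse8)
    return __builtin_bitreverse8(x);
#else
    return (uint8_t)bitrev_ref(x, 8);
#endif
}

static uint16_t reverse16(uint16_t x)
{
#if __has_builtin(__builtin_bitreverse16)
    return __builtin_bitreverse16(x);
#else
    return (uint16_t)bitrev_ref(x, 16);
#endif
}

static uint32_t reverse32(uint32_t x)
{
#if __has_builtin(__builtin_bitreverse32)
    return __builtin_bitreverse32(x);
#else
    return (uint32_t)bitrev_ref(x, 32);
#endif
}

static uint64_t reverse64(uint64_t x)
{
#if __has_builtin(__builtin_bitreverse64)
    return __builtin_bitreverse64(x);
#else
    return bitrev_ref(x, 64);
#endif
}

void check_reverse_wrappers(void)
{
    assert(reverse8(0x01u) == 0x80u);
    assert(reverse16(0x0001u) == 0x8000u);
    assert(reverse32(0x12345678u) == bitrev_ref(0x12345678u, 32));
    assert(reverse64(1ULL) == 0x8000000000000000ULL);
}
```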
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Drea Pinski changed:
What|Removed |Added
Keywords||patch
URL||https://gcc.gnu.org/pipermail/gcc-patches/2026-May/715490.html
--- Comment #15 from Drea Pinski ---
https://gcc.gnu.org/pipermail/gcc-patches/2026-May/715490.html
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Drea Pinski changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #14 from Drea Pinski --- .
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Alexander Kleinsorge changed:
What|Removed |Added
CC||aleks at physik dot tu-berlin.de
--- Comment #13 from Alexander Kleinsorge ---
for single bytes (uint8), there could be a faster way (x86 + x64).
there are only logical ops and shifts, nothing else.
static inline uint8 byte_rev(uint8 v) {
const uint64 BREV64 = ~0x084c2a6e195d3b7fLLu; // verify this number (LUT like)
uint8 a = (BREV64) >> ((v % 16u) * 4u); // from low
uint8 b = (BREV64) >> ((v / 16u) * 4u); // from high
return (a * 16u) | (b % 16u);
}
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #12 from Andrew Pinski --- Also will add an internal function which will be used for vectorization.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
--- Comment #11 from Andrew Pinski ---
The builtins I am going to implement will be similar to clang's:
__builtin_bitreverse{8,16,32,64,g}
The g one is not part of clang but will be used for _BitInt types.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Andrew Pinski changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org
Keywords||missed-optimization
Status|NEW |ASSIGNED
--- Comment #10 from Andrew Pinski ---
I am going to implement this and add an optab too.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Xi Ruoyao changed: What|Removed |Added CC||xry111 at gcc dot gnu.org --- Comment #9 from Xi Ruoyao --- Useful for LoongArch too. And now we already have bitreverse RTX code since r14-1586.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Carlo Wood changed: What|Removed |Added CC||carlo at gcc dot gnu.org --- Comment #8 from Carlo Wood --- Bump - I need this too ;)
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Frank changed: What|Removed |Added CC||f.boesing at gmx dot de --- Comment #7 from Frank --- Would be really useful to have this.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #6 from Joseph S. Myers --- As noted in the RISC-V BoF today, this would also be useful for the RISC-V bit-manipulation extension.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #5 from Wilco --- Yes it would be good to have a generic builtin, this issue keeps coming up: https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01187.html
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 --- Comment #4 from krux --- +1 The builtins already produce better code than a generic bitreverse implementation: https://godbolt.org/z/Um2Tit But using special hardware instructions automatically is even more important imho.
[Bug middle-end/50481] builtin to reverse the bit order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Matthijs van Duin changed: What|Removed |Added CC||matthijsvanduin at gmail dot com --- Comment #3 from Matthijs van Duin --- Bump! Proper intrinsics for bitreverse would be much appreciated! A plain C implementation is ugly and results in equally awful code output, while using inline asm breaks portability and can't be constant-folded or used in constexpr. What makes the continued lack of a __builtin_arm_rbit() in gcc a bit bizarre is that the (identically named) Neon versions of this instruction on AArch64 actually *did* receive proper intrinsics! [1] It's worth mentioning that clang does support __builtin_arm_rbit(), and they've actually generalized this to a full set of target-independent bitreverse builtins [2]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01913.html [2] http://clang.llvm.org/docs/LanguageExtensions.html#builtin-bitreverse
[Bug middle-end/50481] builtin to reverse the bit order
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Jonathan Schmidt-Dominé changed: What|Removed |Added CC||[email protected] --- Comment #1 from Jonathan Schmidt-Dominé 2011-09-23 15:30:33 UTC --- Informative web-page listing some methods: http://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith64BitsDiv
[Bug middle-end/50481] builtin to reverse the bit order
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481 Paolo Carlini changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2011-09-22 Ever Confirmed|0 |1
