[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
Andrew Pinski changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |16.0
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
Jeffrey A. Law changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #11 from Jeffrey A. Law ---
Fixed on the trunk.
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #10 from GCC Commits ---
The master branch has been updated by Jeff Law:
https://gcc.gnu.org/g:ebbeaf490c56e04d2e9be25caf9522ef5fba6c72
commit r16-3350-gebbeaf490c56e04d2e9be25caf9522ef5fba6c72
Author: Jeff Law
Date: Fri Aug 22 11:53:27 2025 -0600
[PR rtl-optimization/120553] Improve selecting between constants based on sign bit test

While working to remove mvconst_internal I stumbled over a regression in
the code to handle signed division by a power of two.

In that sequence we want to select between 0, 2^n-1 by pairing a sign bit
splat with a subsequent logical right shift. This can be done without
branches or conditional moves.

Playing with it a bit made me realize there's a handful of selections we
can do based on a sign bit test. Essentially there are two broad cases.

Clearing bits after the sign bit splat. So we have 0, -1, if we clear bits
the 0 stays as-is, but the -1 could easily turn into 2^n-1, ~2^n-1, or some
small constants.

Setting bits after the sign bit splat. If we have 0, -1, setting bits the
-1 stays as-is, but the 0 can turn into 2^n, a small constant, etc.

Shreya and I originally started looking at target patterns to do this,
essentially discovering conditional move forms of the selects and rewriting
them into something more efficient. That got out of control pretty quickly
and it relied on if-conversion to initially create the conditional move.

The better solution is to actually discover the cases during if-conversion
itself. That catches cases that were previously being missed, checks cost
models, and is actually simpler since we don't have to distinguish between
things like ori and bseti, instead we just emit the natural RTL and let the
target figure it out.

In the ifcvt implementation we put these cases just before trying the
traditional conditional move sequences. Essentially these are a last
attempt before trying the generalized conditional move sequence.

This has been bootstrapped and regression tested on aarch64, riscv, ppc64le,
s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others I've
forgotten. It's also been tested on the other embedded targets. Obviously
the new tests are RISC-V specific, so that testing was primarily to make
sure we didn't ICE, generate incorrect code or regress existing
target-specific tests.

Raphael has some changes to attack this from the gimple direction as well.
I think the latest version of those is on me to push through internal
review.

PR rtl-optimization/120553
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): New function.
(noce_process_if_block): Use it.
gcc/testsuite/
* gcc.target/riscv/pr120553-1.c: New test.
* gcc.target/riscv/pr120553-2.c: New test.
* gcc.target/riscv/pr120553-3.c: New test.
* gcc.target/riscv/pr120553-4.c: New test.
* gcc.target/riscv/pr120553-5.c: New test.
* gcc.target/riscv/pr120553-6.c: New test.
* gcc.target/riscv/pr120553-7.c: New test.
* gcc.target/riscv/pr120553-8.c: New test.
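For readers following along, the two broad cases from the commit message (clearing bits after the splat, setting bits after the splat) and the division-by-power-of-two use can be sketched in plain C roughly as below. This is only an illustration of the arithmetic, not the code in noce_try_sign_bit_splat; the function names are made up for the example, and div_pow2 assumes that >> on a negative signed value is an arithmetic shift, as GCC documents for its targets.

```c
#include <stdint.h>

/* Sign bit splat: all ones when x is negative, zero otherwise.
   Written with unsigned arithmetic so it is well defined in ISO C.  */
static int64_t sign_splat(int64_t x)
{
    return -(int64_t)((uint64_t)x >> 63);
}

/* Case 1, clearing bits after the splat: x < 0 ? keep : 0.
   The 0 stays as-is, the -1 is reduced to the bits in 'keep'.  */
int64_t select_cleared(int64_t x, int64_t keep)
{
    return sign_splat(x) & keep;
}

/* Case 2, setting bits after the splat: x < 0 ? -1 : set.
   The -1 stays as-is, the 0 becomes 'set'.  */
int64_t select_set(int64_t x, int64_t set)
{
    return sign_splat(x) | set;
}

/* Signed division by 2^n rounding toward zero, for 1 <= n <= 62.
   The bias is 2^n-1 for negative x and 0 otherwise, obtained by a
   logical right shift of the splat.  Assumes arithmetic >> on
   negative values (implementation-defined in ISO C).  */
int64_t div_pow2(int64_t x, unsigned n)
{
    int64_t bias = (int64_t)((uint64_t)sign_splat(x) >> (64 - n));
    return (x + bias) >> n;
}
```

None of these need a branch or a conditional move, which is the point of recognizing them before the generic conditional move sequence is tried.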
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
Jeffrey A. Law changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |law at gcc dot gnu.org
--- Comment #9 from Jeffrey A. Law ---
Quick update. Shreya & I cobbled together target patterns to improve these
cases. But I ultimately concluded this was better handled via generic
ifcvt.cc changes which I'm preparing for submission.
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
Uroš Bizjak changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-06-05
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #8 from Uroš Bizjak ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).
Sorry for hijacking this PR to implement the above suggestion. I hope that
the x86 implementation can serve as an example for RISC-V.
BTW: Please note that for 32-bit immediates the size difference between OR
and BTS is a single byte on x86_64, but BTS is slightly slower than OR. For
64-bit exact-log2 immediates, the patched compiler can avoid CMOV and MOVABS,
and even for non-exact-log2 immediates, it can avoid the somewhat problematic
CMOV.
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #7 from GCC Commits ---
The master branch has been updated by Uros Bizjak:
https://gcc.gnu.org/g:ed57e5de634eda91f32e0e61724d8f103ef648dd
commit r16-1196-ged57e5de634eda91f32e0e61724d8f103ef648dd
Author: Uros Bizjak
Date: Thu Jun 5 22:53:35 2025 +0200
[i386] Improve "movcc" expander for DImode immediates [PR120553]
"movcc" expander uses x86_64_general_operand predicate that limits
the range of immediate operands to 32-bit size. The usage of this predicate
causes ifcvt to force out-of-range immediates to registers when converting
through noce_try_cmove. The testcase:
long long foo (long long c) { return c >= 0 ? 0x400000000ll : -1ll; }
compiles (-O2) to:
foo:
testq %rdi, %rdi
movq $-1, %rax
movabsq $0x400000000, %rdx
cmovns %rdx, %rax
ret
The above testcase can be compiled to a more optimized code without
problematic CMOV instruction if 64-bit immediates are allowed in
"movcc" expander:
foo:
movq %rdi, %rax
sarq $63, %rax
btsq $34, %rax
ret
The expander calls the ix86_expand_int_movcc function which internally
sanitizes arguments of emitted logical insns using expand_simple_binop.
The out-of-range immediates are forced to a temporary register just
before the instruction, so the instruction combiner is then able to
synthesize 64-bit BTS instruction.
The code improves even for non-exact-log2 64-bit immediates, e.g.
long long foo (long long c) { return c >= 0 ? 0x400001234ll : -1ll; }
that now compiles to:
foo:
movabsq $0x400001234, %rdx
movq %rdi, %rax
sarq $63, %rax
orq %rdx, %rax
ret
again avoiding problematic CMOV instruction.
PR target/120553
gcc/ChangeLog:
* config/i386/i386.md (movcc): Use "general_operand"
predicate for operands 2 and 3 for all modes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr120553.c: New test.
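The sarq/btsq and sarq/orq sequences shown in the commit message compute the same value as the conditional form of the testcase. A small C sketch of that equivalence follows; the function names are invented for illustration, bit 34 is taken from the "btsq $34" in the generated code, and the splat is written with unsigned arithmetic rather than a signed shift so that it does not depend on implementation-defined behaviour.

```c
/* Conditional form: what the testcase expresses for a single-bit constant. */
long long foo_select(long long c)
{
    return c >= 0 ? (1ll << 34) : -1ll;
}

/* Branch-free form mirroring sarq $63 followed by btsq $34:
   splat the sign bit, then set bit 34.  */
long long foo_splat(long long c)
{
    long long splat = -(long long)((unsigned long long)c >> 63);
    return splat | (1ll << 34);
}

/* Same idea for an arbitrary constant k, mirroring sarq $63 + orq:
   c >= 0 ? k : -1.  */
long long foo_or(long long c, long long k)
{
    long long splat = -(long long)((unsigned long long)c >> 63);
    return splat | k;
}
```

When c is negative the splat is already all ones, so ORing in any constant leaves -1; when c is non-negative the splat is 0 and the OR yields the constant itself.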
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #5 from Uroš Bizjak ---
This patch fixes the non-optimal testcase in Comment #4 for x86_64:
--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 40b43cf092a..8eee44756eb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -26478,8 +26478,8 @@ (define_peephole2
 (define_expand "mov<mode>cc"
   [(set (match_operand:SWIM 0 "register_operand")
         (if_then_else:SWIM (match_operand 1 "comparison_operator")
-                           (match_operand:SWIM 2 "<general_operand>")
-                           (match_operand:SWIM 3 "<general_operand>")))]
+                           (match_operand:SWIM 2 "general_operand")
+                           (match_operand:SWIM 3 "general_operand")))]
   ""
   "if (ix86_expand_int_movcc (operands)) DONE; else FAIL;")
--cut here--
gcc -O2:
movq %rdi, %rax
sarq $63, %rax
btsq $34, %rax
ret
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #6 from Jeffrey A. Law ---
Note there's a variety of other twiddles that can be done here.
If we want to select between -1 and any simm12, then that's srai+ori.
We can select between any constant with a single bit off and 0 using
srai+bclr.
We can select between a simm12 and 0 using srai+andi.
We can select between any constant with just high bits set and 0 with
srai+slli.
We can select between any constant with just low bits set and 0 with
srai+srli.
The last is particularly important for division by a power of 2 and is the
subject of my next patch in this space :-)
And there are almost certainly some 3 instruction sequences as well, though
they are harder to handle if a purely target approach is taken.
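As a rough C rendering of the two-instruction pairs listed in this comment, the sketch below models each one as a sign bit splat followed by a single logical operation on a 64-bit bit pattern. The function names are hypothetical and only echo the RISC-V mnemonics; results are returned as uint64_t so every step is well defined in ISO C, and the immediate range limits (simm12 for ori/andi) are assumed to be respected by the caller rather than checked.

```c
#include <stdint.h>

/* Bit pattern of "srai x, 63": all ones if x < 0, else zero.  */
static uint64_t splat(int64_t x)
{
    return 0 - ((uint64_t)x >> 63);
}

/* srai+ori:  x < 0 ? -1 : imm  */
uint64_t sel_ori(int64_t x, uint64_t imm)
{
    return splat(x) | imm;
}

/* srai+bclr: x < 0 ? (all ones with one bit off) : 0  */
uint64_t sel_bclr(int64_t x, unsigned bit)
{
    return splat(x) & ~(UINT64_C(1) << bit);
}

/* srai+andi: x < 0 ? imm : 0  */
uint64_t sel_andi(int64_t x, uint64_t imm)
{
    return splat(x) & imm;
}

/* srai+slli: x < 0 ? (only the high 64-n bits set) : 0, for n < 64  */
uint64_t sel_slli(int64_t x, unsigned n)
{
    return splat(x) << n;
}

/* srai+srli: x < 0 ? (only the low 64-n bits set) : 0, for n < 64  */
uint64_t sel_srli(int64_t x, unsigned n)
{
    return splat(x) >> n;
}
```

The srai+srli form with a shift count of 64-n produces the 2^n-1 bias used when dividing by 2^n, which is why that case matters for the division sequence.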
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #4 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Richard Biener from comment #1)
> > might be also interesting on x86-64 when using bts can use a smaller
> > immediate than the now used orq and thus improve instruction size (but it
> > clobbers flags).
> There is *iordi_1_bts pattern available, but for some reason not exercised
> for the testcase in the description.
I tried this testcase:
long foo1 (long c) { return c >= 0 ? 0x400000000 : -1 ; }
that resulted in:
testq %rdi, %rdi
movq $-1, %rax
movabsq $17179869184, %rdx
cmovns %rdx, %rax
ret
When the constant fits in the immediate field of the instruction (e.g.
0x40000000), the result is:
movq %rdi, %rax
sarq $63, %rax
orq $1073741824, %rax
ret
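The difference between the two results above comes down to whether the constant can be encoded directly in the OR: on x86-64 the immediate of orq is a sign-extended 32-bit value, so 1073741824 (2^30) fits while 17179869184 (2^34) does not. A small C sketch of the two properties involved is given below; the helper names are invented for illustration and are not GCC predicates.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when k is representable as a sign-extended 32-bit immediate,
   i.e. it could be used directly as the immediate operand of orq.  */
bool fits_simm32(int64_t k)
{
    return k >= INT32_MIN && k <= INT32_MAX;
}

/* True when exactly one bit of k is set (an "exact-log2" constant),
   i.e. a single bit-set instruction such as bts could produce it
   from zero.  */
bool is_single_bit(uint64_t k)
{
    return k != 0 && (k & (k - 1)) == 0;
}
```

For example, 2^34 is a single-bit constant that does not fit in a sign-extended 32-bit immediate, which is the situation where a bit-set instruction avoids materializing the constant in a register.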
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #3 from Uroš Bizjak ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).
There is *iordi_1_bts pattern available, but for some reason not exercised
for the testcase in the description.
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
--- Comment #2 from Uroš Bizjak ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).
All arithmetic instructions (including orq) clobber flags on x86.
[Bug target/120553] Improve code to select between -1 and various values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553
Richard Biener changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |16.0
           Keywords|                            |missed-optimization
             Target|riscv                       |riscv, x86-64
--- Comment #1 from Richard Biener ---
might be also interesting on x86-64 when using bts can use a smaller
immediate than the now used orq and thus improve instruction size (but it
clobbers flags).
