[Bug target/120553] Improve code to select between -1 and various values

2025-08-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |16.0

[Bug target/120553] Improve code to select between -1 and various values

2025-08-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/120553] Improve code to select between -1 and various values

2025-08-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Jeff Law:

https://gcc.gnu.org/g:ebbeaf490c56e04d2e9be25caf9522ef5fba6c72

commit r16-3350-gebbeaf490c56e04d2e9be25caf9522ef5fba6c72
Author: Jeff Law 
Date:   Fri Aug 22 11:53:27 2025 -0600

[PR rtl-optimization/120553] Improve selecting between constants based on
sign bit test

While working to remove mvconst_internal I stumbled over a regression in
the code to handle signed division by a power of two.

In that sequence we want to select between 0, 2^n-1 by pairing a sign
bit splat with a subsequent logical right shift.  This can be done
without branches or conditional moves.
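
As a rough illustration of the division case (a sketch in C, not the RTL that GCC emits): the bias that signed division by 2^n adds before its arithmetic shift can be computed by splatting the sign bit and then shifting logically right.  The names div_bias and sdiv_pow2 are hypothetical, and the sketch assumes 64-bit operands, 1 <= n <= 62, and that >> on a negative int64_t is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Select between 0 (x >= 0) and 2^n - 1 (x < 0) without a branch.
   Hypothetical helper; assumes 1 <= n <= 63 and an arithmetic >> on
   negative signed values (implementation-defined in C, true for GCC).  */
int64_t div_bias(int64_t x, unsigned n)
{
    uint64_t splat = (uint64_t)(x >> 63);   /* sign bit splat: 0 or all ones */
    return (int64_t)(splat >> (64 - n));    /* logical right shift: 0 or 2^n - 1 */
}

/* Truncating signed division by 2^n built on the bias above.
   Assumes 1 <= n <= 62.  */
int64_t sdiv_pow2(int64_t x, unsigned n)
{
    return (x + div_bias(x, n)) >> n;
}
```

For example, sdiv_pow2(-17, 2) adds a bias of 3 before shifting, which matches C's truncating -17 / 4.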

Playing with it a bit made me realize there's a handful of selections we
can do based on a sign bit test.  Essentially there's two broad cases.

Clearing bits after the sign bit splat.  The splat gives us 0 or -1; if
we clear bits the 0 stays as-is, but the -1 can easily turn into 2^n-1,
~2^n-1, or some small constant.

Setting bits after the sign bit splat.  Starting again from 0 or -1, if
we set bits the -1 stays as-is, but the 0 can turn into 2^n, a small
constant, etc.
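
The two broad cases can be sketched in C as follows.  This is only an illustration of the idea, not the ifcvt.cc implementation; the function names are hypothetical, and the code assumes that >> on a negative int64_t is an arithmetic shift (true for GCC).

```c
#include <stdint.h>

/* Sign bit splat: 0 when x >= 0, -1 when x < 0.  */
int64_t sign_splat(int64_t x) { return x >> 63; }

/* Clearing bits: the 0 stays 0, the -1 becomes `mask`.
   Computes x < 0 ? mask : 0.  */
int64_t select_clear(int64_t x, int64_t mask)
{
    return sign_splat(x) & mask;
}

/* Setting bits: the -1 stays -1, the 0 becomes `bits`.
   Computes x < 0 ? -1 : bits.  */
int64_t select_set(int64_t x, int64_t bits)
{
    return sign_splat(x) | bits;
}
```

Whether the AND or OR is cheap depends on whether the target can encode the constant directly, which is why the cost model check matters.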

Shreya and I originally started looking at target patterns to do this,
essentially discovering conditional move forms of the selects and
rewriting them into something more efficient.  That got out of control
pretty quickly and it relied on if-conversion to initially create the
conditional move.

The better solution is to actually discover the cases during
if-conversion itself.  That catches cases that were previously being
missed, checks cost models, and is actually simpler since we don't have
to distinguish between things like ori and bseti, instead we just emit
the natural RTL and let the target figure it out.

In the ifcvt implementation we put these cases just before trying the
traditional conditional move sequences.  Essentially these are a last
attempt before trying the generalized conditional move sequence.

This has been bootstrapped and regression tested on aarch64, riscv,
ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others
I've forgotten.  It's also been tested on the other embedded targets.
Obviously the new tests are risc-v specific, so that testing was
primarily to make sure we didn't ICE, generate incorrect code or regress
existing target-specific tests.

Raphael has some changes to attack this from the gimple direction as
well.  I think the latest version of those is on me to push through
internal review.

PR rtl-optimization/120553
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): New function.
(noce_process_if_block): Use it.

gcc/testsuite/

* gcc.target/riscv/pr120553-1.c: New test.
* gcc.target/riscv/pr120553-2.c: New test.
* gcc.target/riscv/pr120553-3.c: New test.
* gcc.target/riscv/pr120553-4.c: New test.
* gcc.target/riscv/pr120553-5.c: New test.
* gcc.target/riscv/pr120553-6.c: New test.
* gcc.target/riscv/pr120553-7.c: New test.
* gcc.target/riscv/pr120553-8.c: New test.

[Bug target/120553] Improve code to select between -1 and various values

2025-07-15 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |law at gcc dot gnu.org

--- Comment #9 from Jeffrey A. Law  ---
Quick update.  Shreya & I cobbled together target patterns to improve these
cases.  But I ultimately concluded this was better handled via generic ifcvt.cc
changes which I'm preparing for submission.

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-06-05
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #8 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).

Sorry for hijacking this PR to implement the above suggestion. I hope that x86
implementation can serve as an example for RISC-V.

BTW: Please note that for 32-bit immediates the size difference between OR and
BTS is one single byte on x86_64, but BTS is slightly slower than OR. For
64-bit exact-log2 immediates, the patched compiler can avoid CMOV and MOVABS,
and even for non-exact-log2 immediates, it can avoid the somewhat problematic CMOV.

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:ed57e5de634eda91f32e0e61724d8f103ef648dd

commit r16-1196-ged57e5de634eda91f32e0e61724d8f103ef648dd
Author: Uros Bizjak 
Date:   Thu Jun 5 22:53:35 2025 +0200

[i386] Improve "mov<mode>cc" expander for DImode immediates [PR120553]

"mov<mode>cc" expander uses x86_64_general_operand predicate that limits
the range of immediate operands to 32-bit size.  The usage of this predicate
causes ifcvt to force out-of-range immediates to registers when converting
through noce_try_cmove.  The testcase:

long long foo (long long c) { return c >= 0 ? 0x400000000ll : -1ll; }

compiles (-O2) to:

foo:
testq   %rdi, %rdi
movq    $-1, %rax
movabsq $0x400000000, %rdx
cmovns  %rdx, %rax
ret

The above testcase can be compiled to more optimized code, without the
problematic CMOV instruction, if 64-bit immediates are allowed in the
"mov<mode>cc" expander:

foo:
movq    %rdi, %rax
sarq    $63, %rax
btsq    $34, %rax
ret

The expander calls the ix86_expand_int_movcc function which internally
sanitizes arguments of emitted logical insns using expand_simple_binop.
The out-of-range immediates are forced to a temporary register just
before the instruction, so the instruction combiner is then able to
synthesize 64-bit BTS instruction.
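
For reference, the SAR + BTS sequence corresponds to the following C-level identity.  This is a minimal sketch with hypothetical function names, assuming an arithmetic >> on negative signed values (true for GCC); it only shows why setting bit 34 of the sign bit splat gives the same result as the conditional select.

```c
#include <stdint.h>

/* The select as written in the source: bit 34 set for c >= 0, else -1.  */
int64_t foo_select(int64_t c)
{
    return c >= 0 ? (INT64_C(1) << 34) : -1;
}

/* Branchless form: splat the sign bit (sarq $63), then set bit 34
   (btsq $34).  ORing into -1 leaves it unchanged; ORing into 0 yields
   the constant.  */
int64_t foo_branchless(int64_t c)
{
    return (c >> 63) | (INT64_C(1) << 34);
}
```

Both functions return the same value for every input, so the transformation needs neither a compare nor a CMOV.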

The code improves even for non-exact-log2 64-bit immediates, e.g.

long long foo (long long c) { return c >= 0 ? 0x400001234ll : -1ll; }

that now compiles to:

foo:
movabsq $0x400001234, %rdx
movq    %rdi, %rax
sarq    $63, %rax
orq     %rdx, %rax
ret

again avoiding the problematic CMOV instruction.

PR target/120553

gcc/ChangeLog:

* config/i386/i386.md (mov<mode>cc): Use "general_operand"
predicate for operands 2 and 3 for all modes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr120553.c: New test.

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #5 from Uroš Bizjak  ---
This patch fixes the non-optimal testcase in Comment #4 for x86_64:

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 40b43cf092a..8eee44756eb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -26478,8 +26478,8 @@ (define_peephole2
 (define_expand "mov<mode>cc"
   [(set (match_operand:SWIM 0 "register_operand")
        (if_then_else:SWIM (match_operand 1 "comparison_operator")
-         (match_operand:SWIM 2 "<general_operand>")
-         (match_operand:SWIM 3 "<general_operand>")))]
+         (match_operand:SWIM 2 "general_operand")
+         (match_operand:SWIM 3 "general_operand")))]
   ""
   "if (ix86_expand_int_movcc (operands)) DONE; else FAIL;")

--cut here--

gcc -O2:

movq    %rdi, %rax
sarq    $63, %rax
btsq    $34, %rax
ret

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #6 from Jeffrey A. Law  ---
Note there's a variety of other twiddles that can be done here. If we want to
select between -1 and any simm12, then that's srai+ori.

We can select between any constant with a single bit off and 0 using srai+bclr.

We can select between a simm12 and 0 using srai+andi.

We can select between any constant with just high bits set and 0 with
srai+slli.

We can select between any constant with just low bits set and 0 with srai+srli.

The last is particularly important for division by a power of 2 and is the
subject of my next patch in this space :-)

And there are almost certainly some 3 instruction sequences as well, though
they are harder to handle if a purely target approach is taken.
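
The two-instruction selections listed above can be modelled in C roughly as follows.  This is a sketch only: the function names are hypothetical, the instruction names in the comments are the RV64 ones mentioned in this comment, nothing here checks that an immediate actually fits in simm12, and the code assumes GCC's behaviour for >> on negative values and for unsigned-to-signed conversion.

```c
#include <stdint.h>

/* All ones when x is negative, zero otherwise (srai by 63 on RV64).  */
uint64_t splat64(int64_t x) { return (uint64_t)(x >> 63); }

/* srai+ori: x < 0 ? -1 : imm.  */
int64_t sel_ori(int64_t x, int64_t imm)
{
    return (int64_t)(splat64(x) | (uint64_t)imm);
}

/* srai+bclr: x < 0 ? (all ones except bit k) : 0.  Assumes k < 64.  */
int64_t sel_bclr(int64_t x, unsigned k)
{
    return (int64_t)(splat64(x) & ~(UINT64_C(1) << k));
}

/* srai+andi: x < 0 ? imm : 0.  */
int64_t sel_andi(int64_t x, int64_t imm)
{
    return (int64_t)(splat64(x) & (uint64_t)imm);
}

/* srai+slli: x < 0 ? (only the bits from k upward set) : 0.  Assumes k < 64.  */
int64_t sel_slli(int64_t x, unsigned k)
{
    return (int64_t)(splat64(x) << k);
}

/* srai+srli: x < 0 ? (only the low 64 - k bits set) : 0.  Assumes k < 64.  */
int64_t sel_srli(int64_t x, unsigned k)
{
    return (int64_t)(splat64(x) >> k);
}
```

The shifts are done on an unsigned type because left-shifting a negative signed value is undefined behaviour in C.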

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #4 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Richard Biener from comment #1)
> > might be also interesting on x86-64 when using bts can use a smaller
> > immediate than the now used orq and thus improve instruction size (but it
> > clobbers flags).
> There is *iordi_1_bts pattern available, but for some reason not exercised
> for the testcase in the description.
I tried this testcase:

long foo1 (long c) { return c >= 0 ? 0x400000000 : -1 ; }

that resulted in:

testq   %rdi, %rdi
movq    $-1, %rax
movabsq $17179869184, %rdx
cmovns  %rdx, %rax
ret

When the constant fits in the immediate field of the instruction (e.g.
0x40000000), the result is:

movq    %rdi, %rax
sarq    $63, %rax
orq     $1073741824, %rax
ret

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #3 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).
There is *iordi_1_bts pattern available, but for some reason not exercised for
the testcase in the description.

[Bug target/120553] Improve code to select between -1 and various values

2025-06-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

--- Comment #2 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #1)
> might be also interesting on x86-64 when using bts can use a smaller
> immediate than the now used orq and thus improve instruction size (but it
> clobbers flags).
All arithmetic instructions (including orq) clobber flags on x86.

[Bug target/120553] Improve code to select between -1 and various values

2025-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120553

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |16.0
           Keywords|                            |missed-optimization
             Target|riscv                       |riscv, x86-64

--- Comment #1 from Richard Biener  ---
might be also interesting on x86-64 when using bts can use a smaller immediate
than the now used orq and thus improve instruction size (but it clobbers
flags).