[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 Vineet Gupta changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #17 from Vineet Gupta --- Fixed on trunk for gcc-16
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #16 from GCC Commits --- The master branch has been updated by Vineet Gupta : https://gcc.gnu.org/g:2a84a753afcf376a5fc0339c5ce228c13cd7 commit r16-6332-g2a84a753afcf376a5fc0339c5ce228c13cd7 Author: Vineet Gupta Date: Mon Dec 22 08:54:06 2025 -0800 ifcvt: cond zero arith: elide short forward branch for signed GE 0 comparison [PR122769] BeforeAfter -+-- bge a0,zero,.L2| slti a0,a0,0 | czero.eqz a0,a0,a0 xor a1,a1,a3 | xor a0,a0,a0 .L2 | mv a0,a1 | ret| ret This is what all the prev NFC patches have been preparing to get to. Currently the cond arith code only handles EQ/NE zero conditions missing ifcvt optimization for cases such as GE zero, as show in example above. This is due to the limitation of noce_emit_czero () so switch to noce_emit_cmove () which can handle conditions other than EQ/NE and if needed generate additional supporting insns such as SLT. This also allows us to remove the constraint at the entry to limit to EQ/NE conditions, improving ifcvt outcomes in general. PR target/122769 gcc/ChangeLog: * ifcvt.cc (noce_try_cond_zero_arith): Use noce_emit_cmove. Delete noce_emit_czero () no longer used. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr122769.c: New test. Co-authored-by: Philipp Tomsich Signed-off-by: Vineet Gupta
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #15 from Jeffrey A. Law --- We've got code somewhere to canonicalize other comparison codes. It may be buried in that patch I attached. It's certainly important to do so.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #14 from Vineet Gupta ---
Just as a proof of concept and for completeness some update on approach #1:
i.e. Expand time tweaking if_then_else GE 0 to if_then_else EQ 0 (with an
additional SLT)
The patch below fixes the orig reported test.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
@@ -5308,6 +5308,25 @@ riscv_emit_int_compare (enum rtx_code *code, rtx *op0,
rtx *op1,
*op0 = force_reg (word_mode, *op0);
if (*op1 != const0_rtx)
*op1 = force_reg (word_mode, *op1);
+
+ /* ifcvt/czero has better outcomes with EQ operator, so convert
+ if (a >= 0)
+ to
+ slti x, a, 0
+ if (x == 0)
+ slt patern expects input operands to be "X", thus needs to happen after
+riscv_extend_comparands() which extends ops to DImode if rv64. */
+
+ if (*code == GE
+ && (REG_P (*op0))
+ && *op1 == const0_rtx)
+{
+ rtx tmp = gen_reg_rtx (GET_MODE (*op0));
+ emit_insn (gen_slt_3 (LT, GET_MODE (*op0), GET_MODE (*op0), tmp, *op0,
*op1));
+ *code = EQ;
+ *op0 = tmp;
+ return;
+}
It causes 8 additional testsuite failures, but those tests were deliberately
trying to inhibit CZERO for sign bit which runs counter to the orig ask here.
2025-08-22 ebbeaf490c56 [PR rtl-optimization/120553] Improve selecting between
constants based on sign bit test
2025-06-05 1d90f8c7933e [RISC-V] Improve sequences to generate -1, 1 in some
cases.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #13 from Jeffrey A. Law --- The VRULL subreg bits are more for the outer expression code, not internal expressions. Your "a" and "b" look very much handleable -- and it's important we do. What we want to generate is something like (set (reg:DI temp) (if_then_else (condition) (reg:DI 138) (const_int 0))) (set (dest) (ashift:DI (reg:DI 137) (subreg:QI (reg:DI temp)) THe subreg doesn't really come into play.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #12 from Vineet Gupta ---
So for now, the test in question is in
gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
-O2 -march=rv64gc_zbb_zicond
long
test_ShiftLeft_eqz (long x, long y, long z, long c)
{
if (c)
x = y << z;
else
x = y;
return x;
}
It is failing in ifcvt: noce_bbs_ok_for_cond_zero_arith () because it only
handles REG () for "a" and "b", while we have a subreg.
(gdb) call debug_rtx(cond)
(eq (reg/v:DI 139 [ c ])
(const_int 0 [0]))
(gdb) call debug_rtx(a)
(ashift:DI (reg/v:DI 137 [ y ])
(subreg:QI (reg/v:DI 138 [ z ]) 0))
(gdb) call debug_rtx(b)
(reg/v:DI 137 [ y ])
Looks like we need to extract subreg bits from Vrull patch afterall.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #11 from Vineet Gupta ---
FTR, just for completeness it seems ifcvt bits to handle cond arithmetic with
zero were added back in 2023 (for gcc-14 release), specifically patch 2/5 below
for shift ops as expected by the test in the PR.
2023-12-10 ec201e2a6021 [PATCH 3/5] [ifcvt] optimize x=c ? (y AND z) : y by
RISC-V Zicond like insns
2023-12-10 5a4faf915575 [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by
RISC-V Zicond like insns
2023-12-07 9f7ad5eff3bf [PATCH 1/5][V3][ifcvt] optimize x=c ? (y op z) : y by
RISC-V Zicond like insns
Patch 2/5 even added tests to that effect:
e.g.
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+long
+test_ShiftLeft_eqz (long x, long y, long z, long c)
+{
+ if (c)
+x = y << z;
+ else
+x = y;
+ return x;
+}
But weordly those tests don't codegen czero on trunk and even going back to
that commit itself (actually expected czero's wasn't bumped as part of that
change). So clearly something got lost or wasn't completely working.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #10 from Jeffrey A. Law --- So this may be a simple matter of exending the current code. Right now it uses noce_emit_czero which ultimately is going to require the test to be EQ/NE. But instead I think we can use something like this to be far more general: /* First emit a conditional zero for one operand (such as the shift count). AND requires a bit of special handling, but is worth the effort as it happens regularly. */ target = gen_reg_rtx (mode); target = noce_emit_cmove (if_info, target, GET_CODE (if_info->cond), XEXP (if_info->cond, 0), XEXP (if_info->cond, 1), op != AND ? arg1 : const0_rtx, op != AND ? const0_rtx : arg0); The advantage is that routine is it uss emit_conditional_move which under the hood will run through the RISC-V expansion code which handles emitting an sCC style insn if presented with something other than a simple equality test against zero. Naturally that other routine (ok_for_condzero_arith or whatever it is) would need adjustment as well. That's what we're using in that patch I attached to this BZ. Let me play around a bit with that.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #9 from Jeffrey A. Law --- The right solution is proper support for conditional zero idioms in ifcvt.cc. I've hinted we have that here but that I'm terribly unhappy with the state of that code, but could make it available for folks to take a stab at cleaning up. This is originally from VRULL and I've got Phillip's OK to contribute, so we can certainly beat on it if we feel it's salvagable. So if we have a conditonal ior, xor, shift, rotate, add, subtract, etc we can use czero to conditionally zero an operand, then do an unconditional ior, xor, whatever. For this test we get: sltia0,a0,0 czero.eqz a0,a3,a0 xor a0,a1,a0 These will usually (always?) be more efficient than the generalized condititional move sequences, so we placed this routine before the generalized conditional move support in ifcvt.cc. I could even see extracting out the certain things that are fairly clean and leaving behind things like extension and subreg support. Anyway, bits attached.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #8 from Jeffrey A. Law --- Created attachment 62963 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62963&action=edit condzero_arith patch from VRULL+Ventana
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #7 from Vineet Gupta ---
Some more trials and tribulations
riscv_emit_int_compare
if ((*code == GE)
&& (REG_P (*op0) || SUBREG_P (*op0))
&& *op1 == const0_rtx)
{
emit_insn (gen_slt_3 (LT, GET_MODE (*op0), GET_MODE (*op0), *op0, *op0,
*op1));
*code = EQ;
return;
}
Which requires relaxing the mode in slt pattern (below is just a hack as we
don't need two modes now, but this allows existing callers of gen_slt_3 to
remian unchanged.
-(define_insn "@slt_3"
+(define_insn "@slt_3"
[(set (match_operand:GPR 0 "register_operand" "= r")
- (any_lt:GPR (match_operand:X 1 "register_operand" " r")
- (match_operand:X 2 "arith_operand"" rI")))]
+ (any_lt:GPR (match_operand:GPR 1 "register_operand" " r")
+ (match_operand:GPR 2 "arith_operand"" rI")))]
This in turn requires the cbranch operands to be relaxed as well.
(define_insn "*branch"
[(set (pc)
(if_then_else
(match_operator 1 "ordered_comparison_operator"
-[(match_operand:X 2 "register_operand" "r")
- (match_operand:X 3 "reg_or_0_operand" "rJ")])
+[(match_operand:GPR 2 "register_operand" "r")
+ (match_operand:GPR 3 "reg_or_0_operand" "rJ")])
Per discussions on IRC, the RTX_COMPARE patterns need not be X (64-bit only for
rv64gc), so this approach might not be totally absurd, however there must be
easier ways such as using a temp DI mode reg before calling the slt pattern.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #6 from Vineet Gupta --- And with a different attempt: + emit_insn (gen_slt_3 (LT, Xmode, Xmode, *op0, *op0, *op1)); I see a slightly diffeent error. error: unrecognizable insn: (insn 8 7 9 2 (set (reg:SI 140) (lt:DI (reg:SI 140) (const_int 0 [0]))) (nil)) during RTL pass: vregs I have a feeling we need to fix the the mode iterators in these patterns.
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #5 from Vineet Gupta ---
ifcvt: noce_try_cond_zero_arith () currently only handles COND EQ and NE 0.
while the condition we get due to testing sign-bit is GE 0.
It seems we can transform this to EQ 0 with an intermediate SLTI
The patch below fixes the orig test...
+ /* ifcvt/czero has better outcomes with EQ operator, so convert
+ "if (a >= 0)"
+ to
+ "slti a, a, 0"
+ "if (a == 0)". */
+ if (*code == GE
+ && REG_P (*op0)
+ && *op1 == CONST0_RTX (GET_MODE (*op0)))
+{
+ if (GET_MODE (*op0) == DImode)
+emit_insn (gen_slt_didi3 (*op0, *op0, *op1));
+ else
+emit_insn (gen_slt_sisi3 (*op0, *op0, *op1));
+ *code = EQ;
+ return;
+}
... but trips up when building glibc
error: unrecognizable insn:
(insn 11 10 12 2 (set (reg:SI 165)
(lt:SI (reg:SI 165)
(const_int 0 [0])))
The existing mode iterators in slt pattern are a bit confusing though
(define_insn "@slt_3"
[(set (match_operand:GPR 0 "register_operand" "= r")
(any_lt:GPR (match_operand:X 1 "register_operand" " r")
(match_operand:X 2 "arith_operand"" rI")))]
""
"slt%i2\t%0,%1,%2"
[(set_attr "type" "slt")
(set_attr "mode" "")])
specifically "any_lt:GPR"
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #4 from Andrew Pinski ---
(In reply to Vineet Gupta from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > .
> >
> > sltia0,a0,0
> > czero.eqz a0,a3,a0
> > xor a0,a0,a1
> >
> > Which I can get via:
> > ```
> > uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
> > uint64_t t = -(((int64_t)val) < 0);
> > val2 ^= (t & n);
> > return val2;
> > }
> > ```
> > Which does the right thing there ...
>
> That's because it already expands to a czero
>
> (insn 11 10 12 2 (set (reg:SI 144 [ _9 ])
> (if_then_else:SI (ne:DI (reg:DI 145)
> (const_int 0 [0]))
> (reg:SI 146)
> (const_int 0 [0]))) "test4.c":6:14 40364 {*czero.eqz.sidi}
> (nil))
>
> OTOH, llvm seems to be transforming
>
> if (val & (1ULL << 63))
> val2 ^= n;
>
> to something like (note that @n has actually moved where its been used)
>
> uint64_t tmp = (val >> 63) & n;
> val2 ^= tmp;
>
>
> I need some guidance as to where to tackle this ?
In ifcvt.cc is the best place to start. The next place is figuring out how to
hook up some target specific match patterns to phiopt for the late one (you
can/should use fold_before_rtl_expansion_p () function to say it is late).
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
--- Comment #3 from Vineet Gupta ---
(In reply to Andrew Pinski from comment #2)
> .
>
> sltia0,a0,0
> czero.eqz a0,a3,a0
> xor a0,a0,a1
>
> Which I can get via:
> ```
> uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
> uint64_t t = -(((int64_t)val) < 0);
> val2 ^= (t & n);
> return val2;
> }
> ```
> Which does the right thing there ...
That's because it already expands to a czero
(insn 11 10 12 2 (set (reg:SI 144 [ _9 ])
(if_then_else:SI (ne:DI (reg:DI 145)
(const_int 0 [0]))
(reg:SI 146)
(const_int 0 [0]))) "test4.c":6:14 40364 {*czero.eqz.sidi}
(nil))
OTOH, llvm seems to be transforming
if (val & (1ULL << 63))
val2 ^= n;
to something like (note that @n has actually moved where its been used)
uint64_t tmp = (val >> 63) & n;
val2 ^= tmp;
I need some guidance as to where to tackle this ?
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769
Andrew Pinski changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed||2025-11-20
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski ---
.
sltia0,a0,0
czero.eqz a0,a3,a0
xor a0,a0,a1
Which I can get via:
```
uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
uint64_t t = -(((int64_t)val) < 0);
val2 ^= (t&n);
return val2;
}
```
Which does the right thing there ...
[Bug target/122769] RISC-V: Short branch can be elided for alu op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769 --- Comment #1 from Vineet Gupta --- This doesn't seem costing related as I've hacked branch_cost in generic tune structure to absurd high number (16 vs. 4). For the optim-OK case (1 << 62), ifcvt is able to do the transform via noce_try_cond_zero_arith(). ``` (eq (reg:DI 141 [ _1 ]) (const_int 0 [0])) ``` For the missed-optim case (1 << 63), the same call chain fails noce_try_cond_zero_arith() noce_bbs_ok_for_cond_zero_arith() if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ) ``` (ge (reg/v:DI 137 [ val ]) (const_int 0 [0])) ``` The fallthough noce_try_cmove_arith () tries to transform this as -1 based which fails costing in canonicalize_comparison () ``` IF-THEN-JOIN block found, pass 1, test 3, then 4, join 5 ;; cmp: lt, old cst: (const_int 0 [0]) new cst: (const_int -1 [0x]) ;; old cst cost: 0, new cst cost: 4 ;; cmp: ge, old cst: (const_int 0 [0]) new cst: (const_int -1 [0x]) ;; old cst cost: 0, new cst cost: 4 ```
