[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-22 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

Vineet Gupta  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Vineet Gupta  ---
Fixed on trunk for gcc-16

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #16 from GCC Commits  ---
The master branch has been updated by Vineet Gupta :

https://gcc.gnu.org/g:2a84a753afcf376a5fc0339c5ce228c13cd7

commit r16-6332-g2a84a753afcf376a5fc0339c5ce228c13cd7
Author: Vineet Gupta 
Date:   Mon Dec 22 08:54:06 2025 -0800

ifcvt: cond zero arith: elide short forward branch for signed GE 0
comparison [PR122769]

 BeforeAfter
  -+--
bge a0,zero,.L2| slti  a0,a0,0
   | czero.eqz a0,a0,a0
xor a1,a1,a3   | xor   a0,a0,a0
  .L2  |
mv  a0,a1  |
ret| ret

This is what all the prev NFC patches have been preparing to get to.

Currently the cond arith code only handles EQ/NE zero conditions missing
ifcvt optimization for cases such as GE zero, as show in example above.
This is due to the limitation of noce_emit_czero () so switch to
noce_emit_cmove () which can handle conditions other than EQ/NE and
if needed generate additional supporting insns such as SLT.

This also allows us to remove the constraint at the entry to limit to EQ/NE
conditions, improving ifcvt outcomes in general.

PR target/122769

gcc/ChangeLog:

* ifcvt.cc (noce_try_cond_zero_arith): Use noce_emit_cmove.
Delete noce_emit_czero () no longer used.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr122769.c: New test.

Co-authored-by: Philipp Tomsich 
Signed-off-by: Vineet Gupta 

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-16 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #15 from Jeffrey A. Law  ---
We've got code somewhere to canonicalize other comparison codes.  It may be
buried in that patch I attached.  It's certainly important to do so.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-08 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #14 from Vineet Gupta  ---
Just as a proof of concept and for completeness some  update on approach #1:
i.e. Expand time tweaking if_then_else GE 0 to if_then_else EQ 0 (with an
additional SLT)

The patch below fixes the orig reported test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
@@ -5308,6 +5308,25 @@ riscv_emit_int_compare (enum rtx_code *code, rtx *op0,
rtx *op1,
   *op0 = force_reg (word_mode, *op0);
   if (*op1 != const0_rtx)
 *op1 = force_reg (word_mode, *op1);
+
+  /* ifcvt/czero has better outcomes with EQ operator, so convert
+   if (a >= 0)
+   to
+   slti x, a, 0
+   if (x == 0)
+ slt patern expects input operands to be "X", thus needs to happen after
+riscv_extend_comparands() which extends ops to DImode if rv64.  */
+
+  if (*code == GE
+  && (REG_P (*op0))
+  && *op1 == const0_rtx)
+{
+  rtx tmp = gen_reg_rtx (GET_MODE (*op0));
+  emit_insn (gen_slt_3 (LT, GET_MODE (*op0), GET_MODE (*op0), tmp, *op0,
*op1));
+  *code = EQ;
+  *op0 = tmp;
+  return;
+}

It causes 8 additional testsuite failures, but those tests were deliberately
trying to inhibit CZERO for sign bit which runs counter to the orig ask here.

2025-08-22 ebbeaf490c56 [PR rtl-optimization/120553] Improve selecting between
constants based on sign bit test  
2025-06-05 1d90f8c7933e [RISC-V] Improve sequences to generate -1, 1 in some
cases.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #13 from Jeffrey A. Law  ---
The VRULL subreg bits are more for the outer expression code, not internal
expressions.  Your "a" and "b" look very much handleable -- and it's important
we do.

What we want to generate is something like

(set (reg:DI temp) (if_then_else (condition)
  (reg:DI 138)
  (const_int 0)))
(set (dest) (ashift:DI (reg:DI 137) (subreg:QI (reg:DI temp))

THe subreg doesn't really come into play.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-02 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #12 from Vineet Gupta  ---
So for now, the test in question is in
gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

-O2 -march=rv64gc_zbb_zicond

long
test_ShiftLeft_eqz (long x, long y, long z, long c)
{
  if (c)
x = y << z;
  else
x = y;
  return x;
}

It is failing in ifcvt: noce_bbs_ok_for_cond_zero_arith () because it only
handles REG () for "a" and "b", while we have a subreg.

(gdb) call debug_rtx(cond)
(eq (reg/v:DI 139 [ c ])
(const_int 0 [0]))

(gdb) call debug_rtx(a)
(ashift:DI (reg/v:DI 137 [ y ])
(subreg:QI (reg/v:DI 138 [ z ]) 0))

(gdb) call debug_rtx(b)
(reg/v:DI 137 [ y ])

Looks like we need to extract subreg bits from Vrull patch afterall.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-02 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #11 from Vineet Gupta  ---
FTR, just for completeness it seems ifcvt bits to handle cond arithmetic with
zero were added back in 2023 (for gcc-14 release), specifically patch 2/5 below
for shift ops as expected by the test in the PR.

2023-12-10 ec201e2a6021 [PATCH 3/5] [ifcvt] optimize x=c ? (y AND z) : y by
RISC-V Zicond like insns  
2023-12-10 5a4faf915575 [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by
RISC-V Zicond like insns  
2023-12-07 9f7ad5eff3bf [PATCH 1/5][V3][ifcvt] optimize x=c ? (y op z) : y by
RISC-V Zicond like insns  

Patch 2/5 even added tests to that effect:
e.g.
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

+long
+test_ShiftLeft_eqz (long x, long y, long z, long c)
+{
+  if (c)
+x = y << z;
+  else
+x = y;
+  return x;
+}

But weordly those tests don't codegen czero on trunk and even going back to
that commit itself (actually expected czero's wasn't bumped as part of that
change). So clearly something got lost or wasn't completely working.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #10 from Jeffrey A. Law  ---
So this may be a simple matter of exending the current code.

Right now it uses noce_emit_czero which ultimately is going to require the test
to be EQ/NE.  But instead I think we can use something like this to be far more
general:

  /* First emit a conditional zero for one operand (such as the shift count).
 AND requires a bit of special handling, but is worth the effort as it
 happens regularly.  */
  target = gen_reg_rtx (mode);
  target = noce_emit_cmove (if_info, target, GET_CODE (if_info->cond),
XEXP (if_info->cond, 0), XEXP (if_info->cond, 1),
op != AND ? arg1 : const0_rtx, op != AND ?
const0_rtx : arg0);


The advantage is that routine is it uss emit_conditional_move which under the
hood will run through the RISC-V expansion code which handles emitting an sCC
style insn if presented with something other than a simple equality test
against zero.  Naturally that other routine (ok_for_condzero_arith or whatever
it is) would need adjustment as well.

That's what we're using in that patch I attached to this BZ.  Let me play
around a bit with that.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #9 from Jeffrey A. Law  ---
The right solution is proper support for conditional zero idioms in ifcvt.cc. 
I've hinted we have that here but that I'm terribly unhappy with the state of
that code, but could make it available for folks to take a stab at cleaning up.

This is originally from VRULL and I've got Phillip's OK to contribute, so we
can certainly beat on it if we feel it's salvagable.

So if we have a conditonal ior, xor, shift, rotate, add, subtract, etc we can
use czero to conditionally zero an operand, then do an unconditional ior, xor,
whatever.  For this test we get:

sltia0,a0,0
czero.eqz   a0,a3,a0
xor a0,a1,a0


These will usually (always?) be more efficient than the generalized
condititional move sequences, so we placed this routine before the generalized
conditional move support in ifcvt.cc.

I could even see extracting out the certain things that are fairly clean and
leaving behind things like extension and subreg support.

Anyway, bits attached.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #8 from Jeffrey A. Law  ---
Created attachment 62963
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62963&action=edit
condzero_arith patch from VRULL+Ventana

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-12-01 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #7 from Vineet Gupta  ---
Some more trials and tribulations

riscv_emit_int_compare

  if ((*code == GE)
  && (REG_P (*op0) || SUBREG_P (*op0))
  && *op1 == const0_rtx)
{
  emit_insn (gen_slt_3 (LT, GET_MODE (*op0), GET_MODE (*op0), *op0, *op0,
*op1));
  *code = EQ;
  return;
}

Which requires relaxing the mode in slt pattern (below is just a hack as we
don't need two modes now, but this allows existing callers of gen_slt_3 to
remian unchanged.

-(define_insn "@slt_3"
+(define_insn "@slt_3"
   [(set (match_operand:GPR   0 "register_operand" "= r")
-   (any_lt:GPR (match_operand:X 1 "register_operand" "  r")
-   (match_operand:X 2 "arith_operand"" rI")))]
+   (any_lt:GPR (match_operand:GPR 1 "register_operand" "  r")
+   (match_operand:GPR 2 "arith_operand"" rI")))]

This in turn requires the cbranch operands to be relaxed as well.

(define_insn "*branch"
   [(set (pc)
(if_then_else
 (match_operator 1 "ordered_comparison_operator"
-[(match_operand:X 2 "register_operand" "r")
- (match_operand:X 3 "reg_or_0_operand" "rJ")])
+[(match_operand:GPR 2 "register_operand" "r")
+ (match_operand:GPR 3 "reg_or_0_operand" "rJ")])

Per discussions on IRC, the RTX_COMPARE patterns need not be X (64-bit only for
rv64gc), so this approach might not be totally absurd, however there must be
easier ways such as using a temp DI mode reg before calling the slt pattern.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-24 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #6 from Vineet Gupta  ---
And with a different attempt:

+  emit_insn (gen_slt_3 (LT, Xmode, Xmode, *op0, *op0, *op1));

I see a slightly diffeent error.

error: unrecognizable insn:

(insn 8 7 9 2 (set (reg:SI 140)
(lt:DI (reg:SI 140)
(const_int 0 [0])))
 (nil))
during RTL pass: vregs

I have a feeling we need to fix the the mode iterators in these patterns.

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-24 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #5 from Vineet Gupta  ---
ifcvt: noce_try_cond_zero_arith () currently only handles COND EQ and NE 0.
while the condition we get due to testing sign-bit is GE 0.
It seems we can transform this to EQ 0 with an intermediate SLTI

The patch below fixes the orig test...

+  /* ifcvt/czero has better outcomes with EQ operator, so convert
+   "if (a >= 0)"
+ to
+   "slti a, a, 0"
+   "if (a == 0)".  */
+  if (*code == GE
+  && REG_P (*op0)
+  && *op1 == CONST0_RTX (GET_MODE (*op0)))
+{
+  if (GET_MODE (*op0) == DImode)
+emit_insn (gen_slt_didi3 (*op0, *op0, *op1));
+  else
+emit_insn (gen_slt_sisi3 (*op0, *op0, *op1));
+  *code = EQ;
+  return;
+}

... but trips up when building glibc

error: unrecognizable insn:

(insn 11 10 12 2 (set (reg:SI 165)
(lt:SI (reg:SI 165)
(const_int 0 [0])))

The existing mode iterators in slt pattern are a bit confusing though

(define_insn "@slt_3"
  [(set (match_operand:GPR   0 "register_operand" "= r")
(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
(match_operand:X 2 "arith_operand"" rI")))]
  ""
  "slt%i2\t%0,%1,%2"
  [(set_attr "type" "slt")
   (set_attr "mode" "")])

specifically "any_lt:GPR"

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #4 from Andrew Pinski  ---
(In reply to Vineet Gupta from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > .
> > 
> > sltia0,a0,0
> > czero.eqz   a0,a3,a0
> > xor a0,a0,a1
> > 
> > Which I can get via:
> > ```
> > uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
> >   uint64_t t = -(((int64_t)val) < 0);
> >   val2 ^= (t & n);
> >   return val2;
> > }
> > ```
> > Which does the right thing there ...
> 
> That's because it already expands to a czero
> 
> (insn 11 10 12 2 (set (reg:SI 144 [ _9 ])
> (if_then_else:SI (ne:DI (reg:DI 145)
> (const_int 0 [0]))
> (reg:SI 146)
> (const_int 0 [0]))) "test4.c":6:14 40364 {*czero.eqz.sidi}
>  (nil))
> 
> OTOH, llvm seems to be transforming
> 
>   if (val & (1ULL << 63))
>  val2 ^= n;
> 
> to something like (note that @n has actually moved where its been used)
> 
>   uint64_t tmp = (val >> 63) & n;
>   val2 ^= tmp;
> 
>   
> I need some guidance as to where to tackle this ?

In ifcvt.cc is the best place to start. The next place is figuring out how to
hook up some target specific match patterns to phiopt for the late one (you
can/should use fold_before_rtl_expansion_p () function to say it is late).

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-20 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #3 from Vineet Gupta  ---
(In reply to Andrew Pinski from comment #2)
> .
> 
> sltia0,a0,0
> czero.eqz   a0,a3,a0
> xor a0,a0,a1
> 
> Which I can get via:
> ```
> uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
>   uint64_t t = -(((int64_t)val) < 0);
>   val2 ^= (t & n);
>   return val2;
> }
> ```
> Which does the right thing there ...

That's because it already expands to a czero

(insn 11 10 12 2 (set (reg:SI 144 [ _9 ])
(if_then_else:SI (ne:DI (reg:DI 145)
(const_int 0 [0]))
(reg:SI 146)
(const_int 0 [0]))) "test4.c":6:14 40364 {*czero.eqz.sidi}
 (nil))

OTOH, llvm seems to be transforming

  if (val & (1ULL << 63))
 val2 ^= n;

to something like (note that @n has actually moved where its been used)

  uint64_t tmp = (val >> 63) & n;
  val2 ^= tmp;


I need some guidance as to where to tackle this ?

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-11-20
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
.

sltia0,a0,0
czero.eqz   a0,a3,a0
xor a0,a0,a1

Which I can get via:
```
uint64_t f1(uint64_t val, uint64_t val2, int bits, uint64_t n) {
  uint64_t t = -(((int64_t)val) < 0);
  val2 ^= (t&n);
  return val2;
}
```
Which does the right thing there ...

[Bug target/122769] RISC-V: Short branch can be elided for alu op

2025-11-19 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122769

--- Comment #1 from Vineet Gupta  ---
This doesn't seem costing related as I've hacked branch_cost in generic tune
structure to absurd high number (16 vs. 4).

For the optim-OK case (1 << 62), ifcvt is able to do the transform via
noce_try_cond_zero_arith().
```
  (eq (reg:DI 141 [ _1 ])
(const_int 0 [0]))
```

For the missed-optim case (1 << 63), the same call chain fails

noce_try_cond_zero_arith()
   noce_bbs_ok_for_cond_zero_arith()
 if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)

```
  (ge (reg/v:DI 137 [ val ])
(const_int 0 [0]))
```

The fallthough noce_try_cmove_arith () tries to transform this as -1 based
which fails costing in canonicalize_comparison ()

```
IF-THEN-JOIN block found, pass 1, test 3, then 4, join 5
;; cmp: lt, old cst: (const_int 0 [0]) new cst: (const_int -1
[0x])
;; old cst cost: 0, new cst cost: 4
;; cmp: ge, old cst: (const_int 0 [0]) new cst: (const_int -1
[0x])
;; old cst cost: 0, new cst cost: 4
```