[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2026-01-29 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a86b40023fc0826760d1545e42cfa78269fbedb0

commit a86b40023fc0826760d1545e42cfa78269fbedb0
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

pr118182-2.c fails on gcc-14 because it lacks the late_combine passes,
particularly the one that runs after register allocation.

Even in the trunk, the predicate broadcast for the add reduction is
expanded and register-allocated as _zvfh, taking up an unneeded scalar
register to hold the constant to be vec_duplicated.

It is the late combine pass after register allocation that substitutes
this unneeded scalar register into the vec_duplicate, resolving to the
_zero or _imm insns.
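
As a rough illustration of the substitution described above, the toy C++
model below folds a single-use scalar immediate load into the broadcast
that reads it. The Insn struct, the opcode strings "li" and "vec_dup", and
fold_immediates are all hypothetical; they do not correspond to GCC's RTL
or to the actual late_combine implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction form; GCC's RTL is far richer than this.
struct Insn {
  std::string op;    // "li" loads an immediate, "vec_dup" broadcasts a value
  std::string dest;  // destination register name
  std::string src;   // source register of a "vec_dup"; empty once folded
  long imm;          // immediate of an "li", or of a folded "vec_dup"
};

// Count the "vec_dup" insns that read register REG.
static int uses_of(const std::vector<Insn>& insns, const std::string& reg) {
  int n = 0;
  for (const Insn& i : insns)
    if (i.op == "vec_dup" && i.src == reg) ++n;
  return n;
}

// Fold each single-use "li" into the "vec_dup" that reads it, then delete
// the "li", so the scalar register is no longer needed.
std::vector<Insn> fold_immediates(std::vector<Insn> insns) {
  std::size_t d = 0;
  while (d < insns.size()) {
    if (insns[d].op == "li" && uses_of(insns, insns[d].dest) == 1) {
      for (Insn& use : insns)
        if (use.op == "vec_dup" && use.src == insns[d].dest) {
          use.src.clear();
          use.imm = insns[d].imm;
        }
      insns.erase(insns.begin() + static_cast<std::ptrdiff_t>(d));
    } else {
      ++d;
    }
  }
  return insns;
}
```

In this model the scalar register only disappears after the fold runs, which
mirrors why a compiler without the post-RA pass keeps it live.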

It's easy enough and more efficient to expand pred_broadcast to the
insns that take the already-duplicated vector constant, when the
operands satisfy the predicates of the _zero or _imm insns.
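
The expand-time choice can be sketched as the small C++ model below. It is
only an approximation of the new branch in the patch: Mask, Scalar, Pattern
and choose_pattern are hypothetical names, and the fits_vector_imm flag
stands in for whatever vector_const_int_or_double_0_operand accepts (assumed
here to be a small integer immediate or a floating-point zero).

```cpp
#include <cassert>

// Hypothetical stand-ins for the mask predicates named in the patch.
enum class Mask { AllTrues, LeastSignificantSet, Other };

// Which form the expander emits in this model.
enum class Pattern { BroadcastZero, BroadcastImm, VecDuplicate };

// Hypothetical summary of the scalar operand (operands[3]).
struct Scalar {
  bool is_immediate;     // would satisfy immediate_operand
  bool is_zero;          // its duplicated vector is the all-zero constant
  bool fits_vector_imm;  // assumption: the duplicated vector is encodable
};

// Pick the already-duplicated constant forms when their predicates hold;
// otherwise keep the scalar and wrap it in vec_duplicate as before.
Pattern choose_pattern(Mask mask, const Scalar& s) {
  if (s.is_immediate) {
    if (mask == Mask::LeastSignificantSet && s.is_zero)
      return Pattern::BroadcastZero;
    if (mask == Mask::AllTrues && s.fits_vector_imm)
      return Pattern::BroadcastImm;
  }
  return Pattern::VecDuplicate;
}
```

Only the last case corresponds to wrap_vec_dup staying true in the patch; the
other two hand the constant vector to the insn directly.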

for  gcc/ChangeLog

PR target/118182
* config/riscv/vector.md (@pred_broadcast<mode>): Expand to
_zero and _imm variants without vec_duplicate.

(cherry picked from commit 14fa625bcb91028cb97f3575d2e394401bbb4a3a)

Diff:
---
 gcc/config/riscv/vector.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e92a51b5aab5..a57468c574d3 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2139,18 +2139,34 @@
              (match_operand 7 "const_int_operand")
              (reg:SI VL_REGNUM)
              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-          (vec_duplicate:V_VLS
-            (match_operand:<VEL> 3 "direct_broadcast_operand"))
+          ;; (vec_duplicate:V_VLS ;; wrapper activated by wrap_vec_dup below.
+          (match_operand:<VEL> 3 "direct_broadcast_operand") ;; )
           (match_operand:V_VLS 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
   /* Transform vmv.v.x/vfmv.v.f (avl = 1) into vmv.s.x since vmv.s.x/vfmv.s.f
      has better chances to do vsetvl fusion in vsetvl pass.  */
+  bool wrap_vec_dup = true;
+  rtx vec_cst = NULL_RTX;
   if (riscv_vector::splat_to_scalar_move_p (operands))
     {
       operands[1] = riscv_vector::gen_scalar_move_mask (<VM>mode);
       operands[3] = force_reg (<VEL>mode, operands[3]);
     }
+  else if (immediate_operand (operands[3], <VEL>mode)
+           && (vec_cst = gen_const_vec_duplicate (<MODE>mode, operands[3]))
+           && (/* -> pred_broadcast<mode>_zero */
+               (vector_least_significant_set_mask_operand (operands[1],
+                                                           <VM>mode)
+                && vector_const_0_operand (vec_cst, <MODE>mode))
+               || (/* pred_broadcast<mode>_imm */
+                   vector_all_trues_mask_operand (operands[1], <VM>mode)
+                   && vector_const_int_or_double_0_operand (vec_cst,
+                                                            <MODE>mode))))
+    {
+      operands[3] = vec_cst;
+      wrap_vec_dup = false;
+    }
   /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar.  */
   else if (satisfies_constraint_Wdm (operands[3]))
     {
@@ -2194,6 +2210,8 @@
         ;
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
+  if (wrap_vec_dup)
+    operands[3] = gen_rtx_VEC_DUPLICATE (<MODE>mode, operands[3]);
 })
 
 (define_insn_and_split "*pred_broadcast<mode>"


[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2026-01-16 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:3a1daa0b4857ffb26787ca893bf4e4e03ad8cb3b

commit 3a1daa0b4857ffb26787ca893bf4e4e03ad8cb3b
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-11-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:885f7fa2ab93d7f06e11f4fbfd3f242303f52286

commit 885f7fa2ab93d7f06e11f4fbfd3f242303f52286
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]


Diff:
---
 gcc/config/riscv/vector.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 9c2571899ae8..e4cfbb136aa7 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2139,18 +2139,34 @@
              (match_operand 7 "const_int_operand")
              (reg:SI VL_REGNUM)
              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-          (vec_duplicate:V_VLS
-            (match_operand:<VEL> 3 "direct_broadcast_operand"))
+          ;; (vec_duplicate:V_VLS ;; wrapper activated by wrap_vec_dup below.
+          (match_operand:<VEL> 3 "direct_broadcast_operand") ;; )
           (match_operand:V_VLS 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
   /* Transform vmv.v.x/vfmv.v.f (avl = 1) into vmv.s.x since vmv.s.x/vfmv.s.f
      has better chances to do vsetvl fusion in vsetvl pass.  */
+  bool wrap_vec_dup = true;
+  rtx vec_cst = NULL_RTX;
   if (riscv_vector::splat_to_scalar_move_p (operands))
     {
       operands[1] = riscv_vector::gen_scalar_move_mask (<VM>mode);
       operands[3] = force_reg (<VEL>mode, operands[3]);
     }
+  else if (immediate_operand (operands[3], <VEL>mode)
+           && (vec_cst = gen_const_vec_duplicate (<MODE>mode, operands[3]))
+           && (/* -> pred_broadcast<mode>_zero */
+               (vector_least_significant_set_mask_operand (operands[1],
+                                                           <VM>mode)
+                && vector_const_0_operand (vec_cst, <MODE>mode))
+               || (/* pred_broadcast<mode>_imm */
+                   vector_all_trues_mask_operand (operands[1], <VM>mode)
+                   && vector_const_int_or_double_0_operand (vec_cst,
+                                                            <MODE>mode))))
+    {
+      operands[3] = vec_cst;
+      wrap_vec_dup = false;
+    }
   /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar.  */
   else if (satisfies_constraint_Wdm (operands[3]))
     {
@@ -2194,6 +2210,8 @@
         ;
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
+  if (wrap_vec_dup)
+    operands[3] = gen_rtx_VEC_DUPLICATE (<MODE>mode, operands[3]);
 })
 
 (define_insn_and_split "*pred_broadcast<mode>"


[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-11-01 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d8e409a436b5a287d98cc0569e706cbca4721c6a

commit d8e409a436b5a287d98cc0569e706cbca4721c6a
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-10-17 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:70451a0318fc7144f1ace4a02d0c972162e72592

commit 70451a0318fc7144f1ace4a02d0c972162e72592
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-09-13 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:08180c375cb2ef202053b32cbcd2303228bb964d

commit 08180c375cb2ef202053b32cbcd2303228bb964d
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-09-06 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:2844769b933fa6779761771d95f5f5ba15f69f3f

commit 2844769b933fa6779761771d95f5f5ba15f69f3f
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-08-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:91f70090efcbe2146435aa792dffc0474e898d5c

commit 91f70090efcbe2146435aa792dffc0474e898d5c
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]


Diff:
---
 gcc/config/riscv/vector.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 51eb64fb1226..3ab4d76e6c6a 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2136,18 +2136,34 @@
              (match_operand 7 "const_int_operand")
              (reg:SI VL_REGNUM)
              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-          (vec_duplicate:V_VLS
-            (match_operand:<VEL> 3 "direct_broadcast_operand"))
+          ;; (vec_duplicate:V_VLS ;; wrapper activated by wrap_vec_dup below.
+          (match_operand:<VEL> 3 "direct_broadcast_operand") ;; )
           (match_operand:V_VLS 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
   /* Transform vmv.v.x/vfmv.v.f (avl = 1) into vmv.s.x since vmv.s.x/vfmv.s.f
      has better chances to do vsetvl fusion in vsetvl pass.  */
+  bool wrap_vec_dup = true;
+  rtx vec_cst = NULL_RTX;
   if (riscv_vector::splat_to_scalar_move_p (operands))
     {
       operands[1] = riscv_vector::gen_scalar_move_mask (<VM>mode);
       operands[3] = force_reg (<VEL>mode, operands[3]);
     }
+  else if (immediate_operand (operands[3], <VEL>mode)
+           && (vec_cst = gen_const_vec_duplicate (<MODE>mode, operands[3]))
+           && (/* -> pred_broadcast<mode>_zero */
+               (vector_least_significant_set_mask_operand (operands[1],
+                                                           <VM>mode)
+                && vector_const_0_operand (vec_cst, <MODE>mode))
+               || (/* pred_broadcast<mode>_imm */
+                   vector_all_trues_mask_operand (operands[1], <VM>mode)
+                   && vector_const_int_or_double_0_operand (vec_cst,
+                                                            <MODE>mode))))
+    {
+      operands[3] = vec_cst;
+      wrap_vec_dup = false;
+    }
   /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar.  */
   else if (satisfies_constraint_Wdm (operands[3]))
     {
@@ -2191,6 +2207,8 @@
         ;
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
+  if (wrap_vec_dup)
+    operands[3] = gen_rtx_VEC_DUPLICATE (<MODE>mode, operands[3]);
 })
 
 (define_insn_and_split "*pred_broadcast<mode>"


[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d3bc131bd42d9f8db4084780f5b0326b173191c1

commit d3bc131bd42d9f8db4084780f5b0326b173191c1
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-07-05 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b5f4ff62f6deb977f592870dd7f0327877a53c26

commit b5f4ff62f6deb977f592870dd7f0327877a53c26
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]



[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]

2025-05-18 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9849e5ffa9cec8664b4ff4232b74ee0e33b4a537

commit 9849e5ffa9cec8664b4ff4232b74ee0e33b4a537
Author: Alexandre Oliva 
Date:   Mon Apr 21 22:48:55 2025 -0300

[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]
