[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vnmsac.vv to vnmsac.vx on GR2VR cost

2025-08-27 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vnmsac.vv to the vnmsac.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vnmsac.vv unsigned combine with GR2VR cost 0, 1 and 15

2025-08-27 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vnmsac.vvm combine to vnmsac.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vnmsac.vx. * gcc.target/riscv/rvv/autovec/vx_

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vnmsac.vv signed combine with GR2VR cost 0, 1 and 15

2025-08-27 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vnmsac.vvm combine to vnmsac.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vnmsac.vx. * gcc.target/riscv/rvv/autovec/vx_

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vnmsac.vv to vnmsac.vx on GR2VR cost

2025-08-27 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vnmsac.vv into vnmsac.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. From: | ... | vmv.v.x | L1: | vnmsac.vv | J L1 | ... To: | ... | L1: |
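
For orientation, the per-element operation behind vnmsac is a multiply-subtract into the destination. The scalar loop below is a minimal sketch of source code with that shape; the function name and the element type are hypothetical and are not taken from the patch's test files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): out[i] = out[i] - x * in[i].
   The scalar x is loop invariant, so a vectorizer may broadcast it once
   (vmv.v.x) and use vnmsac.vv, or use vnmsac.vx directly.
   Unsigned arithmetic is used so that wrap-around is well defined.  */
void
nmsac_scalar_u32 (uint32_t *out, const uint32_t *in, uint32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] -= x * in[i];
}
```

Whether the vx form is chosen depends on the GR2VR cost described in the cover letter, not on the source code.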

[PATCH v1] RISC-V: Add test case for unsigned scalar SAT_MUL form 4

2025-08-24 Thread pan2 . li
From: Pan Li The form 4 of unsigned scalar SAT_MUL is covered in middle-expand already, add test case here to cover form 4. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test

[PATCH v2 1/3] RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost

2025-08-23 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vmacc.vv to the vmacc.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v2 2/3] RISC-V: Add test for vec_duplicate + vmacc.vv signed combine with GR2VR cost 0, 1 and 15

2025-08-23 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_v

[PATCH v2 3/3] RISC-V: Add test for vec_duplicate + vmacc.vv unsigned combine with GR2VR cost 0, 1 and 15

2025-08-23 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_v

[PATCH v2 0/3] RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost

2025-08-23 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vmacc.vv into vmacc.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. From: | ... | vmv.v.x | L1: | vmacc.vv |
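
As a rough illustration of the From/To diagram above, vmacc accumulates a product into the destination. The loop below is a sketch only; the function name and types are assumptions and do not come from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): out[i] = out[i] + x * in[i].
   With a broadcast of x hoisted out of the loop this maps to
   vmv.v.x + vmacc.vv; with the combine it can become vmacc.vx.  */
void
macc_scalar_u32 (uint32_t *out, const uint32_t *in, uint32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] += x * in[i];
}
```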

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost

2025-08-18 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vmacc.vv into vmacc.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. From: | ... | vmv.v.x | L1: | vmacc.vv |

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost

2025-08-18 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vmacc.vv to the vmacc.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v1 4/4] RISC-V: Adjust the asm check after enable vmacc.vx combine

2025-08-18 Thread pan2 . li
From: Pan Li After enable the vmacc.vx by introducing the define_insn, the below asm need to adjust for this change. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/ternop_vx_constraint-4.c: Adjust asm check for vx. * gcc.target/riscv/rvv/base/ternop_vx_constraint-5

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vmacc.vv unsigned combine with GR2VR cost 0, 1 and 15

2025-08-18 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_v

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vmacc.vv signed combine with GR2VR cost 0, 1 and 15

2025-08-18 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmacc.vvm combine to vmacc.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vx combine. * gcc.target/riscv/rvv/autovec/vx_v

[PATCH v1 0/2] Refine the vx combine pattern

2025-08-16 Thread pan2 . li
From: Pan Li Add the missed DONE indicator and update comments Pan Li (2): RISC-V: Add missed DONE for vx combine pattern [NFC] RISC-V: Update the comments of vx combine [NFC] gcc/config/riscv/autovec-opt.md | 24 1 file changed, 24 insertions(+) -- 2.43.0

[PATCH v1 2/2] RISC-V: Update the comments of vx combine [NFC]

2025-08-16 Thread pan2 . li
From: Pan Li The supported insn of vx combine is out of date, update all insn supported for now. gcc/ChangeLog: * config/riscv/autovec-opt.md: Add supported insn of vx combine. Signed-off-by: Pan Li --- gcc/config/riscv/autovec-opt.md | 20 1 file changed

[PATCH v1 1/2] RISC-V: Add missed DONE for vx combine pattern [NFC]

2025-08-16 Thread pan2 . li
From: Pan Li The previous patch missed the DONE indicator of the vx combine pattern. Thus add it back. gcc/ChangeLog: * config/riscv/autovec-opt.md: Add missed DONE for vx combine pattern. Signed-off-by: Pan Li --- gcc/config/riscv/autovec-opt.md | 4 1 file changed, 4

[PATCH v2 0/2] Support unsigned scalar SAT_MUL form 3

2025-08-13 Thread pan2 . li
From: Pan Li This patch would like to try to match the unsigned SAT_MUL form 3, aka below: #define DEF_SAT_U_MUL_FMT_3(NT, WT) \ NT __attribute__((noinline))\ sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \ {

[PATCH v2 2/2] RISC-V: Add testcase for scalar unsigned SAT_MUL form 3

2025-08-13 Thread pan2 . li
From: Pan Li Add run and asm check test cases for scalar unsigned SAT_MUL form 3. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-fro

[PATCH v2 1/2] Match: Add form 3 for unsigned SAT_MUL

2025-08-13 Thread pan2 . li
From: Pan Li This patch would like to try to match the unsigned SAT_MUL form 3, aka below: #define DEF_SAT_U_MUL_FMT_3(NT, WT) \ NT __attribute__((noinline))\ sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \ {

[PATCH v2 1/2] RISC-V: Combine vec_duplicate + vmerge.vv to vmerge.vx on GR2VR cost

2025-08-11 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vmerge.vvm to the vmerge.vxm. For example, as in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v2 2/2] RISC-V: RISC-V: Add test for vec_duplicate + vmerge.vvm combine with GR2VR cost 0, 1 and 15

2025-08-11 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmerge.vvm combine to vmerge.vxm, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/

[PATCH v2 0/2] RISC-V: Combine vec_duplicate + vmerge.vvm to vmerge.vxm on GR2VR cost

2025-08-11 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vmerge.vvm into vmerge.vxm on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. From: | ... | vmv.v.x | L1: | vaadd.v
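
For readers unfamiliar with vmerge, its per-element behaviour is a select between a vector element and the scalar under a mask. The loop below is a sketch under assumptions: the function name, the byte-per-element mask layout, and the types are all hypothetical and are not taken from vx_binary.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch):
   out[i] = mask[i] ? x : in[i], the element-wise behaviour of
   vmerge.vxm with x as the scalar operand.  */
void
merge_scalar_i32 (int32_t *out, const int32_t *in, const uint8_t *mask,
                  int32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = mask[i] ? x : in[i];
}
```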

[PATCH v1] RISC-V: Refactor the vec_duplicate cost on gpr/fpr2vr-cost param

2025-08-06 Thread pan2 . li
From: Pan Li The previous cost value for vec_duplicate is mostly based on operators like add/minus. The rtx_cost function tries to match them case by case, checks whether each has a vec_duplicate, and then updates the cost values. That was OK when we initially added it, but it looks confusing/redundant as more and mor

[PATCH v1 2/2] RISC-V: RISC-V: Add test for vec_duplicate + vmerge.vvm combine with GR2VR cost 0, 1 and 15

2025-08-03 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vmerge.vvm combine to vmerge.vxm, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/

[PATCH v1 1/2] RISC-V: Combine vec_duplicate + vmerge.vv to vmerge.vx on GR2VR cost

2025-08-03 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vmerge.vvm to the vmerge.vxm. For example, as in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v1 0/2] RISC-V: Combine vec_duplicate + vmerge.vvm to vmerge.vxm on GR2VR cost

2025-08-03 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vmerge.vv into vmerge.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. From: | ... | vmv.v.x | L1: | vaadd.vv

[PATCH v1 2/3] Widening-Mul: Support unsigned scalar SAT_MUL 2

2025-08-01 Thread pan2 . li
From: Pan Li For mul_overflow api, we will have PHI node similar as below: _6 = .MUL_OVERFLOW (a_4(D), b_5(D)); _2 = IMAGPART_EXPR <_6>; if (_2 != 0) goto ; [35.00%] else goto ; [65.00%] [local count: 697932184]: _1 = REALPART_EXPR <_6>; [local count: 1073741824]: # _
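
The GIMPLE above (.MUL_OVERFLOW followed by IMAGPART_EXPR and REALPART_EXPR) is what a source-level overflow check typically lowers to. A minimal sketch of such source, assuming the GCC/Clang `__builtin_mul_overflow` builtin and a hypothetical function name, might look like this; it is not the macro from sat_arith.h.

```c
#include <stdint.h>

/* Hypothetical example (not from the patch): unsigned saturating
   multiply written with the overflow builtin.  The builtin returns
   nonzero when the product does not fit in *result.  */
uint32_t
sat_u_mul_u32_overflow (uint32_t a, uint32_t b)
{
  uint32_t result;

  if (__builtin_mul_overflow (a, b, &result))
    return UINT32_MAX;   /* Saturate on overflow.  */

  return result;
}
```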

[PATCH v1 3/3] RISC-V: Add testcase for scalar unsigned SAT_MUL form 2

2025-08-01 Thread pan2 . li
From: Pan Li Add run and asm check test cases for scalar unsigned SAT_MUL form 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_u_mul-3-u16.c: New test. * gcc.target/riscv/sat/sat_u_mul-3-u32.c: New test.

[PATCH v1 0/3] Support unsigned scalar SAT_MUL form 2

2025-08-01 Thread pan2 . li
From: Pan Li This patch would like to try to match the unsigned SAT_MUL form 2, aka below: #define DEF_SAT_U_MUL_FMT_2(T) \ T __attribute__((noinline)) \ sat_u_mul_##T##_fmt_2 (T a, T b) \ {\ T

[PATCH v1 1/3] Match: Add form 2 for unsigned SAT_MUL

2025-08-01 Thread pan2 . li
From: Pan Li This patch would like to try to match the unsigned SAT_MUL form 2, aka below: #define DEF_SAT_U_MUL_FMT_2(T) \ T __attribute__((noinline)) \ sat_u_mul_##T##_fmt_2 (T a, T b) \ {\ T

[PATCH v1] RISC-V: Fix scalar code-gen of unsigned SAT_MUL

2025-07-30 Thread pan2 . li
From: Pan Li The previous code-gen of scalar unsigned SAT_MUL, aka usmul, leveraged the mulhs by mistake; it should be mulhu for the high bits of the mul result. Thus, this patch would like to make it correct. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_xmode_usmul): Take u
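
Why the signed high part is wrong for an unsigned overflow check can be seen in a small 32-bit analogue. This is a sketch only: the patch concerns the Xmode expander, while the helpers below are hypothetical and model the high half of a 32x32 product in portable C.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 64-bit product (the mulhu-style result).  */
uint32_t
mul_high_unsigned (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

/* High 32 bits when both inputs are reinterpreted as signed.  The
   uint32_t to int32_t conversion is implementation-defined in C;
   GCC and Clang wrap modulo 2^32.  */
uint32_t
mul_high_signed (uint32_t a, uint32_t b)
{
  int64_t product = (int64_t) (int32_t) a * (int64_t) (int32_t) b;
  return (uint32_t) ((uint64_t) product >> 32);
}
```

For a = b = 0xFFFFFFFF the unsigned high part is nonzero, so the product overflows and must saturate, while the signed high part is zero because the inputs are read as -1 * -1 = 1.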

[PATCH v1] RISC-V: Add testcases for signed avg ceil vx combine

2025-07-29 Thread pan2 . li
From: Pan Li The signed avg ceil shares the vaadd.vx for the vx combine, so add the test case to make sure it works well as expected. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/v

[PATCH v2] RISC-V: Add testcases for unsigned avg ceil vx combine.

2025-07-28 Thread pan2 . li
From: Pan Li The unsigned avg ceil shares the vaaddu.vx for the vx combine, so add the test case to make sure it works well as expected. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/

[PATCH v2 2/2] RISC-V: Add test cases for mul based unsigned scalar SAT_MUL

2025-07-28 Thread pan2 . li
From: Pan Li Add run and tree-optimized check for mul based unsigned scalar SAT_MUL instead of the widen_mul. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: Add rv64 target for run. * gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: D

[PATCH v2 1/2] Match: Introduce mul based pattern for unsigned SAT_MUL

2025-07-28 Thread pan2 . li
From: Pan Li Like the widen_mul based pattern, we would like to introduce the mul based pattern as well. The pattern is quite simple compared to the widen_mul one, thus add a new one instead of the for loop in match.pd. gcc/ChangeLog: * match.pd: Add mul based unsigned SAT_MUL. Signed-off-by: Pan Li

[PATCH v2 0/2] Add mul based unsigned SAT_MUL for form 1

2025-07-28 Thread pan2 . li
From: Pan Li This patch series would like to support the unsigned SAT_MUL with the help of mul, instead of the widen_mul. Aka: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint64_t x = (uint64_t)a * (uint64_t)b; NT max = -1; if (x > (uint64_t)(max)) return max;
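
The form 1 template quoted above can be instantiated by hand for one narrow type. The sketch below fixes NT to uint16_t; the snippet is cut off after the saturating branch, so the final `return (uint16_t) x` is an assumption about the elided part, and the function name is hypothetical.

```c
#include <stdint.h>

/* Form 1 with NT = uint16_t and a uint64_t product.  The else
   branch is assumed, since the quoted snippet is truncated.  */
uint16_t
sat_u_mul_u16_from_u64 (uint16_t a, uint16_t b)
{
  uint64_t x = (uint64_t) a * (uint64_t) b;
  uint16_t max = -1;               /* All ones, i.e. 65535.  */

  if (x > (uint64_t) max)
    return max;
  else
    return (uint16_t) x;
}
```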

[PATCH v1] RISC-V: Add testcases for unsigned avg ceil vx combine.

2025-07-28 Thread pan2 . li
From: Pan Li The unsigned avg ceil shares the vaaddu.vx for the vx combine, so add the test case to make sure it works well as expected. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/

[PATCH v1 0/2] Add mul based unsigned SAT_MUL for form 1

2025-07-26 Thread pan2 . li
From: Pan Li This patch series would like to support the unsigned SAT_MUL with the help of mul, instead of the widen_mul. Aka: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint64_t x = (uint64_t)a * (uint64_t)b; NT max = -1; if (x > (uint64_t)(max)) return max;

[PATCH v1 2/2] RISC-V: Add test cases for mul based unsigned scalar SAT_MUL

2025-07-26 Thread pan2 . li
From: Pan Li Add run and tree-optimized check for mul based unsigned scalar SAT_MUL instead of the widen_mul. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: Add rv64 target for run. * gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: D

[PATCH v1 1/2] Match: Introduce mul based pattern for unsigned SAT_MUL

2025-07-26 Thread pan2 . li
From: Pan Li Like the widen_mul based pattern, we would like to introduce the mul based pattern as well. The pattern is quite simple compared to the widen_mul one, thus add a new one instead of the for loop in match.pd. gcc/ChangeLog: * match.pd: Add mul based unsigned SAT_MUL. Signed-off-by: Pan Li

[PATCH v1] RISC-V: Fix another vf FP16 combine run test failures

2025-07-25 Thread pan2 . li
From: Pan Li Like Robin's fix for the vf combine f16.c run tests, there is still another similar failure. This patch would like to fix it in the same way. Will commit it directly if the CI agrees. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c:

[PATCH v1 4/4] RISC-V: Add test case for vaadd.vx combine polluting VXRM

2025-07-25 Thread pan2 . li
From: Pan Li Add asm check to make sure vx combine of vaadd.vx will not pollute the vxrm. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-i32.c: New test. * gcc.target/ris

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vaadd.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-07-25 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vaadd.vv combine to vaadd.vx, with the GR2VR cost is 0, 1 and 2 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gcc

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vaadd.vv to vaadd.vx on GR2VR cost

2025-07-25 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vaadd.vv to the vaadd.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vaadd.vv combine case 0 with GR2VR cost 0, 1 and 15

2025-07-25 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vaadd.vv combine to vaadd.vx, with the GR2VR cost is 0, 2 and 15 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vaadd.vv to vaadd.vx on GR2VR cost

2025-07-25 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vaadd.vv into vaadd.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. There will be two cases for the combine: Case 0:

[PATCH v1 2/2] RISC-V: Add test case for vx combine polluting VXRM

2025-07-22 Thread pan2 . li
From: Pan Li Add asm check to make sure vx combine of vaaddu.vx will not pollute the vxrm. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-u32.c: New test. * gcc.target/ri

[PATCH v1 1/2] RISC-V: Avoid vaaddu.vx combine pattern pollute VXRM csr

2025-07-22 Thread pan2 . li
From: Pan Li The vaaddu.vx combine mostly comes from avg_floor, which requires the vxrm to be RDN. But not all vaaddu.vx should depend on RDN. The vaaddu.vx combine should leverage the VXRM value as is instead of polluting them all to RDN. This patch would like to fix this and set it as i
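
To see why forcing RDN changes results, the sketch below models an 8-bit unsigned averaging add under each of the four VXRM rounding modes, following the roundoff rule in the RVV specification for a shift amount of 1. The enum and function names are hypothetical; only the mode encodings (rnu = 0, rne = 1, rdn = 2, rod = 3) come from the specification.

```c
#include <stdint.h>

enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Model of one vaaddu element: (a + b) >> 1 plus a rounding
   increment r that depends on the rounding mode.  */
uint8_t
aaddu_u8 (uint8_t a, uint8_t b, enum vxrm_mode mode)
{
  uint16_t sum = (uint16_t) a + (uint16_t) b;
  uint16_t lost = sum & 1;          /* Bit shifted out.  */
  uint16_t kept = (sum >> 1) & 1;   /* LSB of the truncated result.  */
  uint16_t r = 0;

  switch (mode)
    {
    case VXRM_RNU: r = lost; break;           /* Round to nearest, up.  */
    case VXRM_RNE: r = lost & kept; break;    /* Round to nearest, even.  */
    case VXRM_RDN: r = 0; break;              /* Truncate (avg_floor).  */
    case VXRM_ROD: r = lost & !kept; break;   /* Round to odd.  */
    }

  return (uint8_t) ((sum >> 1) + r);
}
```

With a = 1 and b = 0, RDN and RNE give 0 while RNU and ROD give 1, so a combine that silently rewrote the mode to RDN would change the result of code that asked for another mode.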

[PATCH v1 0/2] Avoid RVV fixed insn VX combine pollute VXRM

2025-07-22 Thread pan2 . li
From: Pan Li The RVV fixed point insn VX combine should focus on the insn itself, instead of any standard name like avg_floor, the vxrm should be the value of insn as is. The below test suites are passed for this patch series. * The rv64gcv fully regression test. Pan Li (2): RISC-V: Avoid vaa

[PATCH v1 5/5] RISC-V: Add test for vec_duplicate + vaaddu.vv combine for DImode

2025-07-20 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vaaddu.vv combine to vaaddu.vx, with the GR2VR cost is 0, 1, 2 and 15 for the case 0 and case 1. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Add asm check. * gcc.target/riscv/rvv/autov

[PATCH v1 3/5] RISC-V: Add test for vec_duplicate + vaaddu.vv combine case 1 with GR2VR cost 0, 1 and 2 for QI, HI and SI mode

2025-07-20 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vaaddu.vv combine to vaaddu.vx, with the GR2VR cost is 0, 1 and 2. Please note DImode is not included. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check. * gcc.target/riscv/rvv/autove

[PATCH v1 4/5] RISC-V: Allow VLS DImode for sat_op vx DImode pattern

2025-07-20 Thread pan2 . li
From: Pan Li When try to introduce the vaaddu.vx combine for DImode, we will meet ICE like below: 0x4889763 internal_error(char const*, ...) .../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic-global-context.cc:517 0x4842f98 fancy_abort(char const*, int, char const*) .../ris

[PATCH v1 2/5] RISC-V: Add test for vec_duplicate + vaaddu.vv combine case 0 with GR2VR cost 0, 2 and 15 for QI, HI and SI mode

2025-07-20 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vaaddu.vv combine to vaaddu.vx, with the GR2VR cost is 0, 2 and 15. Please note DImode is not included here. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/ri

[PATCH v1 1/5] RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost for HI, QI and SI mode

2025-07-20 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vaaddu.vv to the vaaddu.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if

[PATCH v1 0/5] RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost

2025-07-20 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vaaddu.vv into vaaddu.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. There will be two cases for the combine: Case 0:

[PATCH v1] RISC-V: Add ashiftrt operand 2 for vector avg_floor and avg_ceil

2025-07-19 Thread pan2 . li
From: Pan Li According to the semantics of the avg_floor and avg_ceil as below: floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); Aka we have (const_int 1) as the op2 of the ashiftrt but seems missed. Thus, add it back to align t
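
The two formulas above translate directly into C. The sketch below uses int32_t as the narrow type and int64_t as the wide type; the function names are hypothetical. Right-shifting a negative value is implementation-defined in C, and the code assumes the arithmetic shift that GCC and Clang provide.

```c
#include <stdint.h>

/* floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1)  */
int32_t
avg_floor_i32 (int32_t op1, int32_t op2)
{
  return (int32_t) (((int64_t) op1 + (int64_t) op2) >> 1);
}

/* ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1)  */
int32_t
avg_ceil_i32 (int32_t op1, int32_t op2)
{
  return (int32_t) (((int64_t) op1 + (int64_t) op2 + 1) >> 1);
}
```

The shift count 1 in both functions is the (const_int 1) operand that the patch adds to the ashiftrt.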

[PATCH v1] RISC-V: Refine the test case for vector avg_floor and avg_ceil [NFC]

2025-07-18 Thread pan2 . li
From: Pan Li The previous test case doesn't leverage the right test helper macro, it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. We prefer the test function name is test_avg_floor_int64_t_int32_t_0 instead of test_avg_floor_WT_NT_0 for DEF_AVG_0(WT, NT). The below test suites are passed for

[PATCH v1] RISC-V: Support RVVDImode for avg3_ceil auto vect

2025-07-16 Thread pan2 . li
From: Pan Li Like the avg3_floor pattern, the avg3_ceil has a similar issue: it lacks RVV DImode support. Thus, this patch would like to support the DImode by the standard name, with the iterator V_VLSI_D. The below test suites are passed for this patch series. * The rv64gcv fully regr

[PATCH v2] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread pan2 . li
From: Pan Li The avg3_floor pattern leverage the add and shift rtl with the DOUBLE_TRUNC mode iterator. Aka, RVVDImode iterator will generate avg3rvvsimode_floor, only the element size QI, HI and SI are allowed. Thus, this patch would like to support the DImode by the standard name, with the it

[PATCH v1] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread pan2 . li
From: Pan Li The avg3_floor pattern leverage the add and shift rtl with the DOUBLE_TRUNC mode iterator. Aka, RVVDImode iterator will generate avg3rvvsimode_floor, only the element size QI, HI and SI are allowed. Thus, this patch would like to support the DImode by the standard name, with the it

[PATCH v2 2/2] RISC-V: Add testcase for rv32 SAT_MUL from uint64

2025-07-12 Thread pan2 . li
From: Pan Li Add the run and asm testcase for rv32 SAT_MUL, widen mul from uint8_t, uint16_t, uint32_t to uint64_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test. * gcc.ta

[PATCH v2 1/2] Match: Refine the widen mul check for SAT_MUL pattern

2025-07-12 Thread pan2 . li
From: Pan Li The widen mul will have source type from N-bits to dest type 2N-bits. The previous check only focus on the HOST_WIDE_INT but not working for QI => HI, HI => SI and SI to DImode. Thus, refine the widen mul precision check as dest has twice bits of input. gcc/ChangeLog: * m

[PATCH v2 0/2] Match: Refine the widen mul check for SAT_MUL pattern

2025-07-12 Thread pan2 . li
From: Pan Li The widen mul will have source type from N-bits to dest type 2N-bits. The previous check only focus on the HOST_WIDE_INT but not working for QI => HI, HI => SI and SI => DI. Thus, refine the widen mul precision check, aka dest has twice bits of input. The below test suites are pas

[PATCH v1 2/2] RISC-V: Add testcase for rv32 SAT_MUL from uint64

2025-07-10 Thread pan2 . li
From: Pan Li Add the run and asm testcase for rv32 SAT_MUL, widen mul from uint8_t, uint16_t, uint32_t to uint64_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test. * gcc.ta

[PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-10 Thread pan2 . li
From: Pan Li The widen mul has different source types on different platforms, like rv32 or rv64. For rv32, the source of widen mul is 32-bits while it is 64-bits in rv64. Thus, leveraging HOST_WIDE_INT is not correct and results in pattern match failures on 32-bits systems like rv32. Thus, levera

[PATCH v1 0/2] Refine the unsigned SAT_MUL for 32-bits like rv32

2025-07-10 Thread pan2 . li
From: Pan Li The widen mul has different source types on different machines, like rv32 or rv64. The SAT_MUL pattern previously didn't work well for backends like rv32, thus we would like to refine it by BITS_PER_WORD for precision check. The below test suites are passed for this patch: 1. The

[PATCH v1] RISCV: Remove the v extension requirement for sat scalar run test

2025-07-08 Thread pan2 . li
From: Pan Li The sat scalar run test should not require the v extension, thus take rv32 || rv64 instead of riscv_v for the requirement. The below test suites are passed for this patch series. * The rv64gcv fully regression test. * The rv32gcv fully regression test. gcc/testsuite/ChangeLog:

[PATCH v1] RISC-V: Disable uint128_t testcase of SAT_MUL when rv32

2025-07-07 Thread pan2 . li
From: Pan Li The rv32 doesn't support __uint128, and then we will have error like below during test. error: '__int128' is not supported on this target. Thus, we disable the uint128_t related test when rv32. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add xlen check fo

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vssub.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-07-07 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vssub.vv combine to vssub.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vssub.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-07-07 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vssub.vv combine to vssub.vx, with the GR2VR cost is 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gc

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-07 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vssub.vv to the vssub.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-07 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vssub.vv into vssub.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. There will be two cases for the combine: Case 0:
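
As a reference for what vssub computes per element, the sketch below implements a signed saturating subtract of a scalar from each array element using a wider intermediate. The names and the int16_t element type are hypothetical and are not taken from the test files.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed saturating subtract: clamp a - b to the int16_t range.  */
int16_t
ssub_i16 (int16_t a, int16_t b)
{
  int32_t diff = (int32_t) a - (int32_t) b;

  if (diff > INT16_MAX)
    return INT16_MAX;
  if (diff < INT16_MIN)
    return INT16_MIN;
  return (int16_t) diff;
}

/* Hypothetical loop with a loop-invariant scalar operand x.  */
void
ssub_scalar_i16 (int16_t *out, const int16_t *in, int16_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = ssub_i16 (in[i], x);
}
```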

[PATCH v3 3/3] RISC-V: Add test for vec_duplicate + vsadd.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-07-03 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vsadd.vv combine to vsadd.vx, with the GR2VR cost is 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gc

[PATCH v3 2/3] RISC-V: Add test for vec_duplicate + vsadd.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-07-03 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vsadd.vv combine to vsadd.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v3 1/3] RISC-V: Combine vec_duplicate + vsadd.vv to vsadd.vx on GR2VR cost

2025-07-03 Thread pan2 . li
From: Pan Li This patch would like to combine the vec_duplicate + vsadd.vv to the vsadd.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if th

[PATCH v3 0/3] RISC-V: Combine vec_duplicate + vsadd.vv to vsadd.vx on GR2VR cost

2025-07-03 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vsadd.vv into vsadd.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. There will be two cases for the combine: Case 0:
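
The per-element behaviour of vsadd is a signed saturating add. One way to write it in C, assuming the GCC/Clang `__builtin_add_overflow` builtin, is sketched below; the function names and the int32_t element type are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed saturating add.  Signed addition can only overflow when both
   operands have the same sign, so the sign of a picks the bound.  */
int32_t
sadd_i32 (int32_t a, int32_t b)
{
  int32_t sum;

  if (__builtin_add_overflow (a, b, &sum))
    return a < 0 ? INT32_MIN : INT32_MAX;

  return sum;
}

/* Hypothetical loop with a loop-invariant scalar operand x.  */
void
sadd_scalar_i32 (int32_t *out, const int32_t *in, int32_t x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sadd_i32 (in[i], x);
}
```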

[PATCH v3 4/4] RISC-V: Add test cases for unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li Add run and tree-optimized check for unsigned scalar SAT_MUL from uint128_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_arith_data.h: Add test data for run test. * gcc.target/riscv/

[PATCH v3 3/4] RISC-V: Implement unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li This patch would like to implement the SAT_MUL scalar unsigned from uint128_t, aka: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max; else

[PATCH v3 0/4] Support unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li This patch series would like to support the unsigned SAT_MUL with the help of uint128_t. Aka: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max; else return

[PATCH v3 2/4] Widening-Mul: Support unsigned scalar SAT_MUL form 1

2025-07-01 Thread pan2 . li
From: Pan Li This patch would like to try to match the SAT_MUL during widening-mul pass, aka below pattern. NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max;

[PATCH v3 1/4] Internal-fn: Introduce new IFN_SAT_MUL for unsigned int

2025-07-01 Thread pan2 . li
From: Pan Li This patch would like to add the middle-end representation for the unsigned saturation mul. Aka set the result of mul to the max when overflow. Take uint8_t as example, we will have: * SAT_MUL (1, 127) => 127. * SAT_MUL (2, 127) => 254. * SAT_MUL (3, 127) => 255. * SAT_MUL (25
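
The uint8_t examples listed above can be reproduced with a few lines of plain C. This is a reference sketch of the semantics only, with a hypothetical function name; it is not how the internal function is implemented.

```c
#include <stdint.h>

/* Unsigned saturating multiply for uint8_t: compute the product in
   a wider type and clamp it to UINT8_MAX.  */
uint8_t
sat_mul_u8 (uint8_t a, uint8_t b)
{
  uint16_t product = (uint16_t) a * (uint16_t) b;

  return product > UINT8_MAX ? UINT8_MAX : (uint8_t) product;
}
```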

[PATCH v3 4/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost is 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v3 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v3 2/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks accordingly. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v3 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch would like to combine vec_duplicate + vssubu.vv into vssubu.vx, for example in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if

[PATCH v3 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vssubu.vv into vssubu.vx, based on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if it is non-zero, like 1, 2, 15 in the tests. There will be two cases for the combine: Case 0:
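As a rough illustration of the kind of source involved, the sketch below subtracts a loop-invariant scalar from every element with unsigned saturation. The function name `vec_sat_u_sub_scalar` is hypothetical and the loop is not copied from the series' tests. It is an assumption that an RVV autovectorizer would broadcast `x` with a vec_duplicate and then use vssubu.vv, or vssubu.vx after the combine; the instructions actually emitted depend on the compiler version, flags and the GR2VR cost.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example (not from the patch): out[i] = max(in[i] - x, 0)
   for unsigned 16-bit elements, with x invariant across the loop. */
void vec_sat_u_sub_scalar(uint16_t *restrict out, const uint16_t *restrict in,
                          uint16_t x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Clamp to 0 instead of wrapping around when in[i] < x. */
        out[i] = in[i] >= x ? (uint16_t)(in[i] - x) : 0;
    }
}
```

Calling it with `in = {0, 5, 10, 65535}` and `x = 7` stores `{0, 0, 3, 65528}`.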

[PATCH v2 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v2 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch would like to combine vec_duplicate + vssubu.vv into vssubu.vx, for example in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if

[PATCH v2 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost set to 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vssubu.vv into vssubu.vx, based on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if it is non-zero, like 1, 2, 15 in the tests. There will be two cases for the combine: Case 0:

[PATCH v2 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks accordingly. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-26 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-26 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vssubu.vv combine to vssubu.vx, with the GR2VR cost set to 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v1 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-26 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks accordingly. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li This patch would like to combine vec_duplicate + vssubu.vv into vssubu.vx, for example in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li This patch would like to introduce the combine of vec_dup + vssubu.vv into vssubu.vx, based on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if it is non-zero, like 1, 2, 15 in the tests. There will be two cases for the combine: Case 0:

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-21 Thread pan2 . li
From: Pan Li Add asm dump check test for vec_duplicate + vsaddu.vv combine to vsaddu.vx, with the GR2VR cost set to 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vsaddu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-21 Thread pan2 . li
From: Pan Li Add asm dump check and run test for vec_duplicate + vsaddu.vv combine to vsaddu.vx, with the GR2VR cost set to 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-20 Thread pan2 . li
From: Pan Li This patch would like to combine vec_duplicate + vsaddu.vv into vsaddu.vx, for example in the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if
