From: Pan Li
This patch would like to combine the vec_duplicate + vnmsac.vv to the
vnmsac.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if
From: Pan Li
Add asm dump check and run test for vec_duplicate + vnmsac.vvm
combine to vnmsac.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vnmsac.vx.
* gcc.target/riscv/rvv/autovec/vx_
From: Pan Li
Add asm dump check and run test for vec_duplicate + vnmsac.vvm
combine to vnmsac.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vnmsac.vx.
* gcc.target/riscv/rvv/autovec/vx_
From: Pan Li
This patch would like to introduce the combine of vec_dup + vnmsac.vv
into vnmsac.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.
From:
| ...
| vmv.v.x
| L1:
| vnmsac.vv
| J L1
| ...
To:
| ...
| L1:
|
From: Pan Li
The form 4 of unsigned scalar SAT_MUL is covered in middle-expand
already, so add a test case here to cover form 4.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test
From: Pan Li
This patch would like to combine the vec_duplicate + vmacc.vv to the
vmacc.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_v
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_v
From: Pan Li
This patch would like to introduce the combine of vec_dup + vmacc.vv
into vmacc.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.
From:
| ...
| vmv.v.x
| L1:
| vmacc.vv
|
From: Pan Li
This patch would like to introduce the combine of vec_dup + vmacc.vv
into vmacc.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.
From:
| ...
| vmv.v.x
| L1:
| vmacc.vv
|
From: Pan Li
This patch would like to combine the vec_duplicate + vmacc.vv to the
vmacc.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
After enabling vmacc.vx by introducing the define_insn, the
asm checks below need to be adjusted for this change.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/ternop_vx_constraint-4.c: Adjust
asm check for vx.
* gcc.target/riscv/rvv/base/ternop_vx_constraint-5
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_v
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmacc.vvm
combine to vmacc.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vx combine.
* gcc.target/riscv/rvv/autovec/vx_v
From: Pan Li
Add the missed DONE indicator and update comments
Pan Li (2):
RISC-V: Add missed DONE for vx combine pattern [NFC]
RISC-V: Update the comments of vx combine [NFC]
gcc/config/riscv/autovec-opt.md | 24
1 file changed, 24 insertions(+)
--
2.43.0
From: Pan Li
The list of insns supported by the vx combine is out of date; update it
to cover all insns supported for now.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Add supported insn
of vx combine.
Signed-off-by: Pan Li
---
gcc/config/riscv/autovec-opt.md | 20
1 file changed
From: Pan Li
The previous patch missed the DONE indicator of the vx
combine pattern. Thus add it back.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Add missed DONE
for vx combine pattern.
Signed-off-by: Pan Li
---
gcc/config/riscv/autovec-opt.md | 4
1 file changed, 4
From: Pan Li
This patch would like to try to match the unsigned
SAT_MUL form 3, aka below:
#define DEF_SAT_U_MUL_FMT_3(NT, WT) \
NT __attribute__((noinline))\
sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \
{
From: Pan Li
Add run and asm check test cases for scalar unsigned SAT_MUL form 3.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-4-u16-fro
From: Pan Li
This patch would like to try to match the unsigned
SAT_MUL form 3, aka below:
#define DEF_SAT_U_MUL_FMT_3(NT, WT) \
NT __attribute__((noinline))\
sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \
{
From: Pan Li
This patch would like to combine the vec_duplicate + vaadd.vv to the
vaadd.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmerge.vvm
combine to vmerge.vxm, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/
From: Pan Li
This patch would like to introduce the combine of vec_dup + vmerge.vvm
into vmerge.vxm on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.
From:
| ...
| vmv.v.x
| L1:
| vaadd.v
From: Pan Li
The previous cost value for vec_duplicate is mostly based on operators
like add/minus. The rtx_cost function tries to match them case by case,
checks whether a vec_duplicate is present, and then updates the cost values.
That was OK when it was initially added, but it looks confusing/redundant as more
and mor
From: Pan Li
Add asm dump check and run test for vec_duplicate + vmerge.vvm
combine to vmerge.vxm, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/
From: Pan Li
This patch would like to combine the vec_duplicate + vaadd.vv to the
vaadd.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
This patch would like to introduce the combine of vec_dup + vmerge.vv
into vmerge.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.
From:
| ...
| vmv.v.x
| L1:
| vaadd.vv
From: Pan Li
For mul_overflow api, we will have PHI node similar as below:
_6 = .MUL_OVERFLOW (a_4(D), b_5(D));
_2 = IMAGPART_EXPR <_6>;
if (_2 != 0)
goto ; [35.00%]
else
goto ; [65.00%]
[local count: 697932184]:
_1 = REALPART_EXPR <_6>;
[local count: 1073741824]:
# _
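For illustration only, a C function of roughly the following shape is one
plausible source for GIMPLE like the above. The function name
sat_u_mul_u32_overflow is hypothetical and not taken from the patch;
__builtin_mul_overflow is the GCC builtin, and the exact GIMPLE that GCC
emits may differ by version and options.

```c
#include <stdint.h>

/* Hypothetical example, not from the patch: unsigned saturating multiply
   written with the overflow builtin.  The builtin returns true when the
   mathematical product does not fit in *result.  */
uint32_t
sat_u_mul_u32_overflow (uint32_t a, uint32_t b)
{
  uint32_t result;

  if (__builtin_mul_overflow (a, b, &result))
    return UINT32_MAX;  /* Overflow: saturate to the type maximum.  */

  return result;        /* No overflow: the plain product.  */
}
```

The two return paths correspond to the two arms of the branch on the
overflow flag, which is where a PHI node would merge the values.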
From: Pan Li
Add run and asm check test cases for scalar unsigned
SAT_MUL form 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-3-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-3-u32.c: New test.
From: Pan Li
This patch would like to try to match the unsigned
SAT_MUL form 2, aka below:
#define DEF_SAT_U_MUL_FMT_2(T) \
T __attribute__((noinline)) \
sat_u_mul_##T##_fmt_2 (T a, T b) \
{\
T
From: Pan Li
This patch would like to try to match the unsigned
SAT_MUL form 2, aka below:
#define DEF_SAT_U_MUL_FMT_2(T) \
T __attribute__((noinline)) \
sat_u_mul_##T##_fmt_2 (T a, T b) \
{\
T
From: Pan Li
The previous code-gen of scalar unsigned SAT_MUL, aka usmul,
leveraged mulhs by mistake; it should be mulhu for the
high bits of the mul result. Thus, this patch would like to make
it correct.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_xmode_usmul): Take
u
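To see why the signed high-part multiply is wrong for an unsigned overflow
check, here is a minimal sketch that models the two operations at 32 bits
in portable C. The helper names mulhu32 and mulhs32 are hypothetical and
only mimic what mulhu and mulhs compute; this is not the code in riscv.cc.

```c
#include <stdint.h>

/* Hypothetical model of mulhu: high 32 bits of the unsigned 32x32 product.  */
uint32_t
mulhu32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

/* Hypothetical model of mulhs: high 32 bits of the signed 32x32 product.
   The product is converted to unsigned before shifting so the shift is
   well defined in C.  */
uint32_t
mulhs32 (int32_t a, int32_t b)
{
  int64_t prod = (int64_t) a * (int64_t) b;
  return (uint32_t) ((uint64_t) prod >> 32);
}
```

With both operands holding the bit pattern 0xFFFFFFFF, the unsigned high part
is 0xFFFFFFFE (the product overflows 32 bits), while the signed view is
(-1) * (-1) = 1 with a high part of 0, which would wrongly report no overflow.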
From: Pan Li
The unsigned avg ceil shares the vaaddu.vx for the vx combine,
so add the test case to make sure it works as expected.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/v
From: Pan Li
The unsigned avg ceil shares the vaaddu.vx for the vx combine,
so add the test case to make sure it works as expected.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/
From: Pan Li
Add run and tree-optimized check for mul based unsigned scalar SAT_MUL
instead of the widen_mul.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: Add rv64
target for run.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: D
From: Pan Li
Like the widen_mul based pattern, we would like to introduce the mul based
pattern as well. The pattern is quite simple compared to the
widen_mul one, thus add a new pattern instead of using the for loop in match.pd.
gcc/ChangeLog:
* match.pd: Add mul based unsigned SAT_MUL.
Signed-off-by: Pan Li
From: Pan Li
This patch series would like to support the unsigned SAT_MUL with
the help of mul, instead of the widen_mul. Aka:
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint64_t x = (uint64_t)a * (uint64_t)b;
NT max = -1;
if (x > (uint64_t)(max))
return max;
From: Pan Li
The unsigned avg ceil shares the vaaddu.vx for the vx combine,
so add the test case to make sure it works as expected.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/
From: Pan Li
This patch series would like to support the unsigned SAT_MUL with
the help of mul, instead of the widen_mul. Aka:
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint64_t x = (uint64_t)a * (uint64_t)b;
NT max = -1;
if (x > (uint64_t)(max))
return max;
From: Pan Li
Add run and tree-optimized check for mul based unsigned scalar SAT_MUL
instead of the widen_mul.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: Add rv64
target for run.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: D
From: Pan Li
Like the widen_mul based pattern, we would like to introduce the mul based
pattern as well. The pattern is quite simple compared to the
widen_mul one, thus add a new pattern instead of using the for loop in match.pd.
gcc/ChangeLog:
* match.pd: Add mul based unsigned SAT_MUL.
Signed-off-by: Pan Li
From: Pan Li
Like Robin's fix for the vf combine f16.c run tests, there is still
another similar failure. This patch would like to fix it in the same
way.
Will commit it directly if the CI agrees.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c:
From: Pan Li
Add asm check to make sure vx combine of vaadd.vx will not pollute
the vxrm.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-i32.c: New test.
* gcc.target/ris
From: Pan Li
Add asm dump check test for vec_duplicate + vaadd.vv combine to
vaadd.vx, with the GR2VR cost set to 0, 1 and 2
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc
From: Pan Li
This patch would like to combine the vec_duplicate + vaadd.vv to the
vaadd.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
Add asm dump check and run test for vec_duplicate + vaadd.vv
combine to vaadd.vx, with the GR2VR cost set to 0, 2 and 15
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
From: Pan Li
This patch would like to introduce the combine of vec_dup + vaadd.vv
into vaadd.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
Add asm check to make sure vx combine of vaaddu.vx will not pollute
the vxrm.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-fixed-vxrm-1-u32.c: New test.
* gcc.target/ri
From: Pan Li
The vaaddu.vx combine mostly comes from avg_floor, which
requires the vxrm to be RDN. But not every vaaddu.vx should
depend on RDN. The vaaddu.vx combine should leverage
the VXRM value as is instead of polluting them all to RDN.
This patch would like to fix this and set it as i
From: Pan Li
The RVV fixed point insn VX combine should focus on the insn itself
instead of any standard name like avg_floor; the vxrm should be
the value of the insn as is.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
Pan Li (2):
RISC-V: Avoid vaa
From: Pan Li
Add asm dump check and run test for vec_duplicate + vaaddu.vv
combine to vaaddu.vx, with the GR2VR cost set to 0, 1, 2 and 15 for
the case 0 and case 1.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Add asm check.
* gcc.target/riscv/rvv/autov
From: Pan Li
Add asm dump check test for vec_duplicate + vaaddu.vv combine to
vaaddu.vx, with the GR2VR cost set to 0, 1 and 2. Please note DImode
is not included.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check.
* gcc.target/riscv/rvv/autove
From: Pan Li
When trying to introduce the vaaddu.vx combine for DImode, we hit an
ICE like the one below:
0x4889763 internal_error(char const*, ...)
.../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic-global-context.cc:517
0x4842f98 fancy_abort(char const*, int, char const*)
.../ris
From: Pan Li
Add asm dump check and run test for vec_duplicate + vaaddu.vv
combine to vaaddu.vx, with the GR2VR cost set to 0, 2 and 15. Please
note DImode is not included here.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/ri
From: Pan Li
This patch would like to combine the vec_duplicate + vaaddu.vv to the
vaaddu.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if
From: Pan Li
This patch would like to introduce the combine of vec_dup + vaaddu.vv
into vaaddu.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
According to the semantics of the avg_floor and avg_ceil as below:
floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1);
ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1);
That is, (const_int 1) should be the op2 of the ashiftrt, but it seems to be missing.
Thus, add it back to align t
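The floor and ceil formulas above can be sketched in plain C. This is only
an illustration under stated assumptions: the function names are
hypothetical, and unsigned 32-bit elements with a 64-bit wide type are used
so that the right shift is fully defined by the C standard (the RTL in the
patch uses ashiftrt, i.e. the signed case).

```c
#include <stdint.h>

/* Hypothetical helper: floor average, computed in a wider type so the
   intermediate sum cannot overflow.  */
uint32_t
avg_floor_u32 (uint32_t op1, uint32_t op2)
{
  return (uint32_t) (((uint64_t) op1 + (uint64_t) op2) >> 1);
}

/* Hypothetical helper: ceil average, same as above with +1 before the
   shift by one.  */
uint32_t
avg_ceil_u32 (uint32_t op1, uint32_t op2)
{
  return (uint32_t) (((uint64_t) op1 + (uint64_t) op2 + 1) >> 1);
}
```

In both helpers the shift amount is the constant 1, which is the operand the
text above refers to.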
From: Pan Li
The previous test case doesn't leverage the right test helper macro;
it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. We prefer the
test function name to be test_avg_floor_int64_t_int32_t_0 instead
of test_avg_floor_WT_NT_0 for DEF_AVG_0(WT, NT).
The below test suites are passed for
From: Pan Li
Like the avg3_floor pattern, the avg3_ceil pattern has a
similar issue: it lacks RVV DImode support.
Thus, this patch would like to support the DImode by
the standard name, with the iterator V_VLSI_D.
The below test suites are passed for this patch series.
* The rv64gcv fully regr
From: Pan Li
The avg3_floor pattern leverages the add and shift rtl
with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode
iterator will generate avg3rvvsimode_floor, so only the
element sizes QI, HI and SI are allowed.
Thus, this patch would like to support the DImode by
the standard name, with the it
From: Pan Li
The avg3_floor pattern leverages the add and shift rtl
with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode
iterator will generate avg3rvvsimode_floor, so only the
element sizes QI, HI and SI are allowed.
Thus, this patch would like to support the DImode by
the standard name, with the it
From: Pan Li
Add the run and asm testcase for rv32 SAT_MUL, widen mul from
uint8_t, uint16_t, uint32_t to uint64_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test.
* gcc.ta
From: Pan Li
The widen mul takes a source type of N bits to a
dest type of 2N bits. The previous check only focused on
HOST_WIDE_INT and did not work for QI => HI, HI => SI
and SI => DI. Thus, refine the widen mul precision
check so that the dest has twice the bits of the input.
gcc/ChangeLog:
* m
From: Pan Li
The widen mul takes a source type of N bits to a
dest type of 2N bits. The previous check only focused on
HOST_WIDE_INT and did not work for QI => HI, HI => SI
and SI => DI. Thus, refine the widen mul precision
check, aka the dest has twice the bits of the input.
The below test suites are pas
From: Pan Li
Add the run and asm testcase for rv32 SAT_MUL, widen mul from
uint8_t, uint16_t, uint32_t to uint64_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test.
* gcc.ta
From: Pan Li
The widen mul has a different source type on different platforms,
like rv32 or rv64. For rv32, the source of the widen mul is 32 bits,
while it is 64 bits on rv64. Thus, leveraging HOST_WIDE_INT is not
correct and results in pattern match failures on 32-bit systems
like rv32.
Thus, levera
From: Pan Li
The widen mul has a different source type on different machines,
like rv32 or rv64. The SAT_MUL pattern previously didn't work well for
backends like rv32, thus we would like to refine it
by using BITS_PER_WORD for the precision check.
The below test suites are passed for this patch:
1. The
From: Pan Li
The sat scalar run test should not require the v extension, thus
take rv32 || rv64 instead of riscv_v for the requirement.
The test suites below pass for this patch series.
* The rv64gcv full regression test.
* The rv32gcv full regression test.
gcc/testsuite/ChangeLog:
From: Pan Li
The rv32 target doesn't support __uint128, so we get an
error like the one below during test.
error: '__int128' is not supported on this target.
Thus, we disable the uint128_t related tests on rv32.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add xlen check fo
From: Pan Li
Add asm dump check and run test for vec_duplicate + vssub.vv
combine to vssub.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
From: Pan Li
Add asm dump check test for vec_duplicate + vssub.vv combine to
vssub.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gc
From: Pan Li
This patch would like to combine the vec_duplicate + vssub.vv to the
vssub.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
This patch would like to introduce the combine of vec_dup + vssub.vv
into vssub.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
Add asm dump check test for vec_duplicate + vsadd.vv combine to
vsadd.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gc
From: Pan Li
Add asm dump check and run test for vec_duplicate + vsadd.vv
combine to vsadd.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
From: Pan Li
This patch would like to combine the vec_duplicate + vsadd.vv to the
vsadd.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if th
From: Pan Li
This patch would like to introduce the combine of vec_dup + vsadd.vv
into vsadd.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
Add run and tree-optimized check for unsigned scalar SAT_MUL from
uint128_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_arith_data.h: Add test data for
run test.
* gcc.target/riscv/
From: Pan Li
This patch would like to implement the SAT_MUL scalar unsigned from
uint128_t, aka:
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint128_t x = (uint128_t)a * (uint128_t)b;
NT max = -1;
if (x > (uint128_t)(max))
return max;
else
From: Pan Li
This patch series would like to support the unsigned SAT_MUL with
the help of uint128_t. Aka:
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint128_t x = (uint128_t)a * (uint128_t)b;
NT max = -1;
if (x > (uint128_t)(max))
return max;
else
return
From: Pan Li
This patch would like to try to match the SAT_MUL during
widening-mul pass, aka below pattern.
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint128_t x = (uint128_t)a * (uint128_t)b;
NT max = -1;
if (x > (uint128_t)(max))
return max;
From: Pan Li
This patch would like to add the middle-end representation for the
unsigned saturation mul. Aka set the result of mul to the max
on overflow.
Take uint8_t as example, we will have:
* SAT_MUL (1, 127) => 127.
* SAT_MUL (2, 127) => 254.
* SAT_MUL (3, 127) => 255.
* SAT_MUL (25
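The saturating behaviour described above can be sketched in plain C for
uint8_t. The function name sat_u_mul_u8 is hypothetical and chosen only for
this illustration; the sketch shows the semantics, not how the middle-end
implements them.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned saturating multiply for uint8_t.
   The product is computed in a wider type so an overflow of the
   8-bit range is visible, then clamped to UINT8_MAX.  */
uint8_t
sat_u_mul_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = (uint16_t) ((uint16_t) a * (uint16_t) b);

  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}
```

For instance, 3 * 127 = 381 exceeds 255, so the result is clamped to 255,
matching the third item in the list above.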
From: Pan Li
Add asm dump check test for vec_duplicate + vssubu.vv combine to
vssubu.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf
From: Pan Li
Add asm dump check and run test for vec_duplicate + vssubu.vv
combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
From: Pan Li
The cost model change makes the default cost of vx 2, thus
reconcile the asm checks with this change.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/ri
From: Pan Li
This patch would like to combine the vec_duplicate + vssubu.vv to the
vssubu.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if
From: Pan Li
This patch would like to introduce the combine of vec_dup + vssubu.vv
into vssubu.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
Add asm dump check and run test for vec_duplicate + vssubu.vv
combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
From: Pan Li
This patch would like to combine the vec_duplicate + vssubu.vv to the
vssubu.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if
From: Pan Li
Add asm dump check test for vec_duplicate + vssubu.vv combine to
vssubu.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf
From: Pan Li
This patch would like to introduce the combine of vec_dup + vssubu.vv
into vssubu.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
The cost model change makes the default cost of vx 2, thus
reconcile the asm checks with this change.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/ri
From: Pan Li
Add asm dump check and run test for vec_duplicate + vssubu.vv
combine to vssubu.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
From: Pan Li
Add asm dump check test for vec_duplicate + vssubu.vv combine to
vssubu.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vssubu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf
From: Pan Li
The cost model change makes the default cost of vx 2, thus
reconcile the asm checks with this change.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c:
Update the asm check due to cost model change.
* gcc.target/ri
From: Pan Li
This patch would like to combine the vec_duplicate + vssubu.vv to the
vssubu.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if
From: Pan Li
This patch would like to introduce the combine of vec_dup + vssubu.vv
into vssubu.vx on the cost value of GR2VR. The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test. There will be two cases for the combine:
Case 0:
From: Pan Li
Add asm dump check test for vec_duplicate + vsaddu.vv combine to
vsaddu.vx, with the GR2VR cost set to 0, 1 and 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
for vsaddu.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf
From: Pan Li
Add asm dump check and run test for vec_duplicate + vsaddu.vv
combine to vsaddu.vx, with the GR2VR cost set to 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
From: Pan Li
This patch would like to combine the vec_duplicate + vsaddu.vv to the
vsaddu.vx. For example, see the code below. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if