[PATCH v1] LoongArch: testsuite: Fix gcc.dg/vect/vect-reduc-mul_{1, 2}.c FAIL.

2024-02-01 Thread Li Wei
This FAIL was introduced from r14-6908. The reason is that when merging constant vector permutation implementations, the 128-bit matching situation was not fully considered. In fact, the expansion of 128-bit vectors after merging only supports value-based 4 elements set shuffle, so this time is a

[PATCH v2] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-26 Thread Li Wei
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The experimental results show that after the modification,

[PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.

2024-01-24 Thread Li Wei
We found that in the spec17 521.wrf program, some loop invariant code generated from single-precision floating-point approximate division calculation failed to propose a loop. This is because the pseudo-register that stores the intermediate temporary calculation results is rewritten in the

[PATCH v1] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-24 Thread Li Wei
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The experimental results show that after the modification,

[PATCH v2 2/2] LoongArch: Redundant sign extension elimination optimization 2.

2024-01-11 Thread Li Wei
Eliminate the redundant sign extension that exists after the conditional move when the target register is SImode. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_conditional_move): Adjust. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend-2.c:

[PATCH v2 1/2] LoongArch: Redundant sign extension elimination optimization.

2024-01-11 Thread Li Wei
We found that the current combine optimization pass in gcc cannot handle the following redundant sign extension situations: (insn 77 76 78 5 (set (reg:SI 143) (plus:SI (subreg/s/u:SI (reg/v:DI 104 [ len ]) 0) (const_int 1 [0x1]))) {addsi3} (expr_list:REG_DEAD (reg/v:DI 104

[PATCH v2] LoongArch: Merge constant vector permuatation implementations.

2023-12-28 Thread Li Wei
There are currently two versions of the implementations of constant vector permutation: loongarch_expand_vec_perm_const_1 and loongarch_expand_vec_perm_const_2. The implementations of the two versions are different. Currently, only the implementation of loongarch_expand_vec_perm_const_1 is used

[PATCH v1] LoongArch: Merge constant vector permuatation implementations.

2023-12-28 Thread Li Wei
There are currently two versions of the implementations of constant vector permutation: loongarch_expand_vec_perm_const_1 and loongarch_expand_vec_perm_const_2. The implementations of the two versions are different. Currently, only the implementation of loongarch_expand_vec_perm_const_1 is used

[PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.

2023-12-24 Thread Li Wei
We found that using the latest compiled gcc will cause a miscompare error when running spec2006 400.perlbench test with -flto turned on. After testing, it was found that only the LoongArch architecture will report errors. The first error commit was located through the git bisect command as

[PATCH v1] LoongArch: Remove duplicate definition of CLZ_DEFINED_VALUE_AT_ZERO.

2023-11-27 Thread Li Wei
In the r14-5547 commit, C[LT]Z_DEFINED_VALUE_AT_ZERO were defined at the same time, but in fact, CLZ_DEFINED_VALUE_AT_ZERO has already been defined, so remove the duplicate definition. gcc/ChangeLog: * config/loongarch/loongarch.h (CTZ_DEFINED_VALUE_AT_ZERO): Add description.

[PATCH v1 2/2] LoongArch: Optimize vector constant extract-{even/odd} permutation.

2023-11-27 Thread Li Wei
For vector constant extract-{even/odd} permutation replace the default [x]vshuf instruction combination with [x]vilv{l/h} instruction, which can reduce instructions and improves performance. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_is_odd_extraction):

[PATCH v1 1/2] LoongArch: Accelerate optimization of scalar signed/unsigned popcount.

2023-11-27 Thread Li Wei
In LoongArch, the vector popcount has corresponding instructions, while the scalar does not. Currently, the scalar popcount is calculated through a loop, and the value of a non-power of two needs to be iterated several times, so the vector popcount instruction is considered for optimization.

[PATCH v2] LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2023-11-16 Thread Li Wei
The LoongArch has defined ctz and clz on the backend, but if we want GCC do CTZ transformation optimization in forwprop2 pass, GCC need to know the value of c[lt]z at zero, which may be beneficial for some test cases (like spec2017 deepsjeng_r). After implementing the macro, we test dynamic

[PATCH v1] LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2023-11-16 Thread Li Wei
The LoongArch has defined ctz and clz on the backend, but if we want GCC do CTZ transformation optimization in forwprop2 pass, GCC need to know the value of c[lt]z at zero, which may be beneficial for some test cases (like spec2017 deepsjeng_r). After implementing the macro, we test dynamic

[PATCH v1] LoongArch: Adjust the vector cost model for better performance

2023-09-18 Thread Li Wei
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost): --- gcc/config/loongarch/loongarch.cc | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc