Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-18 Thread Xi Ruoyao
cc.target/loongarch/func-call-medium-4.c scan-assembler test2:.*la.local\\t.*l\\n\\tjirl Something strange is happening: with -mexplicit-relocs=auto or always I get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get "pcaddu18i $r1,%call36(g)" and jirl. This seems ironic (!). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

2023-11-18 Thread Xi Ruoyao
e << 32" may trigger a left shift of a negative value. C++11 doesn't allow left-shifting a negative value. Yes, it's allowed as a GCC extension and also by C++20 and later, but the GCC codebase is still C++11. So it may break the build when GCC is bootstrapped with a different compiler, and --with-build-config=bootstrap-ubsan will complain. Otherwise LGTM.
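A minimal C sketch of the well-defined alternative, assuming the intent is to copy the low 32 bits into the high half (the helper name `replicate_low32` is hypothetical, not taken from the patch): do the shift on an unsigned type, where a left shift is defined for every value.

```c
#include <stdint.h>

/* Hypothetical helper: copy the low 32 bits of VALUE into both halves of a
   64-bit constant.  Shifting a negative signed value left is undefined in
   C++11 (and in C), so zero-extend into an unsigned type before shifting.  */
static uint64_t
replicate_low32 (int64_t value)
{
  uint64_t lo = (uint32_t) value;   /* conversion to unsigned is well-defined */
  return (lo << 32) | lo;           /* unsigned shift: no UB for any input */
}
```

For instance, `replicate_low32 (-2)` gives `0xfffffffefffffffe`.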

Re: [PATCH] LoongArch: Add libstdc++ check-abi support.

2023-11-18 Thread Xi Ruoyao
re register is used as TP on this target. But anyway TLS may be disabled via --disable-tls, though I don't know if this configuration really works on loongarch64-linux-gnu (nobody has really tested it, I guess).

[PATCH] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-17 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems: 1. These instructions raise FE_INEXACT, which is not allowed with -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions (the only exceptions are rint, lrint, and llrint). 2. The "frint" instruction without

[PATCH v2 3/6] LoongArch: Add evolution features of base ISA revisions

2023-11-17 Thread Xi Ruoyao
* config/loongarch/loongarch-def.h: (loongarch_isa_base_features): Declare. Define it in ... * config/loongarch/loongarch-cpu.cc (loongarch_isa_base_features): ... here. (fill_native_cpu_config): If we know the base ISA of the CPU model from PRID,

[PATCH v2 4/6] LoongArch: Take the advantage of -mdiv32 if it's enabled

2023-11-17 Thread Xi Ruoyao
With -mdiv32, we can assume div.w[u] and mod.w[u] work on the low 32 bits of a 64-bit GPR even if it's not sign-extended. gcc/ChangeLog: * config/loongarch/loongarch.md (DIV): New mode iterator. (3): Don't expand if TARGET_DIV32. (di3_fake): Disable if TARGET_DIV32.
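A rough C model of the operand handling described above (a sketch, assuming that without div32 the instruction is only specified for sign-extended operands; the helper names are hypothetical): with div32 the hardware behaves as if it reads only the low 32 bits, so GCC can skip the explicit sign extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 64-bit register value REG is the sign extension of its
   low 32 bits.  */
static bool
is_sign_extended32 (uint64_t reg)
{
  /* (int32_t) of a value above INT32_MAX is implementation-defined;
     GCC and Clang wrap it modulo 2^32.  */
  return (uint64_t) (int64_t) (int32_t) (uint32_t) reg == reg;
}

/* Hypothetical model of div.w with div32: only the low 32 bits of each
   operand matter.  As in C, RK's low half must be nonzero and
   INT32_MIN / -1 must be avoided.  */
static int32_t
div_w_div32 (uint64_t rj, uint64_t rk)
{
  int32_t a = (int32_t) (uint32_t) rj;
  int32_t b = (int32_t) (uint32_t) rk;
  return a / b;
}
```

Here `0x1234567800000010` is not sign-extended, but under div32 semantics dividing it by 2 still yields 8.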

[PATCH v2 6/6] LoongArch: Add fine-grained control for LAM_BH and LAMCAS

2023-11-17 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in: (lam-bh, lamcas): Add. * config/loongarch/loongarch-str.h: Regenerate. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch-cpucfg-map.h: Regenerate. *

[PATCH v2 5/6] LoongArch: Don't emit dbar 0x700 if -mld-seq-sa

2023-11-17 Thread Xi Ruoyao
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that two loads on the same address won't be reordered with each other". Thus we can omit the "load-load" barrier dbar 0x700. This is only a micro-optimization because dbar 0x700 is already treated as a nop if the hardware supports

[PATCH v2 2/6] LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution

2023-11-17 Thread Xi Ruoyao
LoongArch v1.10 introduced the concept of ISA evolution. During ISA evolution, many independent features can be added and enumerated via CPUCFG. Add a data file into genopts storing the CPUCFG word, bit, the name of the command line option controlling if this feature should be used for

[PATCH v2 1/6] LoongArch: Fix internal error running "gcc -march=native" on LA664

2023-11-17 Thread Xi Ruoyao
On LA664, the PRID preset is ISA_BASE_LA64V110 but the base architecture is guessed as ISA_BASE_LA64V100. This causes a warning to be output: cc1: warning: base architecture 'la64' differs from PRID preset '?' But we've not set the "?" above in loongarch_isa_base_strings, thus it's a nullptr

[PATCH v2 0/6] Add LoongArch v1.1 div32 and ld-seq-sa support

2023-11-17 Thread Xi Ruoyao
erbose-asm. It's helpful for testing and debugging. Xi Ruoyao (6): LoongArch: Fix internal error running "gcc -march=native" on LA664 LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution LoongArch: Add evolution features of base ISA revisio

Re: [PATCH v1 1/3] LoongArch: Add LA664 support.

2023-11-17 Thread Xi Ruoyao
kahead[N_TUNE_TYPES] = {
 const char*
 loongarch_isa_base_strings[N_ISA_BASE_TYPES] = {
   [ISA_BASE_LA64V100] = STR_ISA_BASE_LA64V100,
+  [ISA_BASE_LA64V110] = STR_ISA_BASE_LA64V110,
 };
 const char*
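To illustrate why a missing entry becomes a null pointer, here is a reduced C99 sketch with designated initializers. The enumerator names follow the thread, but the table contents (including the "la64v1.1" spelling) and the `isa_base_name` helper are hypothetical. Slots not named in the initializer are zero-initialized, i.e. NULL, so a lookup that feeds a `%s` format should guard against them.

```c
#include <stddef.h>

/* Reduced, hypothetical model of loongarch_isa_base_strings.  */
enum isa_base { ISA_BASE_LA64V100, ISA_BASE_LA64V110, N_ISA_BASE_TYPES };

static const char *const isa_base_strings[N_ISA_BASE_TYPES] = {
  [ISA_BASE_LA64V100] = "la64",
  [ISA_BASE_LA64V110] = "la64v1.1",  /* omit this line and the slot is NULL */
};

/* Look up a name without ever returning NULL to a "%s" consumer.  */
static const char *
isa_base_name (enum isa_base base)
{
  const char *s = (unsigned) base < N_ISA_BASE_TYPES ? isa_base_strings[base]
                                                     : NULL;
  return s != NULL ? s : "?";
}
```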

Re: [PATCH v1 0/3] Add LoongarchV1.1 instructions support.

2023-11-17 Thread Xi Ruoyao
    |   7 +-
>  gcc/config/loongarch/loongarch.opt    |   3 +
>  gcc/config/loongarch/sync.md          | 256 ++----
>  12 files changed, 263 insertions(+), 67 deletions(-)
I'll rebase my patches for div32 and ld-seq-sa on top of this.

Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread Xi Ruoyao
ction set, what do you think? I'll add it too. I had misread section 1.5 paragraph 1 of the spec so I didn't consider this a good idea, but after reading it again I think it should be added. > On 2023/11/16 9:18 PM, Xi Ruoyao wrote: > > Loongson 3A6000 processor will be shipped to

[PATCH 5/5] LoongArch: Add -march=la664 and -mtune=la664

2023-11-16 Thread Xi Ruoyao
Allow using -march=la664 and -mtune=la664. -march=la664 implies -mdiv32 and -mld-seq-sa. -mtune=la664 is currently the same as -mtune=la464 and may need an update later. gcc/ChangeLog: * config/loongarch/genopts/loongarch-strings: Add la664 as STR_CPU_LA664. *

[PATCH 1/5] LoongArch: Switch loongarch-def to C++

2023-11-16 Thread Xi Ruoyao
We'll use HOST_WIDE_INT in LoongArch static properties in following patches. Switch loongarch-def from C to C++ to make it possible. To keep the same readability as C99 designated initializers, create a std::array-like data structure with a position setter function, and add field setter functions

[PATCH 2/5] LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution

2023-11-16 Thread Xi Ruoyao
LoongArch v1.10 introduced the concept of ISA evolution. During ISA evolution, many independent features can be added and enumerated via CPUCFG. Add a data file into genopts storing the CPUCFG word, bit, the name of the command line option controlling if this feature should be used for

[PATCH 4/5] LoongArch: Don't emit dbar 0x700 if -mld-seq-sa

2023-11-16 Thread Xi Ruoyao
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that two loads on the same address won't be reordered with each other". Thus we can omit the "load-load" barrier dbar 0x700. This is only a micro-optimization because dbar 0x700 is already treated as a nop if the hardware supports

[PATCH 3/5] LoongArch: Take the advantage of -mdiv32 if it's enabled

2023-11-16 Thread Xi Ruoyao
With -mdiv32, we can assume div.w[u] and mod.w[u] work on the low 32 bits of a 64-bit GPR even if it's not sign-extended. gcc/ChangeLog: * config/loongarch/loongarch.md (DIV): New mode iterator. (3): Don't expand if TARGET_DIV32. (di3_fake): Disable if TARGET_DIV32.

[PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread Xi Ruoyao
results later. Bootstrapped and regtested on an LA664 with BOOT_CFLAGS="-march=la664 -O2" and on an LA464 with BOOT_CFLAGS="-march=native -O2". Also manually verified -march=native probing on LA664 and LA464. Xi Ruoyao (5): LoongArch: Switch loongarch-def to C++ LoongArch: genopts

Re: [PATCH v1] LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2023-11-16 Thread Xi Ruoyao
} */
> +
> +/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target { loongarch64*-*-* } } } } */
> +/* { dg-final { scan-assembler-times "ctz.d\t" 1 { target { loongarch64*-*-* } } } } */
> +/* { dg-final { scan-assembler-times "ctz.w\t

Re: [PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread Xi Ruoyao
.sle\.d} 6 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cor\.s} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cor\.d} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cun\.s} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cun\.d} 3 } } */
> +/*

Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-16 Thread Xi Ruoyao
On Thu, 2023-11-16 at 09:18 +0800, chenglulu wrote: > > On 2023/11/15 7:38 PM, Xi Ruoyao wrote: > > Pushed r14-5486. > > > > /* snip */ > > > > > > * gcc.target/loongarch/cas-acquire.c: New test. > > This test fails with GCC 12/13 on LA664, an

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread Xi Ruoyao
RGET
+#undef HAVE_DCGETTEXT
+#endif
+/* Define if the GNU gettext() function is already present or preinstalled. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GETTEXT
+#endif
I don't know if they are related to the issue on AIX though.

Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread Xi Ruoyao
Pushed r14-5486. /* snip */ > > * gcc.target/loongarch/cas-acquire.c: New test. This test fails with GCC 12/13 on LA664, and it indicates a correctness issue. May I backport this patch to 12/13 as well?

[PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-14 Thread Xi Ruoyao
This is isomorphic to the LLVM changes [1-2]. On LoongArch, the LL and SC instructions have memory barrier semantics: - LL: + - SC: + But the compare and swap operation is allowed to fail, and if it fails the SC instruction is not executed, thus the guarantee of acquiring semantics cannot be
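A C11 sketch of the operation in question (the helper name is hypothetical): a compare-and-swap whose failure ordering is also `memory_order_acquire`. As the message explains, a failed comparison never reaches SC, so on LoongArch the acquire ordering on that path cannot come from the SC instruction.

```c
#include <stdatomic.h>

/* Compare *OBJ with EXPECTED and store DESIRED on a match.  Both the
   success and the failure orderings are acquire, so even a failed CAS
   must order later memory accesses after its load.  Returns the value
   that was observed in *OBJ either way.  */
static int
cas_acquire_observe (atomic_int *obj, int expected, int desired)
{
  /* On failure, EXPECTED is overwritten with the current value.  */
  atomic_compare_exchange_strong_explicit (obj, &expected, desired,
                                           memory_order_acquire,
                                           memory_order_acquire);
  return expected;
}
```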

Re: [PATCH] Only allow (copysign x, NEG_CONST) -> (fneg (fabs x)) simplification for constant folding [PR112483]

2023-11-14 Thread Xi Ruoyao
t. > And I wonder when that happens - I suppose when op0 is CONST_DOUBLE only? Yes, it's Andrew's intention.

Re: [PATCH v1] LoongArch: Added code generation support for call36 function calls.

2023-11-14 Thread Xi Ruoyao
On Wed, 2023-11-15 at 04:42 +0800, Xi Ruoyao wrote: > > There seems to be a better solution as suggested by the GCC internals doc. > > Section 18.9.16 mentions -fipa-ra: > > > >  -- Target Hook: bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS > > Set to true if e

Re: [PATCH v1] LoongArch: Added code generation support for call36 function calls.

2023-11-14 Thread Xi Ruoyao
On Wed, 2023-11-15 at 04:26 +0800, Xi Ruoyao wrote: > On Tue, 2023-11-14 at 20:46 +0800, chenglulu wrote: > > > > On 2023/11/14 5:55 PM, Xi Ruoyao wrote: > > > On Tue, 2023-11-14 at 17:45 +0800, Lulu Cheng wrote: > > > > +  /* When function calls

Re: [PATCH v1] LoongArch: Added code generation support for call36 function calls.

2023-11-14 Thread Xi Ruoyao
On Tue, 2023-11-14 at 20:46 +0800, chenglulu wrote: > > On 2023/11/14 5:55 PM, Xi Ruoyao wrote: > > On Tue, 2023-11-14 at 17:45 +0800, Lulu Cheng wrote: > > > +  /* When function calls are made through call36, t0 register > > > will be > > > + implicitly mod

[PATCH] Only allow (copysign x, NEG_CONST) -> (fneg (fabs x)) simplification for constant folding [PR112483]

2023-11-14 Thread Xi Ruoyao
From: Andrew Pinski On targets with native copysign instructions, (copysign x, -1) is usually more efficient than (fneg (fabs x)). Since r14-5284, in the middle end we always optimize (fneg (fabs x)) to (copysign x, -1), not vice versa. If the target does not support native fcopysign,

Re: [PATCH v1] LoongArch: Added code generation support for call36 function calls.

2023-11-14 Thread Xi Ruoyao
->x_flag_ipa_ra = 0; > + break; Maybe we can add a (clobber (reg:P 12)) to the related insns instead?

Re: [PATCH] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2023-11-14 Thread Xi Ruoyao
Ping. I've tested this with Binutils 2.41 and 2.41.50.202311xx several times so it should be OK. On Mon, 2023-11-06 at 15:50 +0800, Xi Ruoyao wrote: /* snip */ > Bootstrapped and regtested on loongarch64-linux-gnu twice: once with > Binutils 2.41, another with Binutils 2.41.50.20

Pushed: [PATCH v2] LoongArch: Use finer-grained DBAR hints

2023-11-14 Thread Xi Ruoyao
UME gcc/config
gcc/config/aarch64/aarch64.cc
gcc/config/riscv/riscv.cc
gcc/config/ia64/ia64.cc
gcc/config/ia64/sync.md
gcc/config/gcn/gcn.md
gcc/config/loongarch/loongarch.cc
gcc/config/rs6000/rs6000.cc
gcc/config/rs6000/sync.md
gcc/config/nvptx/nvptx.cc
Maybe all of them are redundant?

[PATCH] LoongArch: Use finer-grained DBAR hints

2023-11-13 Thread Xi Ruoyao
LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:
- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1:
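A small C helper that assembles a hint from the bit layout above may make the inverted encoding easier to follow. It is a sketch: bit 0 is assumed to mean "barrier for succeeding write (0: true, 1: false)", since the description is cut off before that bit, and `dbar_hint` is a hypothetical name. Under that assumption an ordering barrier for previous reads against succeeding reads and writes encodes as 0x14, and the excluded values 0xf and 0x1f are exactly the encodings in which no access is ordered at all.

```c
#include <stdbool.h>

/* Build an LA664 DBAR hint.  Each "barrier" bit uses inverted sense:
   0 means "this class of access is ordered by the barrier".  Bit 0
   (succeeding writes) is an assumption, see the text above.  */
static unsigned
dbar_hint (bool ordering, bool prev_read, bool prev_write,
           bool succ_read, bool succ_write)
{
  return (ordering ? 1u << 4 : 0u)        /* 0: completion, 1: ordering */
         | (prev_read ? 0u : 1u << 3)
         | (prev_write ? 0u : 1u << 2)
         | (succ_read ? 0u : 1u << 1)
         | (succ_write ? 0u : 1u << 0);
}
```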

[PATCH] LoongArch: Handle vectorized copysign (x, -1) expansion efficiently

2023-11-13 Thread Xi Ruoyao
With LSX or LASX, copysign (x[i], -1) (or any negative constant) can be vectorized using [x]vbitseti.{w/d} instructions to directly set the sign bits. Inspired by Tamar Christina's "AArch64: Handle copysign (x, -1) expansion efficiently" (r14-5289). gcc/ChangeLog: *
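A scalar C model of the per-lane effect, assuming IEEE 754 binary64 (the function name is hypothetical): copysign (x, -1.0) only has to set bit 63, which is what a bit-set-immediate instruction does to every lane at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-lane effect of copysign (x[i], -1.0) on IEEE binary64 values:
   set bit 63 and leave the rest alone.  memcpy is the portable way to
   reinterpret the bits without violating aliasing rules.  */
static void
copysign_neg_inplace (double *x, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint64_t bits;
      memcpy (&bits, &x[i], sizeof bits);
      bits |= UINT64_C (1) << 63;
      memcpy (&x[i], &bits, sizeof bits);
    }
}
```

Note that a positive zero becomes -0.0, matching copysign.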

Re: [PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-13 Thread Xi Ruoyao
On Wed, 2023-11-08 at 16:27 +0800, Xi Ruoyao wrote: > On Wed, 2023-11-08 at 09:49 +0800, chenglulu wrote: > > > > On 2023/11/6 7:36 PM, Xi Ruoyao wrote: > > > This is isomorphic to the LLVM changes [1-2]. > > > > > > On LoongArch, the LL and SC

Re: [PATCH] Fix (fcopysign x, NEGATIVE_CONST) -> (fneg (fabs x)) simplification [PR112483]

2023-11-12 Thread Xi Ruoyao
(x));
#endif
      a = __builtin_copysignf(a, x);
      asm(""::"f"(a));
    }
}
If DISALLOW_COPYSIGN_OPTIMIZATION is defined, the benchmark runs 0.23 seconds faster. I'll submit another patch to disable this.

[PATCH] Fix (fcopysign x, NEGATIVE_CONST) -> (fneg (fabs x)) simplification [PR112483]

2023-11-12 Thread Xi Ruoyao
(fcopysign x, NEGATIVE_CONST) can be simplified to (fneg (fabs x)), but a logic error in the code caused it to be mistakenly simplified to (fneg x) instead. gcc/ChangeLog: PR rtl-optimization/112483 * simplify-rtx.cc (simplify_binary_operation_1) : Fix the simplification of
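Bit-level C models of the three expressions make the bug concrete (a sketch assuming IEEE 754 binary64; the helper names are hypothetical): (fcopysign x, NEGATIVE_CONST) and (fneg (fabs x)) agree for every x, while (fneg x) differs whenever x is already negative.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C (1) << 63)

static uint64_t to_bits (double d) { uint64_t u; memcpy (&u, &d, sizeof u); return u; }
static double from_bits (uint64_t u) { double d; memcpy (&d, &u, sizeof d); return d; }

/* (fneg x): flip the sign bit.  */
static double fneg (double x) { return from_bits (to_bits (x) ^ SIGN_BIT); }

/* (fabs x): clear the sign bit.  */
static double fabs_bits (double x) { return from_bits (to_bits (x) & ~SIGN_BIT); }

/* (fcopysign x, c) for any negative c: keep x's magnitude, force the sign.  */
static double copysign_neg (double x) { return from_bits (to_bits (x) | SIGN_BIT); }
```

For x = -3.0, the correct result is -3.0, whereas the mistaken (fneg x) yields 3.0.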

Re: [PATCH v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range.

2023-11-12 Thread Xi Ruoyao
On Sun, 2023-11-12 at 11:02 -0700, Jeff Law wrote: > > > On 11/12/23 10:41, Xi Ruoyao wrote: > > On Sat, 2023-11-11 at 13:12 -0700, Jeff Law wrote: > > > > > > > > > On 8/14/23 05:22, Jin Ma wrote: > > > > CLOBBER and USE does not r

Re: [PATCH v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range.

2023-11-12 Thread Xi Ruoyao
ing was done.  Standard practice is
> to do a bootstrap and regression test on a primary platform such as x86,
> aarch64, ppc64.
>
> I went ahead and did a bootstrap and regression test on x86_64, then
> pushed this to the trunk.
Unfortunately this patch has triggered a bootstrap comparison failure on loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.

[PATCH] LoongArch: Use simplify_gen_subreg instead of gen_rtx_SUBREG in loongarch_expand_vec_cond_mask_expr [PR112476]

2023-11-11 Thread Xi Ruoyao
The GCC internals manual says: 'subreg's of 'subreg's are not supported. Using 'simplify_gen_subreg' is the recommended way to avoid this problem. Unfortunately loongarch_expand_vec_cond_mask_expr might create a nested subreg under certain circumstances, causing an ICE. Use simplify_gen_subreg as

[PATCH v2] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst

2023-11-11 Thread Xi Ruoyao
fld and fst have the same addressing mode as ld.w and st.w, so the same optimization as r14-4851 should be applied to them too. gcc/ChangeLog: * config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode iterator. (ST_ANY): New mode iterator. (define_peephole2): Use

Re: [PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-08 Thread Xi Ruoyao
On Wed, 2023-11-08 at 09:49 +0800, chenglulu wrote: > > On 2023/11/6 7:36 PM, Xi Ruoyao wrote: > > This is isomorphic to the LLVM changes [1-2]. > > > > On LoongArch, the LL and SC instructions has memory barrier semantics: > > > > - LL: + > > - SC: +

Re: [PATCH v1] LoongArch: Add modifiers for lsx and lasx.

2023-11-07 Thread Xi Ruoyao
On Tue, 2023-11-07 at 19:10 +0800, Xi Ruoyao wrote:
> On Tue, 2023-11-07 at 12:06 +0800, chenxiaolong wrote:
> > +__m128i  a,b,c;
> > +
> > +__asm__ ("vadd.d %w0,%w1,%w2\n\t"
> > +   :"=f" (c)
> > +   :"f" (a),"f" (b)

Re: [PATCH v1] LoongArch: Add modifiers for lsx and lasx.

2023-11-07 Thread Xi Ruoyao
1.c:6:1: error: inconsistent operand constraints in an ‘asm’
    6 | __asm__ ("vadd.d %w0,%w1,%w2\n\t"
Please recheck.

[PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-06 Thread Xi Ruoyao
This is isomorphic to the LLVM changes [1-2]. On LoongArch, the LL and SC instructions have memory barrier semantics: - LL: + - SC: + But the compare and swap operation is allowed to fail, and if it fails the SC instruction is not executed, thus the guarantee of acquiring semantics cannot be

[PATCH] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst

2023-11-05 Thread Xi Ruoyao
fld and fst have the same addressing mode as ld.w and st.w, so the same optimization as r14-4851 should be applied to them too. gcc/ChangeLog: * config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode iterator. (ST_ANY): New mode iterator. (define_peephole2): Use

[PATCH] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2023-11-05 Thread Xi Ruoyao
As the commit message of r14-4674 has indicated, if the assembler does not support conditional branch relaxation, a relocation overflow may happen on conditional branches when relaxation is enabled because the number of NOP instructions inserted by the assembler will be more than the number

Pushed: [PATCH v2] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2023-10-31 Thread Xi Ruoyao
Pushed r14-5030. The subject and ChangeLog are updated to include the PR number. The code change is the same as v1. On Mon, 2023-10-30 at 20:44 +0800, chenglulu wrote: > > On 2023/10/30 8:26 PM, Xi Ruoyao wrote: > > On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote: > > > On

Re: [PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread Xi Ruoyao
On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote: > On 2023/10/30 7:42 PM, Xi Ruoyao wrote: > > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure > > building a cross compiler if the cross assembler is not installed yet. > > > > gcc/ChangeLog: >

[PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread Xi Ruoyao
Now that loongarch.md uses HAVE_AS_TLS, we need this to fix a build failure of a cross compiler when the cross assembler is not installed yet. gcc/ChangeLog: * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0 if not defined yet. --- Ok for trunk?

Pushed: [PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-23 Thread Xi Ruoyao
Pushed r14-{4848..4852}. On Thu, 2023-10-19 at 22:02 +0800, Xi Ruoyao wrote: > For relaxation we are now generating assembler macros for symbolic > addresses everywhere, but this is limiting scheduling and there are > known situations where the relaxation cannot improve the code. >

Re: [PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

2023-10-21 Thread Xi Ruoyao
external symbol c, the linker may relax "la.global c" to "la.local c" (if ab.o is linked together with another file c.o which contains the definition of c) or not. As we cannot exclude the possibility of a relaxation on la.global for incremental linking, just emit la.global and let the

[PATCH 4/5] LoongArch: Use explicit relocs for addresses only used for one load or store with -mexplicit-relocs=auto and -mcmodel={normal, medium}

2023-10-19 Thread Xi Ruoyao
In these cases, if we use explicit relocs, we end up with 2 instructions:
    pcalau12i t0, %pc_hi20(x)
    ld.d      t0, t0, %pc_lo12(x)
If we use la.local pseudo-op, in the best scenario (x is in +/- 2MiB range) we still have 2 instructions:
    pcaddi    t0, %pcrel_20(x)
    ld.d
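For reference, here is a C sketch of how the `pcalau12i` + `ld.d` pair reaches `x`. The relocation formulas are restated from our reading of the LoongArch ELF psABI (R_LARCH_PCALA_HI20 / R_LARCH_PCALA_LO12) and should be checked against it; the helper names are hypothetical. The `+ 0x800` rounding in the high part compensates for `ld.d` sign-extending its 12-bit offset.

```c
#include <stdint.h>

/* High part for pcalau12i: page delta to the target, rounded so that the
   sign-extended low 12 bits land on the right address.  Casting the
   wrapped difference to int64_t is implementation-defined; GCC and Clang
   treat it as two's complement.  */
static int32_t
pc_hi20 (uint64_t pc, uint64_t sym)
{
  uint64_t delta = ((sym + 0x800) & ~UINT64_C (0xfff)) - (pc & ~UINT64_C (0xfff));
  return (int32_t) ((int64_t) delta / 4096);   /* exact: delta is page-aligned */
}

/* Low part: the 12-bit offset that ld.d sign-extends.  */
static int32_t
pc_lo12 (uint64_t sym)
{
  int32_t lo = (int32_t) (sym & 0xfff);
  return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Address the two instructions compute at run time.  */
static uint64_t
effective_address (uint64_t pc, int32_t hi20, int32_t lo12)
{
  uint64_t t0 = (pc & ~UINT64_C (0xfff)) + (uint64_t) ((int64_t) hi20 * 4096); /* pcalau12i */
  return t0 + (uint64_t) (int64_t) lo12;                                         /* ld.d offset */
}
```

In hardware the pcalau12i immediate is only 20 bits wide, so this reaches targets within roughly +/- 2 GiB of the PC.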

[PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

2023-10-19 Thread Xi Ruoyao
If we are performing LTO for a final link and linker plugin is enabled, then we are sure any GOT access may resolve to a symbol out of the link unit (otherwise the linker plugin will tell us the symbol should be resolved locally and we'll use PC-relative access instead). Produce machine

[PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-19 Thread Xi Ruoyao
the compiler to use explicit relocs for these cases, but assembler macros for other cases. Use it as the default if the assembler supports both explicit relocs and relaxation. LTO-bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (5): LoongArch: Add enum-style

[PATCH 5/5] LoongArch: Document -mexplicit-relocs={auto,none,always}

2023-10-19 Thread Xi Ruoyao
gcc/ChangeLog: * doc/invoke.texi (-mexplicit-relocs=style): Document. (-mexplicit-relocs): Document as an alias of -mexplicit-relocs=always. (-mno-explicit-relocs): Document as an alias of -mexplicit-relocs=none. (-mcmodel=extreme): Mention

[PATCH 3/5] LoongArch: Use explicit relocs for TLS access with -mexplicit-relocs=auto

2023-10-19 Thread Xi Ruoyao
The linker does not know how to relax TLS access for LoongArch, so let's emit machine instructions with explicit relocs for TLS. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): Return true for TLS symbol types if -mexplicit-relocs=auto.

[PATCH 1/5] LoongArch: Add enum-style -mexplicit-relocs= option

2023-10-19 Thread Xi Ruoyao
To strike a better balance between scheduling and relaxation when -flto is enabled, add the three-way -mexplicit-relocs={auto,none,always} option. The old -mexplicit-relocs and -mno-explicit-relocs options are still supported; they are mapped to -mexplicit-relocs=always and -mexplicit-relocs=none. The

Pushed: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-18 Thread Xi Ruoyao
On Wed, 2023-10-18 at 09:34 +0800, chenglulu wrote: > > On 2023/10/17 10:24 PM, WANG Xuerui wrote: > > > > On 10/17/23 22:06, Xi Ruoyao wrote: > > > During the review of a LLVM change [1], on LA464 we found that zeroing > > "an" LLVM change (because t

[PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread Xi Ruoyao
During the review of an LLVM change [1], on LA464 we found that zeroing an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0. [1]: https://github.com/llvm/llvm-project/pull/69300 gcc/ChangeLog: * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for zeroing a fcc.

Re: [PATCH] LoongArch: Reimplement multilib build option handling.

2023-10-07 Thread Xi Ruoyao
o. > > P.S. Currently support for "f32" is not active, and it should probably be > avoided if you want to build a working rootfs.

Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-04 Thread Xi Ruoyao
using g++ 4.8 as a host compiler. AFAIK G++ 5.1 also has a bug (https://gcc.gnu.org/PR65801) that breaks building recent GCC. I don't think it's really "maintainable" to ensure that current GCC can be built with a buggy host compiler.

[PATCH] LoongArch: Replace UNSPEC_FCOPYSIGN with copysign RTL

2023-10-02 Thread Xi Ruoyao
When I added copysign support for LoongArch (r13-3702), we did not have a copysign RTL insn, so I had to use UNSPEC to represent the copysign instruction. Now the copysign RTX code has been added in r14-1586, so this patch removes those UNSPECs, and it uses the native RTL copysign insn. Inspired

Re: [PATCH 0/2] Replace intl/ with out-of-tree GNU gettext

2023-09-25 Thread Xi Ruoyao
mscratch.org/lfs/view/development/chapter08/gcc.html

Re: [PATCH v1] Update check_effective_target_vect_int_mod according to LoongArch SX/ASX capabilities.

2023-09-25 Thread Xi Ruoyao
or isa options, but pr104992.c failed because > it expected result with "vect_int_mod returns 1" but it was compiled > without -mlsx/-mlasx. Seems pr104992.c is invoked by gcc.dg/dg.exp, > pr104992.c is not affected by DEFAULT_CFLAGS, so we still need to check > if LSX/LASX is available in vect_int_mod. > > Other parts of new patch is still WIP.

Re: [PATCH] LoongArch: doc: Update -m[no-]explicit-relocs for r14-4160

2023-09-25 Thread Xi Ruoyao
On Mon, 2023-09-25 at 16:26 +0800, chenglulu wrote: > LGTM! > > Thank you for your modification! Pushed r14-4250. > On 2023/9/25 4:13 PM, Xi Ruoyao wrote: > > gcc/ChangeLog: > > > > * doc/invoke.texi: Update -m[no-]explicit-relocs for r14-4160. > > --- >

[PATCH] LoongArch: doc: Update -m[no-]explicit-relocs for r14-4160

2023-09-25 Thread Xi Ruoyao
gcc/ChangeLog: * doc/invoke.texi: Update -m[no-]explicit-relocs for r14-4160. --- I've not regtested this as it's only a doc change. Ok for trunk? gcc/doc/invoke.texi | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/doc/invoke.texi

Re: [PATCH v1] Update check_effective_target_vect_int_mod according to LoongArch SX/ASX capabilities.

2023-09-24 Thread Xi Ruoyao
sx { } {
> +    return [check_no_compiler_messages loongarch_asx assembly {
> +   #if !defined(__loongarch_asx)
> +   #error "LASX not defined"
> +   #endif
> +    }]
> +}
> +
>  # Appends necessary Python flags to extra-tool-flags if Python.h is supported.
>  # Otherwise, modifies dg-do-what.
>  proc dg-require-python-h { args } {

Re: [PATCH v7 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-09-19 Thread Xi Ruoyao
_push (state.defs_list[0]);
>       }
>     reinsn_del_list.safe_push (curr_cand->insn);
>     state.modified[INSN_UID (curr_cand->insn)].deleted = 1;
> @@ -1345,6 +1483,10 @@ find_and_remove_re (void)
>    for (unsigned int i = 0; i < reinsn_copy_list.length (); i +=

PING^5: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-19 Thread Xi Ruoyao via Gcc-patches
   unsigned HOST_WIDE_INT n
> > > > +   = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> > > >    enum rtx_code adjusted_code;
> > > >
> > > >    /* Normalize code to either LEU or GEU.  */
> > > > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
> > > >  HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> > > >  HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode),
> > > >  GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> > > > -   (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME (adjusted_code),
> > > > -   n);
> > > > +   (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode),
> > > > +   GET_RTX_NAME (adjusted_code), n);
> > > >     }
> > > >   poly_int64 offset = (BYTES_BIG_ENDIAN
> > > >    ? 0
> > > >    : (GET_MODE_SIZE (int_mode)
> > > >   - GET_MODE_SIZE (narrow_mode_iter)));
> > > >   *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
> > > > - *pop1 = GEN_INT (n);
> > > > + *pop1 = gen_int_mode (n, narrow_mode_iter);
> > > >   return adjusted_code;
> > > > }
> > > > }
> > > > --
> > > > 2.41.0
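A simplified C model of the two steps in this patch (hypothetical helpers, not GCC's actual implementation): first mask the constant with the wide mode's mask, as `GET_MODE_MASK (int_mode)` does, then canonicalize it for the narrow mode. `gen_int_mode` performs that canonicalization by sign-extending from the mode's precision, while `GEN_INT (n)` uses `n` verbatim, which can leave, for example, 255 as a QImode constant where RTL expects -1.

```c
#include <stdint.h>

/* GET_MODE_MASK analogue for a BITS-wide integer mode (1 <= BITS <= 64).  */
static uint64_t
mode_mask (unsigned bits)
{
  return bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* gen_int_mode-style canonicalization: keep the low BITS bits and
   sign-extend from bit BITS - 1.  The final conversion to int64_t relies
   on two's complement wrapping (implementation-defined, as GCC defines it).  */
static int64_t
trunc_for_mode (uint64_t n, unsigned bits)
{
  uint64_t sign = UINT64_C (1) << (bits - 1);
  n &= mode_mask (bits);
  return (int64_t) ((n ^ sign) - sign);
}
```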

Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Xi Ruoyao via Gcc-patches
defined behavior WRAP-AROUND) only to part > of the program.  And then add -fsanitize=*overflow to detect all other > unexpected overflows in the program. > > This is currently missing from GCC, I guess? If overflow is really so rare, we should just enable -fsanitize=signed-integer-overflow globally and special-case the code paths where we want wrapping. It's easy in 2023:
/* b + c may wrap here because ... ... */
ckd_add(, b, c);
Or
/* if b + c overflows, we have a severe issue, let's panic even if sanitizer disabled */
if (ckd_add(, b, c))
  panic("b + c overflows but it shouldn't (b = %d, c = %d)", b, c);
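The message uses C23 `ckd_add` from `<stdckdint.h>`. For compilers that do not ship that header yet, GCC and Clang provide `__builtin_add_overflow`, which likewise returns true on overflow and stores the wrapped result, but takes the result pointer as its last argument. Here is a sketch of both patterns (the helper names are hypothetical, and `abort` stands in for the message's `panic`):

```c
#include <stdio.h>
#include <stdlib.h>

/* Overflow here would be a bug: report it even without a sanitizer.  */
static int
add_or_panic (int b, int c)
{
  int a;
  if (__builtin_add_overflow (b, c, &a))
    {
      fprintf (stderr, "b + c overflows but it shouldn't (b = %d, c = %d)\n",
               b, c);
      abort ();
    }
  return a;
}

/* Wrapping is intended here: keep the result modulo 2^N.  */
static int
add_wrapping (int b, int c)
{
  int a;
  (void) __builtin_add_overflow (b, c, &a);
  return a;
}
```

With `<stdckdint.h>` available, the same calls would read `ckd_add (&a, b, c)`.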

Re: Question on -fwrapv and -fwrapv-pointer

2023-09-15 Thread Xi Ruoyao via Gcc-patches
g* could be unintentional and should be warned then. GCC is a compiler, not an advanced AI educating the programmers.

Re: Question on -fwrapv and -fwrapv-pointer

2023-09-14 Thread Xi Ruoyao via Gcc-patches
we treat them as integers in the brain we'll end up invoking undefined behavior sooner or later. Thus the wrapping/overflowing behavior of pointers is controlled by a different option than that of integers.

Re: [PATCH] LoongArch: gcc: Modify gas uleb128 support test.

2023-09-14 Thread Xi Ruoyao via Gcc-patches
128 and .uleb128], gcc_cv_as_leb128,,
+gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,
+[$check_leb128_asflags], [
 .data
 .uleb128 L2 - L1
 L1:

Re: [PATCH] LoongArch: gcc: Modify gas uleb128 support test.

2023-09-14 Thread Xi Ruoyao via Gcc-patches
&5
> +  (eval $ac_try) 2>&5
> +  ac_status=$?
> +  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> +  test $ac_status = 0
> +  mv conftest conftest.o
> +    fi
> +esac
Phew. Randomly modifying configure and pasting the m

Re: [PATCH v3 4/9] LoongArch:Added support for SX vector floating-point instructions.

2023-09-10 Thread Xi Ruoyao via Gcc-patches
The subject should be "Add tests for SX vector floating-point instructions". The "support" has already been added. Likewise for patches 5-9.

PING^4: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant

2023-09-10 Thread Xi Ruoyao via Gcc-patches

Re: [PATCH] LoongArch: Fix up memcpy-vec-3.c test case

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 16:21 +0800, chenglulu wrote: > LGTM! Pushed r14-3821. > On 2023/9/9 at 4:20 PM, Xi Ruoyao wrote: > > The generic code will split 16-byte copy into two 8-byte copies, so the > > vector code wouldn't be used even if -mno-strict-align.  This > > contr

[PATCH] LoongArch: Fix up memcpy-vec-3.c test case

2023-09-09 Thread Xi Ruoyao via Gcc-patches
The generic code will split a 16-byte copy into two 8-byte copies, so the vector code wouldn't be used even with -mno-strict-align. This contradicted the purpose of this test case. gcc/testsuite/ChangeLog: * gcc.target/loongarch/memcpy-vec-3.c: Increase the amount of copied

Re: [PATCH v1] LoongArch: Fix bug of 'di3_fake'.

2023-09-09 Thread Xi Ruoyao via Gcc-patches
> +    struct {
> +      unsigned char offset;
> +      unsigned char size;
> +    } args[384];
> +};
> +
> +struct isel_context {
> +    const struct ac_shader_args* args;
> +    int arg_temps[384];
> +};
> +
> +
> +void
> +add_startpgm (struct isel_context* ctx, unsigned short arg_count)
> +{
> +
> +  for (unsigned i = 0, arg = 0; i < arg_count; i++)
> +    {
> +      unsigned size = ctx->args->args[i].size;
> +      unsigned reg = ctx->args->args[i].offset;
> +
> +      if (reg % ( 4 < util_next_power_of_two (size)
> +                  ? 4 : util_next_power_of_two (size)))
> +        ctx->arg_temps[i] = create_vec_from_array ();
> +    }
> +}
> +
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
On Sat, 2023-09-09 at 15:14 +0800, chenglulu wrote: > > On 2023/9/9 at 3:06 PM, Xi Ruoyao wrote: > > On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote: > > > Hi,RuoYao: > > > > > >    I think the test example memcpy-vec-3.c submitted in r14-3818 is

Re: [PATCH] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
-align', so no vector load instructions > will be generated. Yes, in this case we cannot use vst because we don't know if b is aligned. Thus a { scan-assembler-not "vst" } guarantees that. Or am I understanding something wrongly here? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Slightly simplify loongarch_block_move_straight

2023-09-09 Thread Xi Ruoyao via Gcc-patches
Pushed r14-3819. On Sat, 2023-09-09 at 14:16 +0800, chenglulu wrote: > > On 2023/9/8 at 12:33 AM, Xi Ruoyao wrote: > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.cc > > (loongarch_block_move_straight): > > Check precondition

Pushed: [PATCH v2] LoongArch: Use LSX and LASX for block move

2023-09-09 Thread Xi Ruoyao via Gcc-patches
Pushed r14-3818 with test cases added. The pushed patch is attached. On Sat, 2023-09-09 at 14:10 +0800, chenglulu wrote: > > On 2023/9/8 at 12:14 AM, Xi Ruoyao wrote: > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-09 Thread Xi Ruoyao via Gcc-patches
n't see real consequences to this unless you have a build script > that relies on the path of libgcc.a / startfile, which can still (and > should) be revised using $(gcc --print-multi-dir). I guess I can live with it. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-08 Thread Xi Ruoyao via Gcc-patches
ilib configuration, esp. today most LoongArch users don't need multilib at all? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable -fsched-pressure by default at -O1 and higher.

2023-09-08 Thread Xi Ruoyao via Gcc-patches
mmon/" ? My bad. I didn't realize the file had been moved to common. Don't change it :(. > Thanks for the review. > > > On 2023/9/8 at 4:06 PM, Xi Ruoyao wrote: > > On Fri, 2023-09-08 at 10:00 +0800, Guo Jie wrote: > > > gcc/ChangeLog:

Re: [PATCH] LoongArch: Enable -fsched-pressure by default at -O1 and higher.

2023-09-08 Thread Xi Ruoyao via Gcc-patches
@ static const struct default_options
> loongarch_option_optimization_table[] =
>    { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
>    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
>    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> +  { OPT_LEVELS_1_PLUS, OPT_fsched_

[PATCH] LoongArch: Slightly simplify loongarch_block_move_straight

2023-09-07 Thread Xi Ruoyao via Gcc-patches
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_block_move_straight): Check precondition (delta must be a power of 2) and use popcount_hwi instead of a homebrew loop. --- I've not run a full bootstrap with this, but it should be obvious. Ok for trunk?
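The popcount_hwi change rests on one observation: after length / delta full-size moves, the remaining length % delta bytes are covered with halved sizes (delta/2, delta/4, ..., 1), one move per size that still fits, so the number of extra moves equals the number of set bits in the remainder. The sketch below checks that equivalence in plain C. It is a standalone model, not GCC code; it assumes a non-negative length, is_pow2, num_moves_loop and num_moves_popcount are hypothetical names, and __builtin_popcountll is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* True if X is a positive power of two (the role pow2p_hwi plays).  */
static int is_pow2 (int64_t x)
{
  return x > 0 && (x & (x - 1)) == 0;
}

/* Loop version: LENGTH / DELTA moves of DELTA bytes, then one move of
   each halved size (DELTA/2, DELTA/4, ..., 1) that still fits.  */
static int64_t num_moves_loop (int64_t length, int64_t delta)
{
  assert (length >= 0 && is_pow2 (delta));
  int64_t n = length / delta;
  int64_t rest = length % delta;
  for (int64_t cur = delta / 2; cur > 0; cur /= 2)
    if (rest >= cur)
      {
        n++;
        rest -= cur;
      }
  return n;
}

/* Closed form: since DELTA is a power of two and the remainder is
   smaller than DELTA, each halved-size move matches exactly one set
   bit of LENGTH % DELTA.  */
static int64_t num_moves_popcount (int64_t length, int64_t delta)
{
  assert (length >= 0 && is_pow2 (delta));
  return length / delta
         + __builtin_popcountll ((unsigned long long) (length % delta));
}
```

For length = 21 and delta = 8 both give 4 (two 8-byte moves, one 4-byte move, one 1-byte move). The closed form only holds for a power-of-two delta: with delta = 6 and 4 bytes left over, halving would give a 3-byte and a 1-byte move, while popcount(4) is 1.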

[PATCH] LoongArch: Use LSX and LASX for block move

2023-09-07 Thread Xi Ruoyao via Gcc-patches
gcc/ChangeLog: * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN): Define to the maximum number of bytes that can be loaded or stored with one machine instruction. * config/loongarch/loongarch.cc (loongarch_mode_for_move_size): New static function.
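LARCH_MAX_MOVE_PER_INSN caps the width of a single load or store. A minimal model, assuming 32 bytes when LASX (256-bit vectors) is enabled, 16 bytes with LSX (128-bit vectors) and 8 bytes (one 64-bit general-purpose register) otherwise, shows how the cap changes the number of load/store pairs in a straight-line copy. The enum and function names are hypothetical, and the model ignores alignment (-mstrict-align) as well as the choice between straight-line code and a loop that the real expander makes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of which SIMD extension is enabled.  */
enum simd_level { SIMD_NONE, SIMD_LSX, SIMD_LASX };

/* Models LARCH_MAX_MOVE_PER_INSN: the widest single load or store.  */
static int64_t max_move_per_insn (enum simd_level simd)
{
  switch (simd)
    {
    case SIMD_LASX:
      return 32; /* one 256-bit LASX register */
    case SIMD_LSX:
      return 16; /* one 128-bit LSX register */
    default:
      return 8;  /* one 64-bit general-purpose register */
    }
}

/* Count load/store pairs for a straight-line copy of LENGTH bytes,
   always using the widest power-of-two size that still fits.  */
static int64_t count_moves (int64_t length, enum simd_level simd)
{
  assert (length >= 0);
  int64_t moves = 0;
  for (int64_t size = max_move_per_insn (simd); size > 0; size /= 2)
    while (length >= size)
      {
        length -= size;
        moves++;
      }
  return moves;
}
```

Under these assumptions a 64-byte copy takes 2 pairs with LASX, 4 with LSX and 8 without SIMD.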

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 17:47 +0800, Xi Ruoyao wrote: /* snip */ > I've made some local experiments too. I think we can add a "-mbuild-multilib" > option which does nothing, but in the hacked spec we can wrap > the line in %{mbuild-multilib:...}: > > %{mbuild-multilib:%

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 17:31 +0800, Yang Yujie wrote: > > This is bad.  It makes BOOT_CFLAGS=-mlasx or CFLAGS_FOR_TARGET=-mlasx > > silently ignored so we cannot test a LSX/LASX or vectorizer change with > > them. > > > > Why do we need to purge all user-specified -m options here? > > Yes, that

Re: [PATCH v3 1/4] LoongArch: improved target configuration interface

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-09-06 at 09:04 +0800, Yang Yujie wrote: > On Tue, Sep 05, 2023 at 09:31:56PM +0800, Xi Ruoyao wrote: > > On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote: > > > * Support options for LoongArch SIMD extensions: > > >   new configure options --with-simd={

Re: [PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b & ~mask) [PR111252]

2023-09-07 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-09-07 at 10:15 +0800, chenglulu wrote: > > On 2023/9/6 at 6:58 PM, Xi Ruoyao wrote: > > Forgot to mention: I've bootstrapped and regtested this patch on > > loongarch64-linux-gnu (with PR110939 patch applied to unbreak the > > bootstrapping).  Ok for trunk?

Re: [PATCH v2 2/4] LoongArch: Add testsuite framework for Loongson SX/ASX.

2023-09-07 Thread Xi Ruoyao via Gcc-patches
> +    if (ref != res){                                              \
> +      printf(" error: %s at line %ld , expected %d, got %d\n",    \
> +             __FILE__, line, ref, res);                            \
> +    }                                                              \
> +}while(0)
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1 1/4] LoongArch: Add tests of -mstrict-align option.

2023-09-06 Thread Xi Ruoyao via Gcc-patches
{ dg-do compile } */
> +/* { dg-options "-Ofast -mstrict-align -mlasx" } */
> +/* { dg-final { scan-assembler-not "vfadd.s" } } */
> +
> +void
> +foo (float* restrict x, float* restrict y)
> +{
> +  x[0] = x[0] + y[0];
> +  x[1] = x[1] + y[1];
> +  x[

Re: [PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and (a & mask) | (b & ~mask) [PR111252]

2023-09-06 Thread Xi Ruoyao via Gcc-patches
Forgot to mention: I've bootstrapped and regtested this patch on loongarch64-linux-gnu (with PR110939 patch applied to unbreak the bootstrapping). Ok for trunk? On Wed, 2023-09-06 at 18:46 +0800, Xi Ruoyao wrote: > If mask is a constant with value ((1 << N) - 1) << M
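bstrins.d rd, rj, msbd, lsbd replaces bits msbd..lsbd of rd with the low bits of rj and leaves the other bits of rd alone, which is why a contiguous mask ((1 << N) - 1) << M maps onto it. The sketch models the instruction in C and checks the two identities named in the patch title. bstrins_d and make_mask are hypothetical helpers, and the exact instruction sequence GCC emits is not visible in this excerpt, so the a >> M operand used for the second pattern is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Model of bstrins.d rd, rj, msbd, lsbd: bits [msbd:lsbd] of RD are
   replaced by the low (msbd - lsbd + 1) bits of RJ; other bits of RD
   are kept.  */
static uint64_t bstrins_d (uint64_t rd, uint64_t rj, unsigned msbd,
                           unsigned lsbd)
{
  assert (lsbd <= msbd && msbd < 64);
  unsigned width = msbd - lsbd + 1;
  uint64_t field = width == 64 ? ~(uint64_t) 0
                               : ((uint64_t) 1 << width) - 1;
  uint64_t mask = field << lsbd;
  return (rd & ~mask) | ((rj << lsbd) & mask);
}

/* mask = ((1 << n) - 1) << m, the shape the patch matches.  */
static uint64_t make_mask (unsigned n, unsigned m)
{
  assert (n >= 1 && n + m <= 64);
  uint64_t field = n == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << n) - 1;
  return field << m;
}
```

With mask = make_mask (N, M), a & ~mask equals bstrins_d (a, 0, M + N - 1, M), and (a & mask) | (b & ~mask) equals bstrins_d (b, a >> M, M + N - 1, M).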

Re: [PATCH v1 4/4] LoongArch: Add tests for Loongson SX floating-point conversion instructions.

2023-09-06 Thread Xi Ruoyao via Gcc-patches
d they will suddenly blow up when the GCC optimizer starts to optimize more aggressively based on the aliasing rule. Try not to use these (you can write a helper function to memcpy() into a __m128), or use -fno-strict-aliasing in dg-options. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
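A memcpy() helper like the one suggested could look as follows. Copying the bytes avoids reading an object through an incompatible pointer cast (for example *(__m128 *) buf), which breaks the aliasing rule and may also be misaligned. This is a sketch under assumptions: it uses a 16-byte GCC generic vector type v4f32 as a stand-in for the LSX __m128 type so it builds on any GCC host, and v4f32_from_bits / v4f32_to_bits are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the LSX __m128 type: four floats in a 16-byte
   GCC generic vector.  */
typedef float v4f32 __attribute__ ((vector_size (16)));

_Static_assert (sizeof (v4f32) == 4 * sizeof (uint32_t), "size mismatch");

/* Build a vector from raw IEEE-754 bit patterns.  memcpy copies the
   bytes, so no uint32_t object is read through a float-vector lvalue.  */
static v4f32 v4f32_from_bits (const uint32_t bits[4])
{
  assert (bits != NULL);
  v4f32 v;
  memcpy (&v, bits, sizeof v);
  return v;
}

/* The reverse direction, e.g. for bit-exact comparison of results.  */
static void v4f32_to_bits (v4f32 v, uint32_t bits[4])
{
  memcpy (bits, &v, sizeof v);
}
```

GCC typically compiles such a fixed-size memcpy() into plain register moves, so the helper costs nothing at -O1 and above.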
