cc.target/loongarch/func-call-medium-4.c scan-assembler
test2:.*la.local\\t.*l\\n\\tjirl
Something strange is happening: with -mexplicit-relocs=auto or always I
get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get
"pcaddu18i $r1,%call36(g)" and jirl. This seems ironic (!).
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
e << 32" may trigger a left-shift of negative
value.
C++11 doesn't allow left-shifting any negative value. Yes, it's allowed
as a GCC extension and it's also allowed by C++23, but the GCC codebase is
still C++11. So it may break GCC when bootstrapping from a different
compiler, and --with-build-config=bootstrap-ubsan will complain.
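One common way to avoid the problem is to do the shift in an unsigned
type. A minimal sketch in plain C, assuming a 64-bit value; the helper
name shift_left_32 and the parameter v are hypothetical and not taken
from the patch under review:

```c
#include <stdint.h>

/* Hypothetical helper: move the low 32 bits of V into bits 32..63.
   The shift is done on an unsigned type, so it is well defined even
   when V is negative.  */
static inline uint64_t
shift_left_32 (int64_t v)
{
  return (uint64_t) v << 32;
}
```

The result is kept unsigned here; whether it should be converted back to
a signed type depends on the surrounding code.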
Otherwise LGTM.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
re register is used as TP on this target. But anyway TLS may be
disabled via --disable-tls, though I don't know if this configuration
really works on loongarch64-linux-gnu (nobody has really tested it, I
guess).
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
The usage of LSX and LASX frint/ftint instructions had some problems:
1. These instructions raise FE_INEXACT, which is not allowed with
-fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
(the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without
* config/loongarch/loongarch-def.h:
(loongarch_isa_base_features): Declare. Define it in ...
* config/loongarch/loongarch-cpu.cc
(loongarch_isa_base_features): ... here.
(fill_native_cpu_config): If we know the base ISA of the CPU
model from PRID,
With -mdiv32, we can assume div.w[u] and mod.w[u] work on the low 32 bits
of a 64-bit GPR even if it's not sign-extended.
gcc/ChangeLog:
* config/loongarch/loongarch.md (DIV): New mode iterator.
(3): Don't expand if TARGET_DIV32.
(di3_fake): Disable if TARGET_DIV32.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in: (lam-bh, lamcas):
Add.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
*
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that
two loads on the same address won't be reordered with each other". Thus
we can omit the "load-load" barrier dbar 0x700.
This is only a micro-optimization because dbar 0x700 is already treated
as nop if the hardware supports
LoongArch v1.10 introduced the concept of ISA evolution. During ISA
evolution, many independent features can be added and enumerated via
CPUCFG.
Add a data file into genopts storing the CPUCFG word, bit, the name
of the command line option controlling if this feature should be used
for
On LA664, the PRID preset is ISA_BASE_LA64V110 but the base architecture
is guessed as ISA_BASE_LA64V100. This causes a warning to be output:
cc1: warning: base architecture 'la64' differs from PRID preset '?'
But we've not set the "?" above in loongarch_isa_base_strings, thus it's
a nullptr
erbose-asm. It's helpful for testing and debugging.
Xi Ruoyao (6):
LoongArch: Fix internal error running "gcc -march=native" on LA664
LoongArch: genopts: Add infrastructure to generate code for new
features in ISA evolution
LoongArch: Add evolution features of base ISA revisio
kahead[N_TUNE_TYPES]
= {
const char*
loongarch_isa_base_strings[N_ISA_BASE_TYPES] = {
[ISA_BASE_LA64V100] = STR_ISA_BASE_LA64V100,
+ [ISA_BASE_LA64V110] = STR_ISA_BASE_LA64V110,
};
const char*
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
| 7 +-
> gcc/config/loongarch/loongarch.opt | 3 +
> gcc/config/loongarch/sync.md | 256 ++---
> -
> 12 files changed, 263 insertions(+), 67 deletions(-)
I'll rebase my patches for div32 and ld-seq-sa on top of this.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ction set, what do you think?
I'll add it too. I had misread section 1.5 paragraph 1 of the spec so I
didn't consider this a good idea, but after reading it again I think it
should be added.
> On 2023/11/16 at 9:18 PM, Xi Ruoyao wrote:
> > Loongson 3A6000 processor will be shipped to
Allow using -march=la664 and -mtune=la664. -march=la664 implies -mdiv32
and -mld-seq-sa. -mtune=la664 is currently the same as -mtune=la464 and it
may need an update later.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch-strings: Add la664 as
STR_CPU_LA664.
*
We'll use HOST_WIDE_INT in LoongArch static properties in the following
patches. Switch loongarch-def from C to C++ to make it possible.
To keep the same readability as C99 designated initializers, create a
std::array-like data structure with a position setter function, and add
field setter functions
results later.
Bootstrapped and regtested on a LA664 with BOOT_CFLAGS="-march=la664
-O2", a LA464 with BOOT_CFLAGS="-march=native -O2". And manually
verified -march=native probing on LA664 and LA464.
Xi Ruoyao (5):
LoongArch: Switch loongarch-def to C++
LoongArch: genopts
} */
> +
> +/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target {
> loongarch64*-*-* } } } } */
> +/* { dg-final { scan-assembler-times "ctz.d\t" 1 { target { loongarch64*-*-*
> } } } } */
> +/* { dg-final { scan-assembler-times "ctz.w\t&
.sle\.d} 6 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cor\.s} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cor\.d} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cun\.s} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvfcmp\.cun\.d} 3 } } */
> +/*
On Thu, 2023-11-16 at 09:18 +0800, chenglulu wrote:
>
> On 2023/11/15 at 7:38 PM, Xi Ruoyao wrote:
> > Pushed r14-5486.
> >
> > /* snip */
> >
> > > > * gcc.target/loongarch/cas-acquire.c: New test.
> > This test fails with GCC 12/13 on LA664, an
RGET
+#undef HAVE_DCGETTEXT
+#endif
+/* Define if the GNU gettext() function is already present or preinstalled. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GETTEXT
+#endif
I don't know if they are related to the issue on AIX though.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed r14-5486.
/* snip */
> > * gcc.target/loongarch/cas-acquire.c: New test.
This test fails with GCC 12/13 on LA664, and it indicates a correctness
issue. May I backport this patch to 12/13 as well?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
This is isomorphic to the LLVM changes [1-2].
On LoongArch, the LL and SC instructions have memory barrier semantics:
- LL: +
- SC: +
But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquiring
semantics cannot be
t.
> And I wonder when that happens - I suppose when op0 is CONST_DOUBLE only?
Yes, it's Andrew's intention.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2023-11-15 at 04:42 +0800, Xi Ruoyao wrote:
> > There seems a better solution as suggested by the GCC internal doc.
> > Section 18.9.16 mentions -fipa-ra:
> >
> > -- Target Hook: bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS
> > Set to true if e
On Wed, 2023-11-15 at 04:26 +0800, Xi Ruoyao wrote:
> On Tue, 2023-11-14 at 20:46 +0800, chenglulu wrote:
> >
> > On 2023/11/14 at 5:55 PM, Xi Ruoyao wrote:
> > > On Tue, 2023-11-14 at 17:45 +0800, Lulu Cheng wrote:
> > > > + /* When function calls
On Tue, 2023-11-14 at 20:46 +0800, chenglulu wrote:
>
> On 2023/11/14 at 5:55 PM, Xi Ruoyao wrote:
> > On Tue, 2023-11-14 at 17:45 +0800, Lulu Cheng wrote:
> > > + /* When function calls are made through call36, t0 register
> > > will be
> > > + implicitly mod
From: Andrew Pinski
On targets with native copysign instructions, (copysign x, -1) is
usually more efficient than (fneg (fabs x)). Since r14-5284, in the
middle end we always optimize (fneg (fabs x)) to (copysign x, -1), not
vice versa. If the target does not support native fcopysign,
->x_flag_ipa_ra = 0;
> + break;
Maybe we can add a (clobber (reg:P 12)) to the related insns instead?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Ping. I've tested this with Binutils 2.41 and 2.41.50.202311xx several
times so it should be OK.
On Mon, 2023-11-06 at 15:50 +0800, Xi Ruoyao wrote:
/* snip */
> Bootstrapped and regtested on loongarch64-linux-gnu twice: once with
> Binutils 2.41, another with Binutils 2.41.50.20
UME gcc/config
gcc/config/aarch64/aarch64.cc
gcc/config/riscv/riscv.cc
gcc/config/ia64/ia64.cc
gcc/config/ia64/sync.md
gcc/config/gcn/gcn.md
gcc/config/loongarch/loongarch.cc
gcc/config/rs6000/rs6000.cc
gcc/config/rs6000/sync.md
gcc/config/nvptx/nvptx.cc
Maybe all of them are redundant?
--
Xi Ru
LA664 defines DBAR hints 0x1 - 0x1f (except 0xf and 0x1f) as follows [1-2]:
- Bit 4: kind of constraint (0: completion, 1: ordering)
- Bit 3: barrier for previous read (0: true, 1: false)
- Bit 2: barrier for previous write (0: true, 1: false)
- Bit 1: barrier for succeeding read (0: true, 1:
With LSX or LASX, copysign (x[i], -1) (or any negative constant) can be
vectorized using [x]vbitseti.{w/d} instructions to directly set the
signbits.
Inspired by Tamar Christina's "AArch64: Handle copysign (x, -1) expansion
efficiently" (r14-5289).
gcc/ChangeLog:
*
On Wed, 2023-11-08 at 16:27 +0800, Xi Ruoyao wrote:
> On Wed, 2023-11-08 at 09:49 +0800, chenglulu wrote:
> >
> > On 2023/11/6 at 7:36 PM, Xi Ruoyao wrote:
> > > This is isomorphic to the LLVM changes [1-2].
> > >
> > > On LoongArch, the LL and SC
(x));
#endif
a = __builtin_copysignf(a, x);
asm(""::"f"(a));
}
}
If DISALLOW_COPYSIGN_OPTIMIZATION is defined, the result is faster by
0.23 seconds.
I'll submit another patch to disable this.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
(fcopysign x, NEGATIVE_CONST) can be simplified to (fneg (fabs x)), but
a logic error in the code caused it to be mistakenly simplified to
(fneg x) instead.
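The difference between the three forms can be sketched in plain C. This
is only an illustration at the source level, not the RTL code in
simplify-rtx.cc; the function names are hypothetical, and the GCC
builtins __builtin_copysign and __builtin_fabs are used so no libm call
is needed:

```c
/* (fcopysign x, NEGATIVE_CONST): take the magnitude of X and the sign
   of a negative constant.  */
static double
copysign_negative_const (double x)
{
  return __builtin_copysign (x, -2.0);
}

/* (fneg (fabs x)): the correct simplification.  */
static double
neg_abs (double x)
{
  return -__builtin_fabs (x);
}

/* (fneg x): the wrong simplification, which only agrees with the
   forms above when X is not negative.  */
static double
neg_only (double x)
{
  return -x;
}
```

For a negative input such as -3.0 the first two functions return -3.0,
while neg_only returns 3.0.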
gcc/ChangeLog:
PR rtl-optimization/112483
* simplify-rtx.cc (simplify_binary_operation_1) :
Fix the simplification of
On Sun, 2023-11-12 at 11:02 -0700, Jeff Law wrote:
>
>
> On 11/12/23 10:41, Xi Ruoyao wrote:
> > On Sat, 2023-11-11 at 13:12 -0700, Jeff Law wrote:
> > >
> > >
> > > On 8/14/23 05:22, Jin Ma wrote:
> > > > CLOBBER and USE does not r
ing was done. Standard practice is
> to
> do a bootstrap and regression test on a primary platform such as x86,
> aarch64, ppc64.
>
> I went ahead and did a bootstrap and regression test on x86_64, then
> pushed this to the trunk.
Unfortunately this patch has triggered a bootstrap comparison failure on
loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
GCC internal says:
'subreg's of 'subreg's are not supported. Using
'simplify_gen_subreg' is the recommended way to avoid this problem.
Unfortunately loongarch_expand_vec_cond_mask_expr might create nested
subreg under certain circumstances, causing an ICE.
Use simplify_gen_subreg as
fld and fst have the same address mode as ld.w and st.w, so the same
optimization as r14-4851 should be applied for them too.
gcc/ChangeLog:
* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
iterator.
(ST_ANY): New mode iterator.
(define_peephole2): Use
On Wed, 2023-11-08 at 09:49 +0800, chenglulu wrote:
>
> On 2023/11/6 at 7:36 PM, Xi Ruoyao wrote:
> > This is isomorphic to the LLVM changes [1-2].
> >
> > On LoongArch, the LL and SC instructions have memory barrier semantics:
> >
> > - LL: +
> > - SC: +
On Tue, 2023-11-07 at 19:10 +0800, Xi Ruoyao wrote:
> On Tue, 2023-11-07 at 12:06 +0800, chenxiaolong wrote:
> > +__m128i a,b,c;
> > +
> > +__asm__ ("vadd.d %w0,%w1,%w2\n\t"
> > + :"=f" (c)
> > + :"f" (a),"f" (b)
1.c:6:1: error: inconsistent operand constraints in an ‘asm’
6 | __asm__ ("vadd.d %w0,%w1,%w2\n\t"
Please recheck.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number
Pushed r14-5030. The subject and ChangeLog are updated to include the
PR number. The code change is the same as v1.
On Mon, 2023-10-30 at 20:44 +0800, chenglulu wrote:
>
> On 2023/10/30 at 8:26 PM, Xi Ruoyao wrote:
> > On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> > > On
On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > to build a cross compiler if the cross assembler is not installed yet.
> >
> > gcc/ChangeLog:
>
Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
to build a cross compiler if the cross assembler is not installed yet.
gcc/ChangeLog:
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.
---
Ok for trunk?
Pushed r14-{4848..4852}.
On Thu, 2023-10-19 at 22:02 +0800, Xi Ruoyao wrote:
> For relaxation we are now generating assembler macros for symbolic
> addresses everywhere, but this is limiting scheduling and there are
> known situations where the relaxation cannot improve the code.
>
external symbol c, the linker may relax "la.global c" to "la.local c"
(if ab.o is linked together with another file c.o which contains the
definition of c) or not. As we cannot exclude the possibility of a
relaxation on la.global for incremental linking, just emit la.global and
let the
In these cases, if we use explicit relocs, we end up with 2
instructions:
pcalau12i t0, %pc_hi20(x)
ld.d t0, t0, %pc_lo12(x)
If we use la.local pseudo-op, in the best scenario (x is in +/- 2MiB
range) we still have 2 instructions:
pcaddi t0, %pcrel_20(x)
ld.d
If we are performing LTO for a final link and linker plugin is enabled,
then we are sure any GOT access may resolve to a symbol out of the link
unit (otherwise the linker plugin will tell us the symbol should be
resolved locally and we'll use PC-relative access instead).
Produce machine
the compiler to use explicit relocs
for these cases, but assembler macros for other cases. Use it as the
default if the assembler supports both explicit relocs and relaxation.
LTO-bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (5):
LoongArch: Add enum-style
gcc/ChangeLog:
* doc/invoke.texi (-mexplicit-relocs=style): Document.
(-mexplicit-relocs): Document as an alias of
-mexplicit-relocs=always.
(-mno-explicit-relocs): Document as an alias of
-mexplicit-relocs=none.
(-mcmodel=extreme): Mention
The linker does not know how to relax TLS access for LoongArch, so let's
emit machine instructions with explicit relocs for TLS.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Return true for TLS symbol types if -mexplicit-relocs=auto.
To take a better balance between scheduling and relaxation when -flto is
enabled, add three-way -mexplicit-relocs={auto,none,always} options.
The old -mexplicit-relocs and -mno-explicit-relocs options are still
supported; they are mapped to -mexplicit-relocs=always and
-mexplicit-relocs=none.
The
On Wed, 2023-10-18 at 09:34 +0800, chenglulu wrote:
>
> On 2023/10/17 at 10:24 PM, WANG Xuerui wrote:
> >
> > On 10/17/23 22:06, Xi Ruoyao wrote:
> > > During the review of a LLVM change [1], on LA464 we found that zeroing
> > "an" LLVM change (because t
During the review of a LLVM change [1], on LA464 we found that zeroing
a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.
[1]: https://github.com/llvm/llvm-project/pull/69300
gcc/ChangeLog:
* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
o.
>
> P.S. Currently support for "f32" is not active, and it should probably be
> avoided if you want to build a working rootfs.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
using g++ 4.8 as a host compiler.
AFAIK G++ 5.1 also has a bug (https://gcc.gnu.org/PR65801) that breaks
building recent GCC. I don't think it's really "maintainable" to ensure
that current GCC can be built with a buggy host compiler.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
When I added copysign support for LoongArch (r13-3702), we did not have
a copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction. Now the copysign RTX code has been added in r14-1586, so
this patch removes those UNSPECs, and it uses the native RTL copysign
insn.
Inspired
mscratch.org/lfs/view/development/chapter08/gcc.html
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
or isa options, but pr104992.c failed because
> it expected result with "vect_int_mod returns 1" but it was compiled
> without -mlsx/-mlasx. Seems pr104992.c is invoked by gcc.dg/dg.exp,
> pr104992.c is not affected by DEFAULT_CFLAGS, so we still need to check
> if LSX/LASX is available in vect_int_mod.
>
> Other parts of the new patch are still WIP.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Mon, 2023-09-25 at 16:26 +0800, chenglulu wrote:
> LGTM!
>
> Thank you for your modification!
Pushed r14-4250.
> On 2023/9/25 at 4:13 PM, Xi Ruoyao wrote:
> > gcc/ChangeLog:
> >
> > * doc/invoke.texi: Update -m[no-]explicit-relocs for r14-4160.
> > ---
>
gcc/ChangeLog:
* doc/invoke.texi: Update -m[no-]explicit-relocs for r14-4160.
---
I've not regtested this as it's only a doc change. Ok for trunk?
gcc/doc/invoke.texi | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/gcc/doc/invoke.texi
sx { } {
> + return [check_no_compiler_messages loongarch_asx assembly {
> + #if !defined(__loongarch_asx)
> + #error "LASX not defined"
> + #endif
> + }]
> +}
> +
> # Appends necessary Python flags to extra-tool-flags if Python.h is
> supported.
> # Otherwise, modifies dg-do-what.
> proc dg-require-python-h { args } {
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
_push (state.defs_list[0]);
> }
> reinsn_del_list.safe_push (curr_cand->insn);
> state.modified[INSN_UID (curr_cand->insn)].deleted = 1;
> @@ -1345,6 +1483,10 @@ find_and_remove_re (void)
> for (unsigned int i = 0; i < reinsn_copy_list.length (); i +=
unsigned HOST_WIDE_INT n
> > > > + = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode);
> > > > enum rtx_code adjusted_code;
> > > >
> > > > /* Normalize code to either LEU or GEU. */
> > > > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code,
> > > > machine_mode mode,
> > > > HOST_WIDE_INT_PRINT_HEX ") to (MEM %s "
> > > > HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME
> > > > (int_mode),
> > > > GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code),
> > > > - (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME
> > > > (adjusted_code),
> > > > - n);
> > > > + (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK
> > > > (int_mode),
> > > > + GET_RTX_NAME (adjusted_code), n);
> > > > }
> > > > poly_int64 offset = (BYTES_BIG_ENDIAN
> > > > ? 0
> > > > : (GET_MODE_SIZE (int_mode)
> > > > - GET_MODE_SIZE (narrow_mode_iter)));
> > > > *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
> > > > - *pop1 = GEN_INT (n);
> > > > + *pop1 = gen_int_mode (n, narrow_mode_iter);
> > > > return adjusted_code;
> > > > }
> > > > }
> > > > --
> > > > 2.41.0
> > > >
> >
> > --
> > Xi Ruoyao
> > School of Aerospace Science and Technology, Xidian University
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
defined behavior WRAP-AROUND) only to part
> of the program. And then add -fsanitize=*overflow to detect all other
> unexpected overflows in the program.
>
> This is currently missing from GCC, I guess?
If overflow is really so rare, we should just enable -fsanitize=signed-
integer-overflow globally and special case the code paths where we want
wrapping. It's easy in 2023:
/* b + c may wrap here because ... ... */
ckd_add(&a, b, c);
Or
/* if b + c overflows, we have a severe issue, let's panic even if
   the sanitizer is disabled */
if (ckd_add(&a, b, c))
  panic("b + c overflows but it shouldn't (b = %d, c = %d)", b, c);
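A self-contained sketch of the same pattern, assuming a compiler that
provides the GCC/Clang builtin __builtin_add_overflow (used here as a
stand-in for the C23 ckd_add macro, which needs <stdckdint.h>); the
wrapper name add_reports_overflow is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical wrapper: store the wrapped sum of B and C in *RES and
   return true if the mathematical result did not fit in an int.  */
static bool
add_reports_overflow (int *res, int b, int c)
{
  return __builtin_add_overflow (b, c, res);
}
```

A caller that expects wrapping can ignore the return value; a caller
that treats overflow as a bug can check it and bail out.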
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
g* could be unintentional and should be warned then. GCC is a
compiler, not an advanced AI educating the programmers.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
we treat them as integers
in the brain we'll end up invoking undefined behavior sooner or later.
Thus the wrapping/overflowing behavior of pointer is controlled by a
different option than integers.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
128 and .uleb128], gcc_cv_as_leb128,,
+gcc_GAS_CHECK_FEATURE([.sleb128 and .uleb128], gcc_cv_as_leb128,
+[$check_leb128_asflags],
[ .data
.uleb128 L2 - L1
L1:
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
&5
> + (eval $ac_try) 2>&5
> + ac_status=$?
> + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> + test $ac_status = 0
> + mv conftest conftest.o
> + fi
> +esac
Phew. Randomly modifying configure and paste the m
The subject should be "Add tests for SX vector floating-point
instructions". The "support" has already been added.
Likewise for patches 5-9.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Sat, 2023-09-09 at 16:21 +0800, chenglulu wrote:
> LGTM!
Pushed r14-3821.
> On 2023/9/9 at 4:20 PM, Xi Ruoyao wrote:
> > The generic code will split 16-byte copy into two 8-byte copies, so the
> > vector code wouldn't be used even if -mno-strict-align. This
> > contr
The generic code will split a 16-byte copy into two 8-byte copies, so the
vector code wouldn't be used even with -mno-strict-align. This
contradicts the purpose of this test case.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/memcpy-vec-3.c: Increase the amount of
copied
> + struct {
> + unsigned char offset;
> + unsigned char size;
> + } args[384];
> +};
> +
> +struct isel_context {
> + const struct ac_shader_args* args;
> + int arg_temps[384];
> +};
> +
> +
> +void
> +add_startpgm (struct isel_context* ctx, unsigned short arg_count)
> +{
> +
> + for (unsigned i = 0, arg = 0; i < arg_count; i++)
> + {
> + unsigned size = ctx->args->args[i].size;
> + unsigned reg = ctx->args->args[i].offset;
> +
> + if (reg % ( 4 < util_next_power_of_two (size)
> + ? 4 : util_next_power_of_two (size)))
> + ctx->arg_temps[i] = create_vec_from_array ();
> + }
> +}
> +
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Sat, 2023-09-09 at 15:14 +0800, chenglulu wrote:
>
> On 2023/9/9 at 3:06 PM, Xi Ruoyao wrote:
> > On Sat, 2023-09-09 at 15:04 +0800, chenglulu wrote:
> > > Hi,RuoYao:
> > >
> > > I think the test example memcpy-vec-3.c submitted in r14-3818 is
> > >
-align', so no vector load instructions
> will be generated.
Yes, in this case we cannot use vst because we don't know if b is
aligned. Thus a { scan-assembler-not "vst" } guarantees that.
Or am I misunderstanding something here?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed r14-3819.
On Sat, 2023-09-09 at 14:16 +0800, chenglulu wrote:
>
> On 2023/9/8 at 12:33 AM, Xi Ruoyao wrote:
> > gcc/ChangeLog:
> >
> > * config/loongarch/loongarch.cc
> > (loongarch_block_move_straight):
> > Check precondition
Pushed r14-3818 with test cases added. The pushed patch is attached.
On Sat, 2023-09-09 at 14:10 +0800, chenglulu wrote:
>
> On 2023/9/8 at 12:14 AM, Xi Ruoyao wrote:
> > gcc/ChangeLog:
> >
> > * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
> >
n't see real consequences to this unless you have a build script
> that relies on the path of libgcc.a / startfile, which can still (and
> should) be revised using $(gcc --print-multi-dir).
I guess I can live with it.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
ilib
configuration, esp. today most LoongArch users don't need multilib at
all?
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
mmon/" ?
My bad. I didn't realize the file had been moved to common.
Don't change it :(.
> Thanks for the review.
>
>
> On 2023/9/8 at 4:06 PM, Xi Ruoyao wrote:
> > On Fri, 2023-09-08 at 10:00 +0800, Guo Jie wrote:
> > > gcc/ChangeLog:
> > >
> > >
@ static const struct default_options
> loongarch_option_optimization_table[] =
> { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
> { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> + { OPT_LEVELS_1_PLUS, OPT_fsched_
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_block_move_straight):
Check precondition (delta must be a power of 2) and use
popcount_hwi instead of a homebrew loop.
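As an illustration of why a population count can replace a hand-written
loop here, a minimal sketch follows. It is not the code from the patch:
the function names are hypothetical, and the GCC builtin
__builtin_popcountll stands in for GCC's internal popcount_hwi:

```c
#include <stdbool.h>

/* A value is a power of 2 exactly when it has a single bit set.  */
static bool
is_power_of_2 (unsigned long long delta)
{
  return __builtin_popcountll (delta) == 1;
}

/* For a power of 2, DELTA - 1 has all lower bits set, so its
   population count equals log2 (DELTA).  The caller must ensure
   DELTA is a power of 2.  */
static int
log2_of_power_of_2 (unsigned long long delta)
{
  return __builtin_popcountll (delta - 1);
}
```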
---
I've not run a full bootstrap with this, but it should be obvious.
Ok for trunk?
gcc/ChangeLog:
* config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
Define to the maximum number of bytes able to be loaded or
stored with one machine instruction.
* config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
New static function.
On Thu, 2023-09-07 at 17:47 +0800, Xi Ruoyao wrote:
/* snip */
> I've done some local experiments too. I think we can add a "-mbuild-
> multilib" option which does nothing, but in the hacked spec we can wrap
> the line in %{mbuild-multilib:...}:
>
> %{mbuild-multilib:%
On Thu, 2023-09-07 at 17:31 +0800, Yang Yujie wrote:
> > This is bad. It makes BOOT_CFLAGS=-mlasx or CFLAGS_FOR_TARGET=-mlasx
> > silently ignored so we cannot test a LSX/LASX or vectorizer change with
> > them.
> >
> > Why do we need to purge all user-specified -m options here?
>
> Yes, that
On Wed, 2023-09-06 at 09:04 +0800, Yang Yujie wrote:
> On Tue, Sep 05, 2023 at 09:31:56PM +0800, Xi Ruoyao wrote:
> > On Thu, 2023-08-31 at 20:48 +0800, Yang Yujie wrote:
> > > * Support options for LoongArch SIMD extensions:
> > > new configure options --with-simd={
On Thu, 2023-09-07 at 10:15 +0800, chenglulu wrote:
>
> On 2023/9/6 at 6:58 PM, Xi Ruoyao wrote:
> > Forgot to mention: I've bootstrapped and regtested this patch on
> > loongarch64-linux-gnu (with PR110939 patch applied to unbreak the
> > bootstrapping). Ok for trunk?
>
> \
> + if (ref != res){
> \
> + printf(" error: %s at line %ld , expected %d, got %d\n",
> \
> + __FILE__, line, ref, res);
> \
> + }
> \
> +}while(0)
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
{ dg-do compile } */
> +/* { dg-options "-Ofast -mstrict-align -mlasx" } */
> +/* { dg-final { scan-assembler-not "vfadd.s" } } */
> +
> +void
> +foo (float* restrict x, float* restrict y)
> +{
> + x[0] = x[0] + y[0];
> + x[1] = x[1] + y[1];
> + x[
Forgot to mention: I've bootstrapped and regtested this patch on
loongarch64-linux-gnu (with PR110939 patch applied to unbreak the
bootstrapping). Ok for trunk?
On Wed, 2023-09-06 at 18:46 +0800, Xi Ruoyao wrote:
> If mask is a constant with value ((1 << N) - 1) << M
d they will suddenly blow up when
GCC optimizer starts to optimize more aggressively based on the aliasing
rule.
Try not to use these (you can write a helper function to memcpy() into a
__m128). Or use -fno-strict-aliasing in dg-options.
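A minimal sketch of such a helper, assuming a plain 16-byte struct as a
stand-in for __m128 so it builds without the LSX headers; the names
vec128_sketch and load_vec128 are hypothetical:

```c
#include <string.h>

/* Hypothetical 16-byte stand-in for a vector type such as __m128.  */
typedef struct
{
  float lane[4];
} vec128_sketch;

/* Copy 16 bytes from SRC into a vector value.  memcpy may read an
   object of any type, so this does not violate the aliasing rule the
   way a pointer cast would.  */
static vec128_sketch
load_vec128 (const void *src)
{
  vec128_sketch v;
  memcpy (&v, src, sizeof v);
  return v;
}
```

Compilers normally turn a fixed-size memcpy like this into a plain load,
so the helper should not cost anything at -O2.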
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University