[PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

2023-11-17 Thread Guo Jie
For the following immediate load operation in gcc/testsuite/gcc.target/loongarch/imm-load1.c: long long r = 0x0101010101010101; Before this patch: lu12i.w $r15,16842752>>12 ori $r15,$r15,257 lu32i.d $r15,0x10101>>32 lu52i.d $r1
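A minimal C++ sketch of the property named in the subject line: an immediate whose upper and lower 32-bit halves are identical can be rebuilt from the lower half alone. The helper names `has_equal_halves` and `replicate_low_half` are hypothetical and are not taken from the patch, and the sketch says nothing about which instructions the patch emits.

```cpp
#include <cstdint>

// Hypothetical helper: true when the upper and lower 32-bit halves of v match.
constexpr bool has_equal_halves(std::uint64_t v) {
    return (v >> 32) == (v & 0xffffffffu);
}

// Hypothetical helper: rebuild the 64-bit value from its lower 32 bits only.
constexpr std::uint64_t replicate_low_half(std::uint32_t lo) {
    return (static_cast<std::uint64_t>(lo) << 32) | lo;
}

// The immediate from imm-load1.c qualifies; 16842752 is 0x01010000 and 257 is 0x101,
// so the lu12i.w/ori pair quoted above already builds the lower half 0x01010101.
static_assert(has_equal_halves(0x0101010101010101ull), "halves match");
static_assert(!has_equal_halves(0x0000000101010101ull), "halves differ");
static_assert(replicate_low_half(0x01010101u) == 0x0101010101010101ull,
              "low half is enough to rebuild the value");
```

The point of the check is that only the 32-bit half has to be materialized; the upper half is a copy of it.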

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-17 Thread waffl3x
The patch is coming along, I just have a quick question regarding style. I make use of IILE's (immediately invoked lambda expression) a whole lot in my own code. I know that their use is controversial in general so I would prefer to ask instead of just submitting the patch using them a bunch sudden

[PATCH v2 9/9] RISC-V: Disable fractional type intrinsics for the XTheadVector extension

2023-11-17 Thread Jun Sha (Joshua)
Because the XTheadVector extension does not support fractional operations, we need to delete the related intrinsics. The types involved are as follows: v(u)int8mf8_t, v(u)int8mf4_t, v(u)int8mf2_t, v(u)int16mf4_t, v(u)int16mf2_t, v(u)int32mf2_t, vfloat16mf4_t, vfloat16mf2_t, vfloat32mf2_t Contr

[PATCH v2 8/9] RISC-V: Add support for xtheadvector-specific load/store intrinsics

2023-11-17 Thread Jun Sha (Joshua)
This patch involves the generation of xtheadvector special load/store instructions. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class th_loadstore_width): Define new builtin bases

[PATCH v2 6/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part4)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate some typical tests in testsuite/gcc.target/riscv/rvv/base. This patch is adding some tests for ternary and unary operations. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/testsuite/ChangeLog

[PATCH v2 5/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part3)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate some typical tests in testsuite/gcc.target/riscv/rvv/base. This patch is adding some tests for binary operations. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/testsuite/ChangeLog:

[PATCH v2 4/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part2)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate some typical tests in testsuite/gcc.target/riscv/rvv/base. This patch is adding some tests for binary operations. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/testsuite/ChangeLog:

[PATCH v2 3/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part1)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate some typical tests in testsuite/gcc.target/riscv/rvv/base. This patch is adding some tests for binary operations. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/testsuite/ChangeLog:

[PATCH v2 2/9] RISC-V: Handle differences between xtheadvector and vector

2023-11-17 Thread Jun Sha (Joshua)
This patch is to handle the differences in instruction generation between vector and xtheadvector, mainly adding th. prefix to all xtheadvector instructions. Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/ChangeLog: * config.gcc: Add header for XTh

[PATCH v2 1/9] RISC-V: minimal support for xtheadvector

2023-11-17 Thread Jun Sha (Joshua)
This patch is to introduce basic XTheadVector support (march string parsing and a test for __riscv_xtheadvector) according to https://github.com/T-head-Semi/thead-extension-spec/ Contributors: Jun Sha (Joshua) Jin Ma Christoph Müllner gcc/ChangeLog: * common/co

[PATCH v2 0/9] RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jun Sha (Joshua)
This patch series presents gcc implementation of the XTheadVector extension [1]. [1] https://github.com/T-head-Semi/thead-extension-spec/ I updated my patch series, because I forgot to add co-authors in the last version. Contributors: Jun Sha (Joshua) Jin Ma Christoph M

Re: [PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Kito Cheng
LGTM, that's a really great clean up :) On Sat, Nov 18, 2023 at 11:12 AM Juzhe-Zhong wrote: > > This patch refactors RVV iterators for easier maintenance. > > E.g. > > (define_mode_iterator V [ > RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI > "TARGET_MIN_VLEN > 32") > > RVVM8HI R

[PATCH] LoongArch: Modify MUSL_DYNAMIC_LINKER.

2023-11-17 Thread Lulu Cheng
Use no suffix at all in the musl dynamic linker name for hard float ABI. Use -sf and -sp suffixes in musl dynamic linker name for soft float and single precision ABIs. The following table outlines the musl interpreter names for the LoongArch64 ABI names. musl interpreter| LoongArch64 A

[PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Juzhe-Zhong
This patch refactors RVV iterators for easier maintenance. E.g. (define_mode_iterator V [ RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN > 32") RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32") (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVV

[pushed] analyzer: new warning: -Wanalyzer-infinite-loop [PR106147]

2023-11-17 Thread David Malcolm
This patch implements a new analyzer warning: -Wanalyzer-infinite-loop. It works by examining the exploded graph once the latter has been fully built. It attempts to detect cycles in the exploded graph in which: - no externally visible work occurs - no escape is possible from the cycle once it ha

[PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-17 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme values of days and months [1]: weekday operator+(const weekday& x, const days& y); // #1 month operator+(const month& x, const months& y); // #2 For #1 the problem is that in libstdc++ days::rep is int64_t. Other i
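One way to avoid this kind of overflow, sketched in C++ under stated assumptions: reduce the day count modulo 7 before adding it to the weekday, so every intermediate value stays small. `add_days_mod7` is a hypothetical stand-in that works on plain integers; it is not the code from the patch and does not use the `std::chrono` types.

```cpp
#include <cstdint>

// Hypothetical stand-in for "weekday + days".
// wd is a weekday index in [0, 6]; n is an arbitrary signed day count.
constexpr unsigned add_days_mod7(unsigned wd, std::int64_t n) {
    // n % 7 lies in [-6, 6] for every n, including INT64_MIN, so it cannot overflow.
    // Adding 7 keeps the sum in [1, 19] before the final reduction.
    return static_cast<unsigned>(
        (static_cast<int>(wd) + static_cast<int>(n % 7) + 7) % 7);
}
```

Adding `n` to `wd` first and reducing afterwards would overflow for day counts near the limits of `int64_t`; reducing first sidesteps that.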

[PATCH v2] The following functions invoke signed integer overflow (UB) for some extreme values of days and months [1]:

2023-11-17 Thread Cassio Neri
weekday operator+(const weekday& x, const days& y); // #1 month operator+(const month& x, const months& y); // #2 For #1 the problem is that in libstdc++ days::rep is int64_t. Other implementations use int32_t and cast operands to int64_t. Hence they perform arithmetic operations without fea

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I suspect it's going to be even worse if you we have multiple patterns >> with the same underlying RTL, but just different output strings. No. We don't need to add (duplicate) any new patterns. I know RVV GCC very well. I know how to do that. juzhe.zh...@rivai.ai From: Jeff Law Date: 2023-11

Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jeff Law
On 11/17/23 16:16, 钟居哲 wrote: >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We may also need to add '^' to the punct_valid_p hook.  But yes, this is the preferred way to go when all we need to do is prefix the instruction with "th.". No. I don't think we need to add

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I assume this hunk is meant for riscv_output_operand in riscv.cc. We >> may also need to add '^' to the punct_valid_p hook. But yes, this is >> the preferred way to go when all we need to do is prefix the instruction >> with "th.". No. I don't think we need to add '^' . I don't want theadvect

[PATCH] libgccjit: Add ways to set the personality function

2023-11-17 Thread Antoni Boucher
Hi. This adds functions to set the personality function (bug 112603). I'm not sure I can make a test for this: it seems the personality function will not be set if there are no try/catch inside the functions. Do you know a way to keep the personality function that is set in this case? Or should w

[PATCH] libgccjit: Add vector permutation and vector access operations

2023-11-17 Thread Antoni Boucher
Hi. This patch adds vector permutation and vector access operations (bug 112602). This was split from this patch: https://gcc.gnu.org/pipermail/jit/2023q1/001606.html Thanks for the review. From 25b386334f22845d7ba1b60658730373eb6ddbb3 Mon Sep 17 00:00:00 2001 From: Antoni Boucher Date: Fri, 1

[PATCH] Makefile.tpl: Avoid race condition in generating site.exp from the top level

2023-11-17 Thread Lewis Hyatt
Hello- I often find it convenient to run a new c-c++-common test from the main build dir like: $ make -j 2 RUNTESTFLAGS=dg.exp=new-test.c check-gcc-{c,c++} I noticed that sometimes this produces a corrupted site.exp and then no tests work until it is remade manually. To avoid the issue, it is ne

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
On Fri, 17 Nov 2023 at 15:32, Jonathan Wakely wrote: > > Tested x86_64-linux. Pushed to trunk. > > GCC generates better code for add_sat if we use: > > unsigned z = x + y; > z |= -(z < x); > return z; > > If the compiler can't be improved we should consider using that instead > of __builtin_add_ov

[PATCH] c++: P2280R4, Using unknown refs in constant expr [PR106650]

2023-11-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This patch is an attempt to implement (part of?) P2280, Using unknown pointers and references in constant expressions. (Note that R4 seems to only allow References to unknown/Accesses via this, but not Pointers to unknown.) Th

Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law
On 11/17/23 14:08, Antoni Boucher wrote: In contrast with the other frontends, libgccjit can be executed multiple times in a row in the same process. Yup. I'm aware of that. Even so calling init_emit_once more than one time still seems wrong. jeff

Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Antoni Boucher
In contrast with the other frontends, libgccjit can be executed multiple times in a row in the same process. This is the source of multiple bugs due to global variables as can be seen by several patches I sent these past years. On Fri, 2023-11-17 at 14:06 -0700, Jeff Law wrote: > > > On 11/16/23

Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law
On 11/16/23 15:36, Antoni Boucher wrote: Hi. This patch fixes a RTL bug when using some target-specific builtins in libgccjit (bug 112576). The test uses a function from an unmerged patch: https://gcc.gnu.org/pipermail/jit/2023q1/001605.html Thanks for the review! The natural question here is

[PATCH] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-17 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems: 1. These instructions raise FE_INEXACT, which is not allowed with -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions (the only exceptions are rint, lrint, and llrint). 2. The "frint" instruction without explic

[PATCH v2 3/6] LoongArch: Add evolution features of base ISA revisions

2023-11-17 Thread Xi Ruoyao
* config/loongarch/loongarch-def.h: (loongarch_isa_base_features): Declare. Define it in ... * config/loongarch/loongarch-cpu.cc (loongarch_isa_base_features): ... here. (fill_native_cpu_config): If we know the base ISA of the CPU model from PRID, us

[PATCH v2 4/6] LoongArch: Take the advantage of -mdiv32 if it's enabled

2023-11-17 Thread Xi Ruoyao
With -mdiv32, we can assume div.w[u] and mod.w[u] work on low 32 bits of a 64-bit GPR even if it's not sign-extended. gcc/ChangeLog: * config/loongarch/loongarch.md (DIV): New mode iterator. (3): Don't expand if TARGET_DIV32. (di3_fake): Disable if TARGET_DIV32. (

[PATCH v2 6/6] LoongArch: Add fine-grained control for LAM_BH and LAMCAS

2023-11-17 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in: (lam-bh, lamcas): Add. * config/loongarch/loongarch-str.h: Regenerate. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch-cpucfg-map.h: Regenerate. * config/loongarch

[PATCH v2 5/6] LoongArch: Don't emit dbar 0x700 if -mld-seq-sa

2023-11-17 Thread Xi Ruoyao
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that two loads on the same address won't be reordered with each other". Thus we can omit the "load-load" barrier dbar 0x700. This is only a micro-optimization because dbar 0x700 is already treated as nop if the hardware supports L

[PATCH v2 2/6] LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution

2023-11-17 Thread Xi Ruoyao
LoongArch v1.10 introduced the concept of ISA evolution. During ISA evolution, many independent features can be added and enumerated via CPUCFG. Add a data file into genopts storing the CPUCFG word, bit, the name of the command line option controlling if this feature should be used for compilatio

[PATCH v2 1/6] LoongArch: Fix internal error running "gcc -march=native" on LA664

2023-11-17 Thread Xi Ruoyao
On LA664, the PRID preset is ISA_BASE_LA64V110 but the base architecture is guessed ISA_BASE_LA64V100. This causes a warning to be output: cc1: warning: base architecture 'la64' differs from PRID preset '?' But we've not set the "?" above in loongarch_isa_base_strings, thus it's a nullptr

[PATCH v2 0/6] Add LoongArch v1.1 div32 and ld-seq-sa support

2023-11-17 Thread Xi Ruoyao
Supersedes https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636795.html. Requires https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636946.html. Changes: - Rebase on top of "Add LoongarchV1.1 instructions support". - Not to translate loongarch-def.c C++. Use int64_t instead of HOS

[PATCH 7/7] lto: partition specific lto_clone_numbers

2023-11-17 Thread Michal Jires
Replaces "lto_priv.$clone_number" by "lto_priv.$partition_hash.$partition_specific_clone_number" to reduce divergence for incremental LTO. Bootstrapped/regtested on x86_64-pc-linux-gnu gcc/lto/ChangeLog: * lto-partition.cc (set_clone_partition_name_checksum): New. (CHECKSUM_STRI

[PATCH 6/7] lto: squash order of symbols in partitions

2023-11-17 Thread Michal Jires
This patch squashes order of symbols in individual partitions, so that their relative order is conserved, but is not influenced by symbols in other partitions. Order of cloned symbols is set to 0. This should be fine because order specifies order of symbols in input files, which cloned symbols are

[PATCH 5/7] lto: Implement cache partitioning

2023-11-17 Thread Michal Jires
This patch implements new cache partitioning. It tries to keep symbols from single source file together to minimize propagation of divergence. It starts with symbols already grouped by source files. If reasonably possible it only either combines several files into one final partition, or, if a fil

[PATCH 3/7] Lockfile.

2023-11-17 Thread Michal Jires
This patch implements lockfile used for incremental LTO. Bootstrapped/regtested on x86_64-pc-linux-gnu gcc/ChangeLog: * Makefile.in: Add lockfile.o. * lockfile.cc: New file. * lockfile.h: New file. --- gcc/Makefile.in | 5 +- gcc/lockfile.cc | 136 +

[PATCH 4/7] lto: Implement ltrans cache

2023-11-17 Thread Michal Jires
This patch implements Incremental LTO as ltrans cache. The cache is active when directory $GCC_LTRANS_CACHE is specified and exists. Stored are pairs of ltrans input/output files and input file hash. File locking is used to allow multiple GCC instances to use the same cache. Bootstrapped/regtested

[PATCH 2/7] lto: Remove random_seed from section name.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu gcc/ChangeLog: * lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA. --- gcc/lto-streamer.cc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc index 4968fd13413

[PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu gcc/ChangeLog: * lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_. --- gcc/lto-opts.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc index c9bee9d4197..0451e290c75 100644 --- a/gcc/lto-opt

[PATCH 0/7] lto: Incremental LTO.

2023-11-17 Thread Michal Jires
Hi, these patches implement Incremental LTO, specifically by caching results of ltrans phase. Secondarily these patches contain changes to reduce divergence of ltrans partitions so that they can be cached. The aim is to reduce compile times for quick edit-compile cycles while using LTO. Even with

Re: [PATCH] vect: Use statement vectype for conditional mask.

2023-11-17 Thread Robin Dapp
> No, you shouldn't place _7 != 0 inside the .COND_ADD but instead > have an extra pattern stmt producing that so > > patt_8 = _7 != 0; > patt_9 = .COND_ADD (patt_8, ...); > > that's probably still not enough, but I always quickly forget how > bool patterns work ... basically a comparison like pa

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2023-11-17 Thread Harald Anlauf
Hi Tobias, On 11/17/23 12:38, Tobias Burnus wrote: Hi Harald, hi all, On 16.11.23 20:30, Harald Anlauf wrote: According to the standard one can have 99 lines with only "&" and then an ";", but then only 100 lines with 1 characters. I believe a single '&' is not valid, you either need

[PATCH 4/5] aarch64: Add ZT0

2023-11-17 Thread Richard Sandiford
SME2 adds a 512-bit lookup table called ZT0. It is enabled and disabled by PSTATE.ZA, just like ZA itself. This patch adds support for the register, including saving and restoring contents. The code reuses the V8DI that was added for LS64, including the associated memory classification rules. (

[PATCH 2/5] aarch64: Add svcount_t

2023-11-17 Thread Richard Sandiford
Some SME2 instructions interpret predicates as counters, rather than as bit-per-byte masks. The SME2 ACLE defines an svcount_t type for this interpretation. I don't think we have a better way of representing counters than the VNx16BI that we use for masks. The patch therefore doesn't add a new m

[PATCH 3/5] aarch64: Add svboolx2_t

2023-11-17 Thread Richard Sandiford
SME2 has some instructions that operate on pairs of predicates. The SME2 ACLE defines an svboolx2_t type for the associated intrinsics. The patch uses a double-width predicate mode, VNx32BI, to represent the contents, similarly to how data vector tuples work. At present there doesn't seem to be a

[PATCH 1/5] aarch64: Add +sme2

2023-11-17 Thread Richard Sandiford
gcc/ * doc/invoke.texi: Document +sme2. * doc/sourcebuild.texi: Document aarch64_sme2. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Add sme2. * config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros. gcc/testsuite/

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Daniel Krügler
Am Fr., 17. Nov. 2023 um 18:31 Uhr schrieb Jonathan Wakely : > > On Fri, 17 Nov 2023 at 17:01, Daniel Krügler > wrote: > > [..] > > > + > > > +namespace std _GLIBCXX_VISIBILITY(default) > > > +{ > > > +_GLIBCXX_BEGIN_NAMESPACE_VERSION > > > + > > > + /// Add two integers, with saturation in case

aarch64: Add support for SME2

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME2. It is gated behind the earlier series for SME. All of the detail is in the individual patch summaries. Tested on aarch64-linux-gnu. Richard

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
On Fri, 17 Nov 2023 at 17:01, Daniel Krügler wrote: > > Am Fr., 17. Nov. 2023 um 16:32 Uhr schrieb Jonathan Wakely > : > > > > Tested x86_64-linux. Pushed to trunk. > > > > GCC generates better code for add_sat if we use: > > > > unsigned z = x + y; > > z |= -(z < x); > > return z; > > > > If the

[PATCH 20/21] aarch64: Enforce inlining restrictions for SME

2023-11-17 Thread Richard Sandiford
A function that has local ZA state cannot be inlined into its caller, since we only support managing ZA switches at function scope. A function whose body directly clobbers ZA state cannot be inlined into a function with ZA state. A function whose body requires a particular PSTATE.SM setting can o

[PATCH 21/21] aarch64: Update sibcall handling for SME

2023-11-17 Thread Richard Sandiford
We only support tail calls between functions with the same PSTATE.ZA setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA"). Only a normal non-streaming function can tail-call another non-streaming function, and only a streaming function can tail-call another streaming function. An

[PATCH 18/21] aarch64: Add support for __arm_locally_streaming

2023-11-17 Thread Richard Sandiford
This patch adds support for the __arm_locally_streaming attribute, which allows a function to use SME internally without changing the function's ABI. The attribute is valid but redundant for __arm_streaming functions. gcc/ * config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add

[PATCH 19/21] aarch64: Handle PSTATE.SM across abnormal edges

2023-11-17 Thread Richard Sandiford
PSTATE.SM is always off on entry to an exception handler, and on entry to a nonlocal goto receiver. Those entry points need to switch PSTATE.SM back to the appropriate state for the current function. In the case of streaming-compatible functions, they need to restore the mode that the caller was o

[PATCH 14/21] aarch64: Add a VNx1TI mode

2023-11-17 Thread Richard Sandiford
Although TI isn't really a native SVE element mode, it's convenient for SME if we define VNx1TI anyway, so that it can be used to distinguish .Q ZA operations from others. It's purely an RTL convenience and isn't (yet) a valid storage mode. gcc/ * config/aarch64/aarch64-modes.def: Add VNx

[PATCH 13/21] aarch64: Add a register class for w12-w15

2023-11-17 Thread Richard Sandiford
Some SME instructions use w12-w15 to index ZA. This patch adds a register class for that range. gcc/ * config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro. (W12_W15_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it. * config/aa

[PATCH 16/21] aarch64: Generalise _m rules for SVE intrinsics

2023-11-17 Thread Richard Sandiford
In SVE there was a simple rule that unary merging (_m) intrinsics had a separate initial argument to specify the values of inactive lanes, whereas other merging functions took inactive lanes from the first operand to the operation. That rule began to break down in SVE2, and it continues to do so i

[PATCH 11/21] aarch64: Switch PSTATE.SM around calls

2023-11-17 Thread Richard Sandiford
This patch adds support for switching to the appropriate SME mode for each call. Switching to streaming mode requires an SMSTART SM instruction and switching to non-streaming mode requires an SMSTOP SM instruction. If the call is being made from streaming-compatible code, these switches are condi

[PATCH 15/21] aarch64: Generalise unspec_based_function_base

2023-11-17 Thread Richard Sandiford
Until now, SVE intrinsics that map directly to unspecs have always used type suffix 0 to distinguish between signed integers, unsigned integers, and floating-point values. SME adds functions that need to use type suffix 1 instead. This patch generalises the classes accordingly. gcc/ * conf

[PATCH 12/21] aarch64: Add support for SME ZA attributes

2023-11-17 Thread Richard Sandiford
SME has an array called ZA that can be enabled and disabled separately from streaming mode. A status bit called PSTATE.ZA indicates whether ZA is currently enabled or not. In C and C++, the state of PSTATE.ZA is controlled using function attributes. There are four attributes that can be attached

[PATCH 09/21] aarch64: Distinguish streaming-compatible AdvSIMD insns

2023-11-17 Thread Richard Sandiford
The vast majority of Advanced SIMD instructions are not available in streaming mode, but some of the load/store/move instructions are. This patch adds a new target feature macro called TARGET_BASE_SIMD for this streaming-compatible subset. The vector-to-vector move instructions are not streaming-

[PATCH 08/21] aarch64: Add +sme

2023-11-17 Thread Richard Sandiford
This patch adds the +sme ISA feature and requires it to be present when compiling arm_streaming code. (arm_streaming_compatible code does not necessarily assume the presence of SME. It just has to work when SME is present and streaming mode is enabled.) gcc/ * doc/invoke.texi: Document S

[PATCH 07/21] aarch64: Add arm_streaming(_compatible) attributes

2023-11-17 Thread Richard Sandiford
This patch adds support for recognising the SME arm::streaming and arm::streaming_compatible attributes. These attributes respectively describe whether the processor is definitely in "streaming mode" (PSTATE.SM==1), whether the processor is definitely not in streaming mode (PSTATE.SM==0), or wheth

[PATCH 06/21] aarch64: Add tuple forms of svreinterpret

2023-11-17 Thread Richard Sandiford
SME2 adds a number of intrinsics that operate on tuples of 2 and 4 vectors. The ACLE therefore extends the existing svreinterpret intrinsics to handle tuples as well. gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svreinterpret_impl::fold): Punt on tuple forms. (svrei

[PATCH 05/21] aarch64: Add group suffixes to SVE intrinsics

2023-11-17 Thread Richard Sandiford
The SME2 ACLE adds a new "group" suffix component to the naming convention for SVE intrinsics. This is also used in the new tuple forms of the svreinterpret intrinsics. This patch adds support for group suffixes and defines the x2, x3 and x4 suffixes that are needed for the svreinterprets. gcc/

[PATCH 04/21] aarch64: Make AARCH64_FL_SVE requirements explicit

2023-11-17 Thread Richard Sandiford
So far, all intrinsics covered by the aarch64-sve-builtins* framework have (naturally enough) required at least SVE. However, arm_sme.h defines a couple of intrinsics that can be called by any code. It's therefore necessary to make the implicit SVE requirement explicit. gcc/ * config/aarc

[PATCH 03/21] aarch64: Use SVE's RDVL instruction

2023-11-17 Thread Richard Sandiford
We didn't previously use SVE's RDVL instruction, since the CNT* forms are preferred and provide most of the range. However, there are some cases that RDVL can handle and CNT* can't, and using RDVL-like instructions becomes important for SME. gcc/ * config/aarch64/aarch64-protos.h (aarch64

[PATCH 02/21] aarch64: Add a result_mode helper function

2023-11-17 Thread Richard Sandiford
SME will add more intrinsics whose expansion code requires the mode of the function return value. This patch adds an associated helper routine. gcc/ * config/aarch64/aarch64-sve-builtins.h (function_expander::result_mode): New member function. * config/aarch64/aarch64-sve-

[PATCH 01/21] aarch64: Generalise require_immediate_lane_index

2023-11-17 Thread Richard Sandiford
require_immediate_lane_index previously hard-coded the assumption that the group size is determined by the argument immediately before the index. However, for SME, there are cases where it should be determined by an earlier argument instead. gcc/ * config/aarch64/aarch64-sve-builtins.h:

[PATCH 00/21] aarch64: Add support for SME

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME. A follow-on series will add SME2 on top. All of the detail is in the individual patch summaries. The series can't go in yet, because it depends on: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629713.html and some reviewed-but-unpushed

Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Palmer Dabbelt
On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote: 90% theadvector extension reusing current RVV 1.0 instructions patterns: Just change ASM, For example: @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar" (match_operand:VFULLI_D 3 "register_operand" "vr,vr, v

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Daniel Krügler
Am Fr., 17. Nov. 2023 um 16:32 Uhr schrieb Jonathan Wakely : > > Tested x86_64-linux. Pushed to trunk. > > GCC generates better code for add_sat if we use: > > unsigned z = x + y; > z |= -(z < x); > return z; > > If the compiler can't be improved we should consider using that instead > of __builtin

Re: Add 'libgomp.c++/static-local-variable-1.C'

2023-11-17 Thread Thomas Schwinge
Hi! On 2023-11-17T16:24:46+0100, I wrote: > [...] attached "Add 'libgomp.c++/static-local-variable-1.C'" [...] Now, working on translating this into an OpenMP 'target' variant. My goal here is not necessarily to make this work now, but rather to figure out whether '-fthreadsafe-statics' actually

Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jeff Law
On 11/17/23 04:39, juzhe.zh...@rivai.ai wrote: 90% theadvector extension reusing current RVV 1.0 instructions patterns: Just change ASM, For example: @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar" (match_operand:VFULLI_D 3 "register_operand" "vr,vr, vr, vr")] VMULH)

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread David Edelsohn
On Fri, Nov 17, 2023 at 10:17 AM Arsen Arsenović wrote: > > David Edelsohn writes: > > > On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović wrote: > > > >> > >> David Edelsohn writes: > >> > >> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović > wrote: > >> > > >> > [snip] > >> >> Sure, but my p

[PATCH 2/2] libstdc++: Ensure valid UTF-8 in std::vprint_unicode

2023-11-17 Thread Jonathan Wakely
This is a naive implementation of the UTF-8 validation algorithm, which could definitely be optimized. But it's faster than using std::codecvt_utf8 and checking the result of that, which is the only existing code we have to do it in the library. As the TODO suggests, we could do the UTF-8 to UTF-1
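For readers unfamiliar with what UTF-8 validation involves, here is a minimal C++ sketch of a naive validator. It is an illustration only, not the algorithm in the patch: `is_valid_utf8` is a hypothetical name, and the checks shown (lead-byte form, continuation bytes, overlong encodings, surrogates, the U+10FFFF limit) are the standard well-formedness rules rather than anything taken from libstdc++.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when s is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) {            // ASCII: a single byte
            ++i;
            continue;
        }
        std::size_t len = 0;        // total bytes in this sequence
        unsigned long cp = 0;       // decoded code point
        unsigned long min = 0;      // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;          // stray continuation byte or invalid lead byte
        if (n - i < len) return false;              // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

A byte-at-a-time loop like this is easy to audit but leaves room for optimization.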

[PATCH 1/2] libstdc++: Implement C++23 <print> header [PR107760]

2023-11-17 Thread Jonathan Wakely
There's a TODO here about checking for invalid UTF-8, which is done by the next patch. I don't know if the Windows code actually works. I tried to test it with mingw and Wine, but I got garbled text. But I'm not sure if that's my code here, or the conversion to UTF-16, or how I'm testing, or just

[PATCH] libstdc++: Add fast path for std::format("{}", x) [PR110801]

2023-11-17 Thread Jonathan Wakely
I'll probably push this before stage 1 closes. I might move the new lambda out to a struct at namespace scope first though. -- >8 -- libstdc++-v3/ChangeLog: PR libstdc++/110801 * include/std/format (_Sink_iter::_M_get_pointer) (_Sink_iter::_M_end_pointer): New functions

[PATCH] libstdc++: Define std::ranges::to for C++23 (P1206R7) [PR111055]

2023-11-17 Thread Jonathan Wakely
This needs tests, and doesn't include the changes to the standard containers to add insert_range etc. (but they work with ranges::to anyway, using the existing member functions). I plan to write the tests and push this tomorrow. I've trimmed the boring bits of the version.h changes, that are caus

[committed] libstdc++: Regenerate config.h.in

2023-11-17 Thread Jonathan Wakely
Pushed to trunk. -- >8 -- libstdc++-v3/ChangeLog: * config.h.in: Regenerate. --- libstdc++-v3/config.h.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in index c0aa51af3f0..17da7bb9867 100644 --- a/libstdc++-v3/conf

Re: [PATCH v2] RISC-V: Implement target attribute

2023-11-17 Thread Andreas Schwab
In file included from /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/memory:78, from ../../gcc/system.h:769, from ../../gcc/config/riscv/riscv-target-attr.cc:25: In member function 'void std::default_delete<_Tp>::ope

[committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. GCC generates better code for add_sat if we use: unsigned z = x + y; z |= -(z < x); return z; If the compiler can't be improved we should consider using that instead of __builtin_add_overflow. -- >8 -- This was approved for C++26 last week at the WG21 me
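The branch-free idiom quoted in this message can be wrapped in a function as follows. This is a sketch under stated assumptions: the name `add_sat_sketch` is hypothetical, it covers only `unsigned`, and it is not the committed libstdc++ definition of the C++26 `add_sat`.

```cpp
#include <cassert>
#include <limits>

// Hypothetical illustration of the idiom from the message above;
// not the libstdc++ implementation.
constexpr unsigned add_sat_sketch(unsigned x, unsigned y)
{
    unsigned z = x + y;   // unsigned addition wraps modulo 2^N on overflow
    // After a wrap the sum is smaller than either operand, so (z < x) is 1.
    // Negating gives -1, which converts to all-ones and saturates z;
    // without a wrap the mask is 0 and z is unchanged.
    z |= -(z < x);
    return z;
}

static_assert(add_sat_sketch(1u, 2u) == 3u);
static_assert(add_sat_sketch(std::numeric_limits<unsigned>::max(), 1u)
              == std::numeric_limits<unsigned>::max());
```

The appeal of this form is that it contains no branch and no overflow builtin, which is why it may compile to tighter code than a version written with `__builtin_add_overflow`.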

Add 'libgomp.c++/static-local-variable-1.C'

2023-11-17 Thread Thomas Schwinge
Hi! I found that with GCC's '-fthreadsafe-statics' implementation (..., which is enabled by default) instrumented as follows: --- libstdc++-v3/libsupc++/guard.cc +++ libstdc++-v3/libsupc++/guard.cc @@ -271,6 +273,7 @@ namespace __cxxabiv1 extern "C" int __cxa_guard_acqui

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲
>> Yeah, just noticed that myself. Anyway will do some more tests, >> maybe my initial VLS analysis was somehow flawed. You can check binop_vx_constraint-167.c ~ binop_vx_constraint-174.c This patch is pre-approved if you change as my suggestion. I am gonna sleep so I am not able to review again

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread Arsen Arsenović
David Edelsohn writes: > On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović wrote: > >> >> David Edelsohn writes: >> >> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović wrote: >> > >> > [snip] >> >> Sure, but my patch does insert --disable-shared: >> >> >> >> --8<---cut here

Re: [PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

2023-11-17 Thread Jeff Law
On 11/17/23 07:18, Kito Cheng wrote: I haven't taken a closer look at the ira/lra dump yet, but my feeling is that this may be caused by the earlyclobber modifier not working as expected? Let me take a closer look tomorrow. Remember that constraints aren't checked until register allocation. So the comb

Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread Robin Dapp
> It must be correct. We already have test (intrinsic codes) for it. Yeah, just noticed that myself. Anyway will do some more tests, maybe my initial VLS analysis was somehow flawed. > Condition should be put into iterators (Add a new iterator for > indexed load store). Ah, that's what you mea

Re: [PATCH v4] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-17 Thread Alex Coplan
On 03/11/2023 12:19, Marek Polacek wrote: > On Wed, Sep 27, 2023 at 03:27:30PM +0100, Alex Coplan wrote: > > Hi, > > > > This is a v4 patch to address Jason's feedback here: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630911.html > > > > w.r.t. v3 it just removes a comment now th

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲
>> I'm wondering whether the VLA modes in the iterator are correct. >> Looks dubious to me but unsure, will need to create some tests >> before continuing. It must be correct. We already have test (intrinsic codes) for it. >> What's the problem with those? We probably won't reach there >> becaus

Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread Robin Dapp
> OK. Makes sense. I'm wondering whether the VLA modes in the iterator are correct. Looks dubious to me but unsure, will need to create some tests before continuing. > LGTM as long as you remove all > GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode) What's the problem with th

[PATCH v5] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-17 Thread Alex Coplan
Hi, This is a v5 patch to address Marek's feedback here: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635157.html I also implemented Jason's suggestion to use constexpr for the tables from this review: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634484.html I'll attach the

[committed] libstdc++: Adjust std::in_range template parameter name

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- This is more consistent with the specification in the standard. libstdc++-v3/ChangeLog: * include/std/utility (in_range): Rename _Up parameter to _Res. --- libstdc++-v3/include/std/utility | 14 +++--- 1 file changed, 7 insertions(

[committed] libstdc++: Add more Doxygen comments and another test for std::out_ptr

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- Improve Doxygen comments for std::out_ptr etc. and add a test for the feature test macro. Also remove a redundant preprocessor condition. Ideally the docs for std::out_ptr and std::inout_ptr would show examples of how to use them and what they do,

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread David Edelsohn
On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović wrote: > > David Edelsohn writes: > > > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović wrote: > > > > [snip] > >> Sure, but my patch does insert --disable-shared: > >> > >> --8<---cut here---start->8--- > >> ho

[committed] libstdc++: Fix Doxygen markup

2023-11-17 Thread Jonathan Wakely
Pushed to trunk. -- >8 -- libstdc++-v3/ChangeLog: * include/bits/chrono_io.h: Fix Doxygen markup. --- libstdc++-v3/include/bits/chrono_io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h

Re: [PATCH] c++, v2: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]

2023-11-17 Thread Jakub Jelinek
On Fri, Nov 17, 2023 at 09:18:39AM -0500, Jason Merrill wrote: > You recently pinged this patch, but I haven't seen an update since this > review? Oops, sorry, I've missed this and DR 2406 review posts in my inbox during vacation, will get to that momentarily. Thanks. Jakub

Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-17 Thread FX Coudert
>> I have done a full rebuild, and having looked more at the structure of >> libtool.m4 I am now convinced that having that line outside of the scope of >> _LT_DARWIN_LINKER_FEATURES is simply wrong (probably a copy-pasto or >> leftover from earlier code). >> Having rebuilt everything, it only m
