RE: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Roger Sayle
ates: > > > > foo: movl m(%rip), %eax > > addl %edi, %eax // 2 bytes > > subl $1, %eax // 3 bytes > > cltd > > idivl %edi > > ret > > > > This discrepancy is caused by the late

Re: [x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-30 Thread Uros Bizjak
3 bytes > cltd > idivl %edi > ret > > This discrepancy is caused by the late decision (in peephole2) to split > an addition with a memory operand, into a load followed by a reg-reg > addition. This patch improves this situation by adding a peephole2 >

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-30 Thread Jeff Law
On 6/29/24 3:07 PM, Vineet Gupta wrote: On 6/29/24 06:44, Jeff Law wrote: +;; fclass instruction output bitmap +;; 0 negative infinity +;; 1 negative normal number. +;; 2 negative subnormal number. +;; 3 -0 +;; 4 +0 +;; 5 positive subnormal number. +;; 6 positive normal number.

[testsuite PATCH] Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.

2024-06-30 Thread Roger Sayle
This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this AVX512 test includes the system math.h, and on older systems this provides inline versions of floor, ceil and rint (for the 387). The work around

[Patch, rtl-optimization, loop-unroll] Loop unroll factor based on register pressure

2024-06-30 Thread Ajit Agarwal
Hello All: This patch determines unroll factor based on loop register pressure. Unroll factor is quotient of max of available registers in loop by number of liveness. If available registers increases unroll factor increases. Wherein unroll factor decreases if number of liveness increases. Loop

[PATCH 3/3] Preserve SSA info for more propagated copy

2024-06-30 Thread Richard Biener
Besides VN and copy-prop also CCP and VRP as well as forwprop propagate out copies and thus it's worthwhile to try to preserve range and points-to info there when possible. Note that this also fixes the testcase from PR115701 but that's because we do not actually intersect info but only copy info

[PATCH 2/3] tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy

2024-06-30 Thread Richard Biener
The following restricts copying of points-to info from defs that might be in regions invoking UB and are never executed. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/115701 * tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): Only

[PATCH 1/3] tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy

2024-06-30 Thread Richard Biener
The following factors out the code that preserves SSA info of the LHS of a SSA copy LHS = RHS when LHS is about to be eliminated to RHS. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/115701 * tree-ssanames.h (maybe_duplicate_ssa_info_at_copy):

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-30 Thread Xi Ruoyao
On Fri, 2024-06-28 at 17:53 -0700, Vineet Gupta wrote: > +  UNSPEC_ISFINITE > +  UNSPEC_ISNORMAL You don't really need them. The RTL pattern of define_expand has no use when you expand it via C code and DONE. i.e. you can just code (define_expand "isfinite2" [(match_operand:SI 0

[PATCH v1] Vect: Distribute truncation into .SAT_SUB operands

2024-06-29 Thread pan2 . li
: __attribute__((noinline)) void test (uint16_t *x, unsigned b, unsigned n) { unsigned a = 0; uint16_t *p = x; do { a = *--p; *p = (uint16_t)(a >= b ? a - b : 0); } while (--n); } Before this patch: ... .L3: vle16.v v1,0(a3) vrsub.vx v5,v2,t1 mv t3,a4 addw a4,a4

[Patch, rtl-optimization]: Loop unroll factor based on register pressure

2024-06-29 Thread Ajit Agarwal
Hello All: This patch determines Unroll factor based on loop register pressure. Unroll factor is quotient of max of available registers in loop by number of liveness. If available registers increases unroll factor increases. Wherein unroll factor decreases if number of liveness increases. Loop

Re: [PATCH][PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-06-29 Thread Maciej W. Rozycki
r could be improved, or maybe output produced from `print_rtx_function' isn't right, I don't know. > The patch is OK for trunk, thanks. I agree that it's a regression > from 08a692679fb8. Since it's fixing such a hard-to-diagnose wrong > code bug, and since it seems very safe, I think it

[PATCH 5/5] Document return value in write_cv_integer

2024-06-29 Thread Mark Harmstone
gcc/ * dwarf2codeview.cc (write_lf_modifier): Expand upon comment. --- gcc/dwarf2codeview.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 5a33b439b14..df53d8bab9d 100644 --- a/gcc/dwarf2codeview.cc +++

[PATCH 4/5] Make sure CodeView symbols are aligned

2024-06-29 Thread Mark Harmstone
CodeView symbols have to be multiples of four bytes; add an alignment directive to write_data_symbol to ensure this. Note that these can be zeroes, so we can rely on GAS to do this for us; it's only types that need f3, f2, f1 values. gcc/ * dwarf2codeview.cc (write_data_symbol):

[PATCH 3/5] Avoid magic numbers when writing CodeView padding

2024-06-29 Thread Mark Harmstone
Adds names for the padding magic numbers to enum cv_leaf_type. gcc/ * dwarf2codeview.cc (enum cv_leaf_type): Add padding constants. (write_cv_padding): Use names for padding constants. --- gcc/dwarf2codeview.cc | 11 +++ 1 file changed, 7 insertions(+), 4

[PATCH 2/5] Add CodeView enum cv_sym_type

2024-06-29 Thread Mark Harmstone
Make everything more gdb-friendly by using an enum for symbol constants rather than #defines. gcc/ * dwarf2codeview.cc (S_LDATA32, S_GDATA32, S_COMPILE3): Undefine. (enum cv_sym_type): Define. (struct codeview_symbol): Use enum cv_sym_type.

[PATCH 1/5] Add CodeView enum cv_leaf_type

2024-06-29 Thread Mark Harmstone
Make everything more gdb-friendly by using an enum for type constants rather than #defines. gcc/ * dwarf2codeview.cc (enum cv_leaf_type): Define. (struct codeview_subtype): Use enum cv_leaf_type. (struct codeview_custom_type): Use enum cv_leaf_type.

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-29 Thread Vineet Gupta
On 6/29/24 06:44, Jeff Law wrote: >> +;; fclass instruction output bitmap >> +;; 0 negative infinity >> +;; 1 negative normal number. >> +;; 2 negative subnormal number. >> +;; 3 -0 >> +;; 4 +0 >> +;; 5 positive subnormal number. >> +;; 6 positive normal number. >> +;; 7 positive

[PATCH] c: Diagnose declarations that are used only in their own initializer [PR115027]

2024-06-29 Thread Martin Uecker
Probably not entirely fool-proof when using statement expressions in initializers, but should be good enough. Bootstrapped and regression tested on x86_64. c: Diagnose declarations that are used only in their own initializer [PR115027] Track the declaration that is currently

[x86 PATCH]: Additional peephole2 to use lea in round-up integer division.

2024-06-29 Thread Roger Sayle
addition. This patch improves this situation by adding a peephole2 to recognize consecutive additions and transform them into lea if profitable. My first attempt at fixing this was to use a define_insn_and_split: (define_insn_and_split "*lea3_reg_mem_imm" [(set (match_opera
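For readers skimming the listing, the assembly quoted in this thread (load m, add the argument, subtract 1, sign-extend, divide) reads like a round-up division. The following is a minimal C sketch inferred from that instruction sequence, not copied from the patch or its testcase; the value given to `m` and the parameter name `n` are assumptions for illustration.

```c
/* Hypothetical global standing in for the `m` loaded via m(%rip). */
int m = 10;

/* Round-up division written the way the quoted assembly computes it:
   add the divisor, subtract one, then divide.  Assumes n > 0 and that
   m + n - 1 does not overflow int. */
int foo(int n)
{
    return (m + n - 1) / n;
}
```

The two additions (`+ n` and `- 1`) are the pair that a single lea with a base, an index and a displacement could in principle express, which appears to be the case the peephole2 targets.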

[PATCH] c: Fix ICE for incorrect code in comptypes_verify [PR115696]

2024-06-29 Thread Martin Uecker
This adds missing code for handling error marks. Bootstrapped and regression tested on x86_64. c: Fix ICE for incorrect code in comptypes_verify [PR115696] The new verification code produces an ICE for incorrect code. Add the same logic as already used in comptypes to

[PATCH] c: Fix ICE for redeclaration of structs with different alignment [PR114727]

2024-06-29 Thread Martin Uecker
This fixes an ICE when redeclaring a struct and having an aligned attribute in one version in C23. Bootstrapped and regression tested on x86_64. c: Fix ICE for redeclaration of structs with different alignment [PR114727] For redeclarations of struct in C23, if one has an

Re: [RFC PATCH] cse: Add another CSE pass after split1

2024-06-29 Thread Jeff Law
On 6/27/24 3:56 PM, Palmer Dabbelt wrote: This is really more of a question than a patch. Looking at PR/115687 I managed to convince myself there's a general class of problems here: splitting might produce constant subexpressions, but as far as I can tell there's nothing to eliminate those

Re: [PATCH] _Hashtable fancy pointer support

2024-06-29 Thread François Dumont
using nullptr so I think it's fine. I haven't reviewed the patch yet, but this answers the nullptr question: https://en.cppreference.com/w/cpp/named_req/NullablePointer (aka Cpp17NullablePointer in the C++ standard). diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/includ

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-29 Thread Jeff Law
On 6/28/24 6:53 PM, Vineet Gupta wrote: Currently isfinite and isnormal use float compare instructions with fp flags save/restored around them. Our perf team complained this could be costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to do FP compares w/o disturbing FP

[PATCH v10] C, ObjC: Add -Wunterminated-string-initialization

2024-06-29 Thread Alejandro Colomar
re that the programmer didn't make any mistakes. This warning catches the bug above, so that the programmer will be able to fix it and write: char log_levels[][8] = { "info", "warning", "err" }; This warning already existed as part of -Wc++-compat, but this patch allo
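The bug this preview refers to is cut off in the listing, so the following is only a sketch of the general situation the warning name suggests: a character array initialized by a string literal that exactly fills it, leaving no room for the terminating NUL. The array names and the helper `has_terminator` are hypothetical and not taken from the patch.

```c
#include <string.h>

/* Four characters in a four-byte array: valid C, but no room for '\0'. */
static const char unterminated[4] = "info";

/* A wider array leaves room for the terminator. */
static const char terminated[8] = "info";

/* Returns 1 if buf contains a NUL byte within its first size bytes. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Passing `unterminated` to a function such as `strlen` would read past the array, which is the kind of mistake a diagnostic at the point of initialization can catch early.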

Re: [PATCH v9] C, ObjC: Add -Wunterminated-string-initialization

2024-06-29 Thread Alejandro Colomar
On Sat, Jun 29, 2024 at 02:58:48PM GMT, Alejandro Colomar wrote: > On Sat, Jun 29, 2024 at 02:52:40PM GMT, Alejandro Colomar wrote: > > @@ -6450,6 +6452,8 @@ name is still supported, but the newer name is more > > descriptive.) > > -Wstring-compare > > -Wtype-limits > > -Wuninitialized > >

Re: [PATCH v9] C, ObjC: Add -Wunterminated-string-initialization

2024-06-29 Thread Alejandro Colomar
fix it and write: > > char log_levels[][8] = { "info", "warning", "err" }; > > This warning already existed as part of -Wc++-compat, but this patch > allows enabling it separately. It is also included in -Wextra, since > it may not always

[PATCH v9] C, ObjC: Add -Wunterminated-string-initialization

2024-06-29 Thread Alejandro Colomar
re that the programmer didn't make any mistakes. This warning catches the bug above, so that the programmer will be able to fix it and write: char log_levels[][8] = { "info", "warning", "err" }; This warning already existed as part of -Wc++-compat, but this patch allo

[PATCH] c: Add support for byte arrays in C2Y

2024-06-29 Thread Martin Uecker
This marks structures which include a byte array as typeless storage. Bootstrapped and regression tested on x86_64. c: Add support for byte arrays in C2Y To get correct aliasing behavior requires that structures and unions that contain a byte array, i.e. an array of

Re: [PATCH] libgccjit: Add ability to get the alignment of a type

2024-06-28 Thread Iain Sandoe
; vars) - later today. Fixed as attached, Iain 0001-jit-Fix-Darwin-bootstrap-after-r15-1699.patch Description: Binary data

Re: [PATCH] Hard register asm constraint

2024-06-28 Thread Stefan Schulze Frielinghaus
On Fri, Jun 28, 2024 at 11:46:08AM +0200, Georg-Johann Lay wrote: > On 27.06.24 at 10:51, Stefan Schulze Frielinghaus wrote: > > On Thu, Jun 27, 2024 at 09:45:32AM +0200, Georg-Johann Lay wrote: > > > On 24.05.24 at 11:13 On 25.06.24 at 16:03, Paul Koning wrote: > > > > > On Jun 24, 2024, at

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-28 Thread Vineet Gupta
  return std::isfinite (x); } generating the new seq .LFB4:     fclass.d    a0,fa0     andi    a0,a0,126     snez    a0,a0     ret vs.     li    a0,1     ret I have a hunch this requires the pending value range patch from Hao Chen GUI. Thx, -Vineet [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
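The `andi a0,a0,126` in the generated sequence selects bits 1 through 6 of the fclass result, which are the finite classes in the bitmap quoted elsewhere in this thread. Below is a small software model of that idea in C, offered as a sketch only: the function names are hypothetical, bits 7 to 9 follow the RISC-V ISA definition of fclass, the model reports every NaN as quiet, and the 0x42 mask for isnormal is an assumption derived from the bitmap rather than something shown in the preview.

```c
#include <math.h>

/* Software model of the fclass result bitmap:
   0 -inf, 1 -normal, 2 -subnormal, 3 -0, 4 +0,
   5 +subnormal, 6 +normal, 7 +inf, 8 signaling NaN, 9 quiet NaN. */
unsigned fclass_model(double x)
{
    int neg = signbit(x) != 0;
    switch (fpclassify(x)) {
    case FP_INFINITE:  return neg ? 1u << 0 : 1u << 7;
    case FP_NORMAL:    return neg ? 1u << 1 : 1u << 6;
    case FP_SUBNORMAL: return neg ? 1u << 2 : 1u << 5;
    case FP_ZERO:      return neg ? 1u << 3 : 1u << 4;
    default:           return 1u << 9; /* model simplification: all NaNs quiet */
    }
}

/* Bits 1..6 set (mask 126): any finite class. */
int isfinite_via_fclass(double x)
{
    return (fclass_model(x) & 126u) != 0;
}

/* Bits 1 and 6 set (mask 0x42): negative or positive normal. */
int isnormal_via_fclass(double x)
{
    return (fclass_model(x) & 0x42u) != 0;
}
```

Because the classification is a pure bit test, no floating-point comparison is issued, which is consistent with the thread's stated goal of not touching the FP exception flags.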

Re: [PATCH] libgccjit: Add ability to get the alignment of a type

2024-06-28 Thread Iain Sandoe
Hi Folks, > On 28 Jun 2024, at 12:50, Rainer Orth wrote: > > David Malcolm writes: > >> On Thu, 2024-04-04 at 18:59 -0400, Antoni Boucher wrote: >>> Hi. >>> This patch adds a new API to produce an rvalue representing the >>> alignment of a type.

Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-28 Thread Andrew Waterman
+1 to any change that reduces the number of fflags accesses. On Fri, Jun 28, 2024 at 5:54 PM Vineet Gupta wrote: > > Currently isfinite and isnormal use float compare instructions with fp > flags save/restored around them. Our perf team complained this could be > costly in uarch. RV Base ISA

[PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-28 Thread Vineet Gupta
Currently isfinite and isnormal use float compare instructions with fp flags save/restored around them. Our perf team complained this could be costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to do FP compares w/o disturbing FP exception flags. Coincidentally, upstream just few

Re: [PATCH] Fortran: fix ALLOCATE with SOURCE of deferred character length [PR114019]

2024-06-28 Thread Steve Kargl
On Fri, Jun 28, 2024 at 10:00:53PM +0200, Harald Anlauf wrote: > > the attached patch fixes an ICE occurring for ALLOCATE with SOURCE > (or MOLD) of deferred character length in the scalar case, which > looked obscure because the ICE disappears at -O1 and higher. > > The

[PATCH] c++: DR2627, Bit-fields and narrowing conversions [PR94058]

2024-06-28 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? -- >8 -- This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that even if we are converting from an integer type or unscoped enumeration type to an integer type that cannot represent all the values of the original type, it's

Re: [PATCH] c++: Relax too strict assert in stabilize_expr [PR111160]

2024-06-28 Thread Patrick Palka
urns an expression > without TREE_SIDE_EFFECTS, which can't be if the involved type is volatile. > > This patch relaxes the assert to accept having TREE_THIS_VOLATILE on the > returned expression. > > Successfully tested on x86_64-pc-linux-gnu. > > PR c++/60 > >

Re: [PATCH] c++: Fix ICE locating 'this' for (not matching) template member function [PR115364]

2024-06-28 Thread Patrick Palka
} > === cut here === > > The problem is that get_fndecl_argument_location assumes that it has a > FUNCTION_DECL in its hands to find the location of the bad argument. It might > however have a TEMPLATE_DECL if there's a single candidate that cannot be > instantiated, like here. >

[PATCH] Fortran: fix ALLOCATE with SOURCE of deferred character length [PR114019]

2024-06-28 Thread Harald Anlauf
Dear all, the attached patch fixes an ICE occurring for ALLOCATE with SOURCE (or MOLD) of deferred character length in the scalar case, which looked obscure because the ICE disappears at -O1 and higher. The dump tree suggests that it is a wrong decl for the temporary source that was e.g

Re: [PATCH 2/2] libstdc++: Do not use C++11 alignof in C++98 mode [PR104395]

2024-06-28 Thread Jonathan Wakely
Pushed to trunk. On Thu, 27 Jun 2024 at 10:01, Jonathan Wakely wrote: > > As I commented in the PR, I think it would be nice if the compiler > accepted C++11 alignof in C++98 mode when -faligned-new is used. But > even if G++ added that, we'd need Clang to use it, and then wait a few > releases

Re: [PATCH 1/2] libstdc++: Simplify class templates

2024-06-28 Thread Jonathan Wakely
Pushed to trunk. On Thu, 27 Jun 2024 at 10:03, Jonathan Wakely wrote: > > I'm planning to push this, although arguably the first change isn't > worth doing if we can't use it everywhere. If we need to keep the old > code for EDG, maybe we should just keep using that? The new version > probably

RE: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
> > On 6/28/24 6:18 AM, Pengxuan Zheng wrote: > > > This patch improves GCC’s vectorization of __builtin_popcount for > > > aarch64 target by adding popcount patterns for vector modes besides > > > QImode, i.e., HImode, SImode and DImode. > > > > >

[PATCH v9] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target by adding popcount patterns for vector modes besides QImode, i.e., HImode, SImode and DImode. With this patch, we now generate the following for V8HI: cnt v1.16b, v0.16b uaddlp v2.8h, v1.16b For V4HI, we
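The two-instruction V8HI sequence quoted above counts bits per byte (`cnt`) and then adds adjacent byte counts into 16-bit lanes (`uaddlp`). A scalar C model of that data flow is sketched below; the function names are hypothetical, `__builtin_popcount` is the GCC builtin, and the code models the arithmetic only, not the patch's RTL patterns.

```c
#include <stdint.h>

/* One 16-bit lane: count bits in each byte (the cnt step), then add the
   two adjacent byte counts into a 16-bit result (the uaddlp step). */
uint16_t popcount_hi_lane(uint16_t x)
{
    uint8_t lo = (uint8_t)__builtin_popcount(x & 0xffu);
    uint8_t hi = (uint8_t)__builtin_popcount((unsigned)x >> 8);
    return (uint16_t)(lo + hi);
}

/* Eight lanes, mirroring a V8HI vector. */
void popcount_v8hi_model(const uint16_t in[8], uint16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = popcount_hi_lane(in[i]);
}
```

Wider element sizes would presumably need further pairwise-add steps on top of this, since each step only doubles the lane width.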

RE: [PATCH v6] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
> On 6/28/24 6:18 AM, Pengxuan Zheng wrote: > > This patch improves GCC’s vectorization of __builtin_popcount for > > aarch64 target by adding popcount patterns for vector modes besides > > QImode, i.e., HImode, SImode and DImode. > > > > With this patch, we no

[PATCH v8] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target by adding popcount patterns for vector modes besides QImode, i.e., HImode, SImode and DImode. With this patch, we now generate the following for V8HI: cnt v1.16b, v0.16b uaddlp v2.8h, v1.16b For V4HI, we

RE: [PATCH v7] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng (QUIC)
Please ignore this patch. I accidentally added unrelated changes. I'll push a correct version shortly. Sorry for the noise. Thanks, Pengxuan > This patch improves GCC’s vectorization of __builtin_popcount for aarch64 > target by adding popcount patterns for vector modes besides QImod

[PATCH v7] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-28 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target by adding popcount patterns for vector modes besides QImode, i.e., HImode, SImode and DImode. With this patch, we now generate the following for V8HI: cnt v1.16b, v0.16b uaddlp v2.8h, v1.16b For V4HI, we

Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-28 Thread Richard Sandiford
Richard Sandiford writes: > Thomas Schwinge writes: >> Hi! >> >> On 2024-06-27T23:20:18+0200, I wrote: >>> On 2024-06-27T22:27:21+0200, I wrote: >>>> On 2024-06-27T18:49:17+0200, I wrote: >>>>> On 2023-10-24T19:49:10+0100, Richard Sandi

[PATCH] i386: Cleanup tmp variable usage in ix86_expand_move

2024-06-28 Thread Uros Bizjak
Remove extra assignment, extra temp variable and variable shadowing. No functional changes intended. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_move): Remove extra assignment to tmp variable, reuse tmp variable instead of declaring new temporary variable and remove tmp

Re: [PATCH v3] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-28 Thread Richard Earnshaw (lists)
On 27/06/2024 17:16, Wilco Dijkstra wrote: > Hi Richard, > >> Doing just this will mean that the register allocator will have to undo a >> pre/post memory operand that was accepted by the predicate (memory_operand).  >> I think we really need a tighter predicate (lets call it noautoinc_mem_op)

RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-28 Thread Li, Pan2
rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip > -Original Message- > From: Richard Biener > Sent: Friday, June 28, 2024 6:39 AM > To: Li, Pan2 > Cc: gcc-patch

RE: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

2024-06-28 Thread Li, Pan2
.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD On Fri, Jun 28, 2024 at 5:44 AM wrote: > > From: Pan Li > > This patch would like to support the form of unsigned scalar .SAT_ADD > when one of the op is

[PATCH] RISC-V: Handle NULL stmt in SLP_TREE_SCALAR_STMTS

2024-06-28 Thread Richard Biener
The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS with the first candidate being the two-operator nodes where some lanes are do-not-care and also do not have a scalar stmt computing the result. I've so far whack-a-moled the vect.exp testsuite. I do plan to use NULL elements

[PATCH][v2] RISC-V: Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX

2024-06-28 Thread Richard Biener
The following makes sure that for SLP reductions all lanes have the same STMT_VINFO_REDUC_IDX. Once we move that info and can adjust it we can implement swapping. It also makes the existing protection against operand swapping trigger for all stmts participating in a reduction, not just the

Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-28 Thread Richard Sandiford
Thomas Schwinge writes: > Hi! > > On 2024-06-27T23:20:18+0200, I wrote: >> On 2024-06-27T22:27:21+0200, I wrote: >>> On 2024-06-27T18:49:17+0200, I wrote: >>>> On 2023-10-24T19:49:10+0100, Richard Sandiford >>>> wrote: >>>>>

RE: [PATCH v3] Vect: Support truncate after .SAT_SUB pattern in zip

2024-06-28 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, June 28, 2024 6:39 AM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > jeffreya...@gmail.com; rdapp@gmail.com; Tamar Christina > > Subject: Re: [PATCH v3

Re: [PATCH] Use move-aware auto_vec in map

2024-06-28 Thread Jørgen Kvalsvik
On 6/28/24 13:55, Richard Biener wrote: On Fri, Jun 28, 2024 at 8:43 AM Jørgen Kvalsvik wrote: Using auto_vec rather than vec for means the vectors are released automatically upon return, to stop the leak. The problem seems to be that auto_vec is not really move-aware, only the specialization

Re: [PATCH v1] Match: Support imm form for unsigned scalar .SAT_ADD

2024-06-28 Thread Richard Biener
On Fri, Jun 28, 2024 at 5:44 AM wrote: > > From: Pan Li > > This patch would like to support the form of unsigned scalar .SAT_ADD > when one of the op is IMM. For example as below: > > Form IMM: > #define DEF_SAT_U_ADD_IMM_FMT_1(T) \ >
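The macro in this preview is cut off, so its exact form is not reproduced here. As a rough illustration of what "unsigned scalar .SAT_ADD with an immediate operand" means, the sketch below uses a common branchless idiom; the function name, the 8-bit type and the constant 9 are all assumptions chosen for the example.

```c
#include <stdint.h>

/* Unsigned saturating add of a constant: if x + 9 wraps around,
   return the maximum value instead.  Hypothetical helper, not the
   patch's macro. */
uint8_t sat_u_add_imm_u8(uint8_t x)
{
    uint8_t sum = (uint8_t)(x + 9u);      /* wraps modulo 256 */
    return (uint8_t)(sum | -(sum < x));   /* all-ones when the add wrapped */
}
```

The comparison `sum < x` is 1 exactly when the addition overflowed, so negating it yields an all-ones mask that forces the result to the type's maximum.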

Re: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-06-28 Thread Richard Biener
On Wed, Jun 26, 2024 at 4:50 PM Feng Xue OS wrote: > > Updated the patch. > > For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current > vectorizer could only handle the pattern if the reduction chain does not > contain other operation, no matter the other

Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN' (was: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2024-06-28 Thread Thomas Schwinge
Hi! As part of this: On 2013-07-26T11:04:33-0400, David Malcolm wrote: > This patch is the hand-written part of the conversion of passes from > C structs to C++ classes. > --- a/gcc/passes.c > +++ b/gcc/passes.c ..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to

Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing

2024-06-28 Thread Richard Biener
on has its own input vectype, while reduction > - PHI records the input vectype with least lanes. */ > - if (lane_reducing) > -STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in; > >enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info); >ST

Re: Rewrite usage comment at the top of 'gcc/passes.def' (was: [PATCH 02/11] Generate pass-instances.def)

2024-06-28 Thread Richard Biener
On Fri, Jun 28, 2024 at 2:14 PM Thomas Schwinge wrote: > > Hi! > > On 2013-07-26T11:04:32-0400, David Malcolm wrote: > > Introduce a new gen-pass-instances.awk script, and use it at build time > > to make a pass-instances.def from passes.def. > > (The script has later been rewritten and

Re: LoongArch vs. [PATCH 0/6] Add a late-combine pass

2024-06-28 Thread chenglulu
On 2024/6/28 at 8:35 PM, Xi Ruoyao wrote: On Fri, 2024-06-28 at 20:34 +0800, chenglulu wrote: On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote: Hi Richard, The late combine pass has triggered some FAILs on LoongArch and I'm investigating.  One of them is movcf2gr-via-fr.c.  In 315r.postreload: (insn 22 7 24 2

Re: LoongArch vs. [PATCH 0/6] Add a late-combine pass

2024-06-28 Thread Xi Ruoyao
On Fri, 2024-06-28 at 20:34 +0800, chenglulu wrote: > > On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote: > > Hi Richard, > > > > The late combine pass has triggered some FAILs on LoongArch and I'm > > investigating.  One of them is movcf2gr-via-fr.c.  In > > 315r.postreload: > > > > (insn 22 7 24 2 (set

Re: LoongArch vs. [PATCH 0/6] Add a late-combine pass

2024-06-28 Thread chenglulu
On 2024/6/28 at 8:25 PM, Xi Ruoyao wrote: Hi Richard, The late combine pass has triggered some FAILs on LoongArch and I'm investigating. One of them is movcf2gr-via-fr.c. In 315r.postreload: (insn 22 7 24 2 (set (reg:FCC 32 $f0 [87]) (reg:FCC 64 $fcc0 [87]))

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-28 Thread Richard Sandiford
Richard Biener writes: > On Fri, Jun 28, 2024 at 2:16 PM Richard Biener > wrote: >> >> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener >> wrote: >> > >> > >> > >> > > On 28.06.2024 at 10:27, Richard Sandiford >> > > wrote: >> > > >> > > Richard Biener writes: >> > >>> On Fri, Jun 28, 2024

LoongArch vs. [PATCH 0/6] Add a late-combine pass

2024-06-28 Thread Xi Ruoyao
Hi Richard, The late combine pass has triggered some FAILs on LoongArch and I'm investigating. One of them is movcf2gr-via-fr.c. In 315r.postreload: (insn 22 7 24 2 (set (reg:FCC 32 $f0 [87]) (reg:FCC 64 $fcc0 [87])) "../gcc/gcc/testsuite/gcc.target/loongarch/movcf2gr-via-fr.c":9:12

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-28 Thread Richard Biener
On Fri, Jun 28, 2024 at 2:16 PM Richard Biener wrote: > > On Fri, Jun 28, 2024 at 11:06 AM Richard Biener > wrote: > > > > > > > > > On 28.06.2024 at 10:27, Richard Sandiford > > > wrote: > > > > > > Richard Biener writes: > > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > > >>>

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-28 Thread Richard Biener
On Fri, Jun 28, 2024 at 11:06 AM Richard Biener wrote: > > > > > On 28.06.2024 at 10:27, Richard Sandiford > > wrote: > > > > Richard Biener writes: > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > >>> wrote: > >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote: > >

Rewrite usage comment at the top of 'gcc/passes.def' (was: [PATCH 02/11] Generate pass-instances.def)

2024-06-28 Thread Thomas Schwinge
of 'gcc/passes.def'", see attached? Regards, Thomas >From 072cdf7d9cf86fb2b0553b93365648e153b4376b Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 28 Jun 2024 14:05:04 +0200 Subject: [PATCH] Rewrite usage comment at the top of 'gcc/passes.def' Since Subversion r201359 (Git c

Re: [PATCH] Use move-aware auto_vec in map

2024-06-28 Thread Richard Biener
On Fri, Jun 28, 2024 at 8:43 AM Jørgen Kvalsvik wrote: > > Using auto_vec rather than vec for means the vectors are released > automatically upon return, to stop the leak. The problem seems to be that > auto_vec is not really move-aware, only the specialization > is. Indeed. > This is actually

Re: [PATCH] libgccjit: Add ability to get the alignment of a type

2024-06-28 Thread Rainer Orth
David Malcolm writes: > On Thu, 2024-04-04 at 18:59 -0400, Antoni Boucher wrote: >> Hi. >> This patch adds a new API to produce an rvalue representing the >> alignment of a type. >> Thanks for the review. > > Patch looks good to me (but may need the usual A

Re: [PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-28 Thread Uros Bizjak
> > > Thanks, > > Uros. > > It looks like the patch resolves 3 reported issues. > Uros, I suggest merging the patch as it is, without minor refactoring, to > avoid triggering another round of testing, if you agree. Yes, please go ahead. Thanks, Uros.

[PATCH] tree-optimization/115652 - more fixing of the fix

2024-06-28 Thread Richard Biener
The following addresses the corner case of an outer loop with an empty header where we end up asking for the BB of a NULL stmt by special-casing this case. Bootstrap and regtest running on x86_64-unknown-linux-gnu, the patch fixes observed ICEs on GCN. PR tree-optimization/115652

[PATCH] i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL

2024-06-28 Thread Evgeny Karpov
Thursday, June 27, 2024 8:13 PM Uros Bizjak wrote: > > So, there is no problem having #endif just after else. > > Anyway, it's your call, this is not a hill I'm willing to die on. ;) > > Thanks, > Uros. It looks like the patch resolves 3 reported issues. Uros, I sugg

RE: [RFC PATCH] cse: Add another CSE pass after split1

2024-06-28 Thread Tamar Christina
Hi, > -Original Message- > From: Palmer Dabbelt > Sent: Thursday, June 27, 2024 10:57 PM > To: gcc-patches@gcc.gnu.org > Cc: Palmer Dabbelt > Subject: [RFC PATCH] cse: Add another CSE pass after split1 > > This is really more of a question than a patch. >

Re: Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-06-28 Thread Fei Gao
On 2024-06-09 04:36  Jeff Law wrote: > > > >On 6/5/24 8:42 PM, Fei Gao wrote: > >>> But let's back up and get a good explanation of what the problem is. >>> Based on patch 2/2 it looks like we have lost an assignment to the >>> return register. >

Re: [PATCH v2] MIPS: Output $0 for conditional trap if !ISA_HAS_COND_TRAPI

2024-06-28 Thread Maciej W. Rozycki
pattern is the same > > > > in both cases anyway. This would prevent special-casing from being > > > > needed > > > > in `mips_expand_conditional_trap' as well. > > > > > > > > > > I agree. The

[PATCH v2 8/8] libgomp: Map omp_default_mem_space to USM

2024-06-28 Thread Andrew Stubbs
When unified shared memory is required, the default memory space should also be unified. libgomp/ChangeLog: * config/linux/allocator.c (linux_memspace_alloc): Check omp_requires_mask. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise.

[PATCH v2 6/8] amdgcn: libgomp plugin USM implementation

2024-06-28 Thread Andrew Stubbs
d for avoiding page migrations (in general). This implementation reuses the "usmpin" allocator (introduced in my previous patch-set to optimize pinned memory allocation) to solve these issues. Firstly, all USM memory is allocated from specially memmap'd pages to ensure that as few pages as po

[PATCH v2 7/8] openmp, libgomp: Handle unified shared memory in omp_target_is_accessible

2024-06-28 Thread Andrew Stubbs
From: Marcel Vollweiler This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine omp_target_is_accessible. libgomp/ChangeLog: * target.c (omp_target_is_accessible): Handle unified shared memory. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c

[PATCH v2 5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure

[PATCH v2 4/8] openmp: Use libgomp memory allocation functions with unified shared memory.

2024-06-28 Thread Andrew Stubbs
From: Hafiz Abid Qadeer This patch changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This helps existing code to benefit from the unified shared memory, and is necessary to implement "requires

[PATCH v2 2/8] openmp, nvptx: ompx_gnu_unified_shared_mem_alloc

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and ompx_gnu_host_mem_a

[PATCH v2 3/8] openmp: Enable -foffload-memory=unified

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Ensure that "requires unified_shared_memory" plays nicely with the -foffload-memory options, and that enabling the option has the same effect as enabling USM in the code. Also adds some testcases. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_target): Add

[PATCH v2 0/8] OpenMP: Unified Shared Memory via Managed Memory

2024-06-28 Thread Andrew Stubbs
These patches are an evolution of the USM portion of the patches previously posted in July 2022 (yes, it's taken a while!) https://patchwork.sourceware.org/project/gcc/list/?series=10748=%2A=both The pinned memory portion was already posted (and partially approved already) and must be applied

[PATCH v2 1/8] libgomp: Disentangle shared memory from managed

2024-06-28 Thread Andrew Stubbs
that I will implement in later patches. There may be a temporary regression in USM support. This patch disables the basic stop-gap shared memory so we can introduce fast Unified Shared Memory using the managed memory APIs in the next patches. If a device has integrated memory then the patch attempts

Re: [PATCH] Hard register asm constraint

2024-06-28 Thread Georg-Johann Lay
On 27.06.24 at 10:51, Stefan Schulze Frielinghaus wrote: On Thu, Jun 27, 2024 at 09:45:32AM +0200, Georg-Johann Lay wrote: On 24.05.24 at 11:13 On 25.06.24 at 16:03, Paul Koning wrote: On Jun 24, 2024, at 1:50 AM, Stefan Schulze Frielinghaus wrote: On Mon, Jun 10, 2024 at 07:19:19AM +0200,

[PATCH] c++: Fix ICE locating 'this' for (not matching) template member function [PR115364]

2024-06-28 Thread Simon Martin
ECL in its hands to find the location of the bad argument. It might however have a TEMPLATE_DECL if there's a single candidate that cannot be instantiated, like here. This patch simply defaults to using the FNDECL's location in this case, which fixes this PR. Successfully tested on x86_64-pc

[PATCH] Remove unused hybrid_* operators in range-ops.

2024-06-28 Thread Aldy Hernandez
Now that the dust has settled on the prange work, we can remove the hybrid operators. I will push this once tests complete. gcc/ChangeLog: * range-op-ptr.cc (class hybrid_and_operator): Remove. (class hybrid_or_operator): Same. (class hybrid_min_operator): Same.

Re: [Patch, Fortran] 2/3 Refactor locations where _vptr is (re)set.

2024-06-28 Thread Andre Vehreschild
Hi Paul, thanks for the review. I have removed the commented assert and committed as gcc-15-1704-gaa3599a10ca What about your pr59104 patch? It is living happily in my dev-branch and making no problems. Thanks again and regards, Andre On Thu, 27 Jun 2024 07:29:40 +0100 Paul Richard

Re: [PATCH 2/3] libstdc++: Optimize __uninitialized_default using memset

2024-06-28 Thread Jonathan Wakely
On Fri, 28 Jun 2024 at 07:53, Maciej Cencora wrote: > > But constexpr-ness of bit_cast has additional limitations and e.g. providing > an union as _Tp would be a hard-error. So we have two options: > - before bitcasting check if type can be bitcast-ed at compile-time, > - change the 'if

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-28 Thread Richard Biener
> On 28.06.2024 at 10:27, Richard Sandiford wrote: > > Richard Biener writes: >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener >>> wrote: >>> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote: for the testcase in the PR115406, here is part of the dump. char

Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

2024-06-28 Thread Richard Sandiford
Richard Biener writes: > On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > wrote: >> >> On Fri, Jun 28, 2024 at 3:15 AM liuhongt wrote: >> > >> > for the testcase in the PR115406, here is part of the dump. >> > >> > char D.4882; >> > vector(1) _1; >> > vector(1) signed char _2; >> >

[PATCH] MIPS/testsuite: Fix umips-save-restore-1.c

2024-06-28 Thread YunQiang Su
With some recent optimizations, -O1/-O2/-O3 can achieve almost the same performance/size by stack load/store. Thus lwm/swm will save/restore fewer callee-saved registers. In fact only $16 is saved with swm. To be sure that this optimization does exist, let's add 2 more function calls. So that lwm/swm

Re: [RFC PATCH] cse: Add another CSE pass after split1

2024-06-28 Thread Oleg Endo
Hi, On Thu, 2024-06-27 at 14:56 -0700, Palmer Dabbelt wrote: > This is really more of a question than a patch. > > Looking at PR/115687 I managed to convince myself there's a general > class of problems here: splitting might produce constant subexpressions, > but as far as I c

Re: [PATCH] aarch64: Remove RNG and MTE from -mcpu=neoverse-v2

2024-06-28 Thread Kyrylo Tkachov
gcc-patches@gcc.gnu.org; Richard Earnshaw; Richard Sandiford Subject: Re: [PATCH] aarch64: Remove RNG and MTE from -mcpu=neoverse-v2 Hi Tamar, Thanks for going through the docs h

Re: [PATCH] jit: Ensure ssize_t is defined.

2024-06-28 Thread FX Coudert
rm broken currently, and the patch is definitely an improvement there. > That said, maybe we're fine with this but then I walk back and say > just unconditionally include sys/types.h ... It is included unconditionally in other headers, yes. > Should be Davids say as he added this API. Agr
