Re: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-08-05 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw writes: > On 13/06/2022 15:33, Richard Sandiford via Gcc-patches wrote: >> On aarch64, --with-arch, --with-cpu and --with-tune only have an >> effect on the driver, so “./xgcc -B./ -O3” can give significantly >> different results from “./cc1 -O3”. --with-ar

Re: [PATCH] middle-end: Allow backend to expand/split double word compare to 0/-1.

2022-08-05 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch to the middle-end's RTL expansion reorders the code in > emit_store_flag_1 so that the backend has more control over how best > to expand/split double word equality/inequality comparisons against > zero or minus one. With the current implementation, the

Re: Missed lowering to ld1rq from svld1rq for memory operand

2022-08-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > Following from off-list discussion, in the attached patch, I wrote pattern > similar to vec_duplicate_reg, which seems to work for the svld1rq tests. > Does it look OK ? > > Sorry, I didn't fully understand your suggestion on integrating with >

Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: >> +/* Create vector init for vectorized iv. */ >> +static tree >> +vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr, >> + tree step_expr, poly_uint64 nunits, >> + tree vectype, >> +

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Takayuki 'January June' Suwa writes: > Thanks for your response. > > On 2022/08/03 16:52, Richard Sandiford wrote: >> Takayuki 'January June' Suwa via Gcc-patches >> writes: >>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps >>> data flow consistent, but it also

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote: >> Takayuki 'January June' Suwa via Gcc-patches >> writes: >>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps >>>

Re: [PATCH] RFC: Extend SLP permutation optimisations

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 2 Aug 2022, Richard Sandiford wrote: > >> Currently SLP tries to force permute operations "down" the graph >> from loads in the hope of reducing the total number of permutes >> needed or (in the best case) removing the need for the permutes >> entirely. This

Re: [09/23] Add a cut-down version of std::span (array_slice)

2022-08-03 Thread Richard Sandiford via Gcc-patches
Martin Jambor writes: > Hi Richard, > > On Fri, Nov 13 2020, Richard Sandiford via Gcc-patches wrote: >> A later patch wants to be able to pass around subarray views of an >> existing array. The standard class to do that is std::span, but it's >> a C++20 thing. This
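
For readers without C++20, the idea of a subarray view can be sketched roughly as below. This is only an illustration: slice_view and sum_slice are hypothetical names, and the code is not GCC's array_slice, whose interface is not shown in this snippet.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, cut-down view type: a pointer plus a length.
// It does not own the elements; it only refers to an existing array.
template <typename T>
class slice_view
{
public:
  slice_view (T *base, std::size_t size) : m_base (base), m_size (size) {}

  T *begin () const { return m_base; }
  T *end () const { return m_base + m_size; }
  std::size_t size () const { return m_size; }

  // Bounds-checked element access (checked only when assertions are on).
  T &operator[] (std::size_t i) const
  {
    assert (i < m_size);
    return m_base[i];
  }

private:
  T *m_base;
  std::size_t m_size;
};

// Example consumer: takes the view by value, which copies only the
// pointer and the length, never the underlying elements.
int sum_slice (slice_view<const int> s)
{
  int total = 0;
  for (int x : s)
    total += x;
  return total;
}
```

A caller would build a view over part of an array, for example slice_view<const int> (data + 1, 3), and pass it on without copying the elements.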

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-03 Thread Richard Sandiford via Gcc-patches
Takayuki 'January June' Suwa via Gcc-patches writes: > Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps > data flow consistent, but it also increases register allocation pressure > and thus often creates many unwanted register-to-register moves that > cannot be optimized

Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > "Roger Sayle" writes: >> Many thanks to Segher and Richard for pointing out that my removal >> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version >> of this patch was incorrect, and my assumption t

Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > Many thanks to Segher and Richard for pointing out that my removal > of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version > of this patch was incorrect, and my assumption that these would be > subsumed by val_signbit_known_clear_p was mistaken. That

Re: try_finally_expr in must_not_throw_expr

2022-08-02 Thread Richard Sandiford via Gcc
Philipp Rimmele via Gcc writes: > Hi, > > i'm developing a GCC-Plugin. And i don't understand why there is a > "try_finally_expr" in a must_not_throw-Area in my AST. It happens in the > destructors. > Here is my AST: > function_decl Exception::__dt_base > 1:

Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch implements some additional zero-extension and sign-extension > related optimizations in simplify-rtx.cc. The original motivation comes > from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: > > Failed to match this instruction: > (set (reg:DI

Re: [PATCH] Teach VN about masked/len stores

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following teaches VN to handle reads from .MASK_STORE and > .LEN_STORE. For this push_partial_def is extended first for > convenience so we don't have to handle the full def case in the > caller (possibly other paths can be simplified then). Also > the partial

Ping^3: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-08-02 Thread Richard Sandiford via Gcc-patches
Ping^3 for the configure bits. Richard Sandiford via Gcc-patches writes: > On aarch64, --with-arch, --with-cpu and --with-tune only have an > effect on the driver, so “./xgcc -B./ -O3” can give significantly > different results from “./cc1 -O3”. --with-arch did have a limited > eff

[PATCH] RFC: Extend SLP permutation optimisations

2022-08-02 Thread Richard Sandiford via Gcc-patches
Currently SLP tries to force permute operations "down" the graph from loads in the hope of reducing the total number of permutes needed or (in the best case) removing the need for the permutes entirely. This patch tries to extend it as follows: - Allow loads to take a different permutation from

Re: [PATCH] Add new target hook: simplify_modecc_const.

2022-07-28 Thread Richard Sandiford via Gcc-patches
Seems this thread has become a bit heated, so I'll try to proceed with caution :-) In the below, I'll use "X-mode const_int" to mean "a const_int that is known from context to represent an X-mode value". Of course, the const_int itself always stores VOIDmode. "Roger Sayle" writes: > Hi Segher,

Re: PING: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from upstream

2022-07-27 Thread Richard Sandiford via Gcc-patches
Dimitrije Milosevic writes: >> Do you know someone very familiar with MIPS and GCC and capable as a >> port maintainer? An active MIPS port maintainer will make the situation >> better. > Sadly, no. I agree it would make things easier. Yeah, I agree that's what we need. I stepped down from

Re: [PATCH 1/1] Fix bit-position comparison

2022-07-27 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote: > >> From: zhongjuzhe >> >> gcc/ChangeLog: >> >> * expr.cc (expand_assignment): Change GET_MODE_PRECISION to >> GET_MODE_BITSIZE >> >> --- >> gcc/expr.cc | 2 +- >> 1 file changed, 1 insertion(+), 1

[PATCH] graphds: Fix description of SCC algorithm

2022-07-22 Thread Richard Sandiford via Gcc-patches
graphds_scc says that it uses Tarjan's algorithm, but it looks like it uses Kosaraju's algorithm instead (dfs one way followed by dfs the other way). OK to install? Richard gcc/ * graphds.cc (graphds_scc): Fix algorithm attribution. --- gcc/graphds.cc | 2 +- 1 file changed, 1
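
As a rough illustration of the "dfs one way followed by dfs the other way" scheme, Kosaraju's algorithm might be sketched as follows. The Graph type and function names are assumptions made for this example; this is not the code in graphds.cc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adjacency-list digraph on vertices 0..n-1
// (not GCC's struct graph).
using Graph = std::vector<std::vector<int>>;

// First pass: DFS along the original edges, recording each vertex
// once all of its successors have been visited (postorder).
static void dfs_postorder (const Graph &g, int v, std::vector<bool> &seen,
                           std::vector<int> &order)
{
  seen[v] = true;
  for (int w : g[v])
    if (!seen[w])
      dfs_postorder (g, w, seen, order);
  order.push_back (v);
}

// Second pass: DFS along the reversed edges, labelling every vertex
// reached with the current component number.
static void dfs_label (const Graph &rev, int v, int id,
                       std::vector<int> &comp)
{
  comp[v] = id;
  for (int w : rev[v])
    if (comp[w] < 0)
      dfs_label (rev, w, id, comp);
}

// Returns comp, where comp[v] is the index of v's strongly connected
// component.
std::vector<int> kosaraju_scc (const Graph &g)
{
  int n = static_cast<int> (g.size ());

  // Build the reversed graph.
  Graph rev (n);
  for (int v = 0; v < n; ++v)
    for (int w : g[v])
      {
        assert (w >= 0 && w < n);
        rev[w].push_back (v);
      }

  // DFS one way...
  std::vector<bool> seen (n, false);
  std::vector<int> order;
  for (int v = 0; v < n; ++v)
    if (!seen[v])
      dfs_postorder (g, v, seen, order);

  // ...followed by DFS the other way, in decreasing finish order.
  std::vector<int> comp (n, -1);
  int id = 0;
  for (auto it = order.rbegin (); it != order.rend (); ++it)
    if (comp[*it] < 0)
      dfs_label (rev, *it, id++, comp);
  return comp;
}
```

Tarjan's algorithm, by contrast, finds the components in a single DFS using low-link values, which is why the two-pass structure points to Kosaraju.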

Re: [PATCH v2.1 3/4] aarch64: Consolidate simd type lookup functions

2022-07-20 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > On Wed, Jul 13, 2022 at 05:36:04PM +0100, Richard Sandiford wrote: >> I like the part about getting rid of: >> >> static tree >> aarch64_simd_builtin_type (machine_mode mode, >> bool unsigned_p, bool poly_p) >> >> and the flow of the new

Re: [PATCH] aarch64: Replace manual swapping idiom with std::swap in aarch64.cc

2022-07-18 Thread Richard Sandiford via Gcc-patches
Richard Ball writes: > Replace manual swapping idiom with std::swap in aarch64.cc > > gcc/config/aarch64/aarch64.cc has a few manual swapping idioms of the form: > > x = in0, in0 = in1, in1 = x; > > The preferred way is using the standard: > > std::swap (in0, in1); > > We should just fix these to
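
A small sketch of the replacement, using a made-up helper rather than any function from aarch64.cc (order_operands and its int operands are assumptions for illustration only):

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: make sure the two operands are in ascending order.
void order_operands (int &in0, int &in1)
{
  if (in0 > in1)
    {
      // Manual idiom quoted in the thread:
      //   x = in0, in0 = in1, in1 = x;
      // The std::swap form needs no temporary at the call site.
      std::swap (in0, in1);
    }
  assert (in0 <= in1);
}
```

Besides being shorter, the std::swap form avoids declaring a scratch variable whose only purpose is the exchange.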

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > wrote: >> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener >> wrote: >> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches >> > wrote: >> > > >> > > Hi Richard, >> > > For the following test:

Re: [aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const

2022-07-14 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi, > For following test case: > > svint32_t foo() > { > int32x4_t v = (int32x4_t) { 1, 2, 3, 4 }; > svint32_t v2 = svld1rq_s32 (svptrue_b8(), &v[0]); > return v2; > } > > After applying workaround in forwprop to not simplify VEC_PERM_EXPR in >

Re: [PATCH v2 4/4] aarch64: Move vreinterpret definitions into the compiler

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This removes a significant number of intrinsic definitions from the arm_neon.h > header file, and reduces the amount of code duplication. The new macros and > data structures are intended to also facilitate moving other intrinsic > definitions out of the header file in

Re: [PATCH v2 3/4] aarch64: Consolidate simd type lookup functions

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > There were several similarly-named functions, which each built or looked up a > type using a different subset of valid modes or qualifiers. > > This change combines these all into a single function, which can additionally > handle const and pointer qualifiers. I like

Re: [PATCH v2 2/4] aarch64: Remove qualifier_internal

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This has been unused since 2014, so there's no reason to retain it. > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (enum aarch64_type_qualifiers): Remove qualifier_internal. > (aarch64_init_simd_builtin_functions): Remove

Re: [PATCH v2 1/4] aarch64: Add V1DI mode

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > We already have a V1DF mode, so this makes the vector modes more consistent. > > Additionally, this allows us to recognise uint64x1_t and int64x1_t types given > only the mode and type qualifiers (e.g. in aarch64_lookup_simd_builtin_type). > > gcc/ChangeLog: > > *

Re: [PATCH v2 2/2] aarch64: Lower vcombine to GIMPLE

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This lowers vcombine intrinsics to a GIMPLE vector constructor, which enables > better optimisation during GIMPLE passes. > > gcc/ > > * config/aarch64/aarch64-builtins.c > (aarch64_general_gimple_fold_builtin): Add combine. > > gcc/testsuite/ > > *

[PATCH] arm: Replace arm_builtin_vectorized_function [PR106253]

2022-07-13 Thread Richard Sandiford via Gcc-patches
This patch extends the fix for PR106253 to AArch32. As with AArch64, we were using ACLE intrinsics to vectorise scalar built-ins, even though the two sometimes have different ECF_* flags. (That in turn is because the ACLE intrinsics should follow the instruction semantics as closely as possible,

Re: [PATCH v2 1/2] aarch64: Don't return invalid GIMPLE assign statements

2022-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Tue, Jul 12, 2022 at 4:38 PM Andrew Carlotti > wrote: >> >> aarch64_general_gimple_fold_builtin doesn't check whether the LHS of a >> function call is null before converting it to an assign statement. To avoid >> returning an invalid GIMPLE statement

Ping^2: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-07-12 Thread Richard Sandiford via Gcc-patches
Ping^2 for the configure bits. Richard Sandiford via Gcc-patches writes: > On aarch64, --with-arch, --with-cpu and --with-tune only have an > effect on the driver, so “./xgcc -B./ -O3” can give significantly > different results from “./cc1 -O3”. --with-arch did have a limited > eff

[PATCH] Add internal functions for iround etc. [PR106253]

2022-07-12 Thread Richard Sandiford via Gcc-patches
The PR is about the aarch64 port using an ACLE built-in function to vectorise a scalar function call, even though the ECF_* flags for the ACLE function didn't match the ECF_* flags for the scalar call. To some extent that kind of difference is inevitable, since the ACLE intrinsics are supposed to

[PATCH] aarch64: Remove redundant builtins code

2022-07-12 Thread Richard Sandiford via Gcc-patches
aarch64_builtin_vectorized_function handles some built-in functions that already have equivalent internal functions. This seems to be redundant now, since the target builtins that it chooses are mapped to the same optab patterns as the internal functions. Tested on aarch64-linux-gnu & pushed.

[committed] vect: Restore optab_vector argument [PR106250]

2022-07-11 Thread Richard Sandiford via Gcc-patches
In g:76c3041b856cb0 I'd removed a "C ? optab_vector : optab_mixed_sign" argument from a call to directly_supported_p, thinking that the argument only existed because of the condition (which I was removing). But the difference between the scalar and vector forms matters for shifts, so we do still

Re: [PATCH] Move reload_completed and other rtl.h globals to crtl structure.

2022-07-11 Thread Richard Sandiford via Gcc-patches
I know it'll seem like make-work, but could you put the combine flag in a separate follow-on patch? Reorganising the existing flags (very welcome!) and adding new ones seem like different things. TBH I'm a bit suspicious of the combine flag. What fundamental property holds true after combine

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-07-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> > so that the multiple_p test is skipped if the structure is undefined. >> >> Actually, we should probably skip the constant_multiple_p test as well. >> Keeping it would only be meaningful for little-endian. >> >> simplify_gen_subreg should already do the necessary

Re: [PATCH] Maintain LC SSA when doing SVE vectorization

2022-07-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The final loop IV use after the loop has that not in LC SSA > (and inserts not simplified _2 = _3 - 0 stmts). In particular > since it splits the exit edge when there's a virtual PHI in the > destination it breaks virtual LC SSA form (but likely also > non-virtual). > >

Re: [PATCH] aarch64: Move vreinterpret definitions into the compiler

2022-07-05 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review. Andrew Carlotti via Gcc-patches writes: > Hi, > > This removes a significant number of intrinsic definitions from the arm_neon.h > header file, and reduces the amount of code duplication. The new macros and > data structures are intended to also facilitate moving other

Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-07-05 Thread Richard Sandiford via Gcc-patches
>> > > > -Original Message- >> >> > > > From: Richard Sandiford >> >> > > > Sent: Thursday, June 16, 2022 7:54 PM >> >> > > > To: Tamar Christina >> >> > > > Cc: gc

Re: [PATCH] Mips: Resolve build issues for the n32 ABI

2022-07-04 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao via Gcc-patches writes: > On Fri, 2022-07-01 at 12:40 +, Dimitrije Milosevic wrote: >> Building the ASAN for the n32 MIPS ABI currently fails, due to a few reasons: >> - defined(__mips64), which is set solely based on the architecture type >> (32-bit/64-bit), >> was still used in

Re: [RFC] trailing_wide_ints with runtime variable lengths

2022-07-01 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez via Gcc-patches writes: > Currently global ranges are stored in SSA_NAME_RANGE_INFO as a pair of > wide_int-like objects along with the nonzero bits. We frequently lose > precision when streaming out our higher resolution iranges. The plan > was always to store the full irange

Re: [PATCH 2/2] Revert maybe_ne -> known_ne change in vn_reference_lookup_3

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > This reverts the change as discussed. Thanks! > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. > > 2022-07-01 Richard Biener > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Revert > back to using maybe_ne (off, -1). > --- >

Re: [PATCH][AArch64] Implement ACLE Data Intrinsics

2022-07-01 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 29/06/2022 08:18, Richard Sandiford wrote: >>> + break; >>> +case AARCH64_RBIT: >>> +case AARCH64_RBITL: >>> +case AARCH64_RBITLL: >>> + if (mode == SImode) >>> + icode = CODE_FOR_aarch64_rbitsi; >>> + else >>> + icode =

Re: [PATCH] tree-optimization/106131 - wrong code with FRE rewriting

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 1 Jul 2022, Richard Sandiford wrote: > >> Richard Biener via Gcc-patches writes: >> > The following makes sure to not use the original TBAA type for >> > looking up a value across an aggregate copy when we had to offset >> > the read. >> > >> > Bootstrapped and

Re: [PATCH] tree-optimization/106131 - wrong code with FRE rewriting

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > The following makes sure to not use the original TBAA type for > looking up a value across an aggregate copy when we had to offset > the read. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk. > > 2022-06-30 Richard Biener > >

Re: [PATCH] wide-int: Fix up wi::shifted_mask [PR106144]

2022-07-01 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > As the following self-test testcase shows, wi::shifted_mask sometimes > doesn't create canonicalized wide_ints, which then fail to compare equal > to canonicalized wide_ints with the same value. > In particular, wi::mask (128, false, 128) gives { -1 } with len 1

Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-06-29 Thread Richard Sandiford via Gcc-patches
; nd ; Richard Earnshaw >> > > > ; Marcus Shawcroft >> > > > ; Kyrylo Tkachov >> > >> > > > Subject: Re: [PATCH 1/2]AArch64 Add fallback case using sdot for >> > > > usdot >> > > > >> > > > Richard Sandif

Re: [PATCH][AArch64] Implement ACLE Data Intrinsics

2022-06-29 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 17/06/2022 11:54, Richard Sandiford wrote: >> "Andre Vieira (lists)" writes: >>> Hi, >>> >>> This patch adds support for the ACLE Data Intrinsics to the AArch64 port. >>> >>> Bootstrapped and regression tested on aarch64-none-linux. >>> >>> OK for trunk? >>

Re: [PATCH] RFC: Optimise SLP permutes of non-consecutive loads

2022-06-24 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, 23 Jun 2022, Richard Sandiford wrote: >> In a reduction pair like: >> >> typedef float T; >> >> void >> f1 (T *x) >> { >> T res1 = 0; >> T res2 = 0; >> for (int i = 0; i < 100; ++i) >> { >> res1 += x[i * 2]; >> res2 += x[i * 2

[PATCH] RFC: Optimise SLP permutes of non-consecutive loads

2022-06-23 Thread Richard Sandiford via Gcc-patches
In a reduction pair like: typedef float T; void f1 (T *x) { T res1 = 0; T res2 = 0; for (int i = 0; i < 100; ++i) { res1 += x[i * 2]; res2 += x[i * 2 + 1]; } x[0] = res1; x[1] = res2; } it isn't easy to predict whether the initial

[PATCH] testsuite: Compile slsr-39.c without vectorisation

2022-06-23 Thread Richard Sandiford via Gcc-patches
The fix for PR106019 regressed slsr-39.c for -m32 -march=cascadelake because we are now able to vectorise the code. (Whether the code model should be allowing that is a different question -- the vectorised code looked worse to me.) The test runs at -O2 and predates vectorisation being enabled at

Re: [PATCH] aarch64: testsuite: symbol-range compile only

2022-06-22 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > On Jun 21, 2022, Richard Sandiford wrote: > >> Could we instead have a new target selector for whether the memory >> map includes xGB of RAM? > > How about this? Testing on aarch64-rtems6.0. Ok to install? > > > aarch64: testsuite: symbol-range fallback to compile > >

[PATCH] data-ref: Improve non-loop disambiguation [PR106019]

2022-06-21 Thread Richard Sandiford via Gcc-patches
When dr_may_alias_p is called without a loop context, it tries to use the tree-affine interface to calculate the difference between the two addresses and use that difference to check whether the gap between the accesses is known at compile time. However, as the example in the PR shows, this

Re: [PATCH] aarch64: testsuite: symbol-range compile only

2022-06-21 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > On some of our embedded aarch64 targets, RAM size is too small for > this test to fit. It doesn't look like this test requires linking, > and if it does, the -tiny version may presumably get most of the > coverage without going overboard in target system requirements.

Re: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches writes: >> -Original Message- >> From: Richard Biener >> Sent: Monday, June 20, 2022 12:56 PM >> To: Tamar Christina >> Cc: Andrew Pinski via Gcc-patches ; nd >> >> Subject: RE: [PATCH]middle-end Add optimized float addsub without >> needing

Re: [PATCH 1/2]middle-end: Simplify subtract where both arguments are being bitwise inverted.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Thu, Jun 16, 2022 at 1:10 PM Tamar Christina via Gcc-patches > wrote: >> >> Hi All, >> >> This adds a match.pd rule that drops the bitwise nots when both arguments >> to a >> subtract are inverted. i.e. for: >> >> float g(float a, float b) >> { >>
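
The arithmetic behind dropping both nots can be checked on plain unsigned integers. In modular arithmetic ~x equals -x - 1, so ~a - ~b = (-a - 1) - (-b - 1) = b - a. The sketch below uses hypothetical function names and 32-bit unsigned operands; it illustrates the identity only, not the match.pd rule or the truncated float test case from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Subtract with both operands bitwise-inverted.
uint32_t sub_of_nots (uint32_t a, uint32_t b)
{
  return ~a - ~b;
}

// Equivalent form with the nots dropped and the operands swapped.
uint32_t swapped_sub (uint32_t a, uint32_t b)
{
  return b - a;
}
```

Both functions wrap modulo 2^32, so they agree for every pair of inputs, including those where b is smaller than a.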

Ping: [RFA configure parts] aarch64: Make cc1 handle --with options

2022-06-20 Thread Richard Sandiford via Gcc-patches
Ping for the configure bits. Richard Sandiford via Gcc-patches writes: > On aarch64, --with-arch, --with-cpu and --with-tune only have an > effect on the driver, so “./xgcc -B./ -O3” can give significantly > different results from “./cc1 -O3”. --with-arch did have a limited > eff

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-20 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Tamar Christina writes: >>> -Original Message- >>> From: Richard Sandiford >>> Sent: Monday, June 13, 2022 9:41 AM >>> To: Tamar Christina >>> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.d

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-17 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Monday, June 13, 2022 9:41 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de >> Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to >> set the lowpart. >>

Re: [PATCH][AArch64] Implement ACLE Data Intrinsics

2022-06-17 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch adds support for the ACLE Data Intrinsics to the AArch64 port. > > Bootstrapped and regression tested on aarch64-none-linux. > > OK for trunk? Sorry for the slow review. > > gcc/ChangeLog: > > 2022-06-10  Andre Vieira  > >     *

Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-06-16 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Tamar Christina writes: >> Hi All, >> >> The usdot operation is common in video encoder and decoders including some of >> the most widely used ones. >> >> This patch adds a +dotprod version of the optab as

Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-06-16 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > The usdot operation is common in video encoder and decoders including some of > the most widely used ones. > > This patch adds a +dotprod version of the optab as a fallback for when you do > have sdot but not usdot available. > > The fallback works by adding

[pushed] Revert recent internal-fn changes [PR105975]

2022-06-15 Thread Richard Sandiford via Gcc-patches
The recent internal-fn “clean-ups” triggered problems on nvptx because some of the omp_simt_* patterns had modeless operands. I wondered about adapting expand_fn_using_insn to cope with that, but then the problem becomes: what should the mode of operand 0 be when there is no lhs? The answer

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-15 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 6/13/2022 5:54 AM, Richard Biener wrote: >> On Sun, Jun 12, 2022 at 7:27 PM Jeff Law via Gcc-patches >> wrote: >> [...] >>> On a related topic, any thoughts on keeping complex objects as complex >>> types/modes through gimple and into at least parts of the

[pushed] aarch64: Revert bogus fix for PR105254

2022-06-15 Thread Richard Sandiford via Gcc-patches
In f2ebf2d98efe0ac2314b58cf474f44cb8ebd5244 I'd forced the chosen unroll factor to be a factor of the VF, in order to work around an exact_div ICE in PR105254. This was completely bogus -- clearly I didn't look in enough detail at why we ended up with an unrolled VF that wasn't a multiple of the

[pushed] gen: Allow unspec numbers in .md attributes

2022-06-15 Thread Richard Sandiford via Gcc-patches
Tamar pointed out that: (unspec:M ... ) didn't work when a value of attribute FOO was defined by define_constant, such as in: (define_int_attribute FOO [(UNSPEC_A "UNSPEC_B") ...]) This is because symbolic constants are substituted during lexing and only apply to bare symbol names, not

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 13 Jun 2022, Tamar Christina wrote: > >> > -Original Message- >> > From: Richard Biener >> > Sent: Monday, June 13, 2022 12:48 PM >> > To: Tamar Christina >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford >> > >> > Subject: RE: [PATCH

[RFA configure parts] aarch64: Make cc1 handle --with options

2022-06-13 Thread Richard Sandiford via Gcc-patches
On aarch64, --with-arch, --with-cpu and --with-tune only have an effect on the driver, so “./xgcc -B./ -O3” can give significantly different results from “./cc1 -O3”. --with-arch did have a limited effect on ./cc1 in previous releases, although it didn't work entirely correctly. Being of a lazy

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-06-13 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs. One for the > lowpart and one for the highpart. > > The problem with this is that in RTL the lvalue of the RTX is the only thing > tying the two instructions together. > > This means that e.g.

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-13 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 8 Jun 2022, Richard Sandiford wrote: >> Tamar Christina writes: >> >> -Original Message- >> >> From: Richard Sandiford >> >> Sent: Wednesday, June 8, 2022 11:31 AM >> >> To: Tamar Christina >> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> >>

[PATCH 2/2] Add a general mapping from internal fns to target insns

2022-06-10 Thread Richard Sandiford via Gcc-patches
Several existing internal functions map directly to an instruction defined in target-insns.def. This patch makes it easier to define more such functions in future. This should help to reduce cut-&-paste, but more importantly, it allows the difference between optab functions and target-insns.def

[PATCH 1/2] Factor out common internal-fn idiom

2022-06-10 Thread Richard Sandiford via Gcc-patches
internal-fn.c has quite a few functions that simply map the result of the call to an instruction's output operand (if any) and map each argument to an instruction's input operand, in order. This patch adds a single function for doing that. It's really just a generalisation of

Re: [PATCH] aarch64: Lower vcombine to GIMPLE

2022-06-10 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti via Gcc-patches writes: > Hi all, > > This lowers vcombine intrinsics to a GIMPLE vector constructor, which enables > better optimisation during GIMPLE passes. > > Bootstrapped and tested on aarch64-none-linux-gnu, and tested for > aarch64_be-none-linux-gnu via

Re: [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

2022-06-09 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > Hi! > > This patch adds support for dlopening libmemkind.so on Linux and uses it > for some kinds of allocations (but not yet e.g. pinned memory). > > Bootstrapped/regtested on x86_64-linux and i686-linux (with libmemkind > around) and compile tested with

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-09 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, June 8, 2022 3:36 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> ; rguent...@suse.de; ro...@eyesopen.com >> Subject:

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, June 8, 2022 11:31 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH]AArch64 relax predicate on

Re: aarch64: Fix bitfield alignment in param passing [PR105549]

2022-06-08 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > On 6/7/22 19:44, Richard Sandiford wrote: >> Christophe Lyon via Gcc-patches writes: >>> While working on enabling DFP for AArch64, I noticed new failures in >>> gcc.dg/compat/struct-layout-1.exp (t028) which were not actually >>> caused by DFP types handling. These

Re: [PATCH]AArch64 relax predicate on load structure load instructions

2022-06-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > At some point in time we started lowering the ld1r instructions in gimple. > > That is: > > uint8x8_t f1(const uint8_t *in) { > return vld1_dup_u8(&in[1]); > } > > generates at gimple: > > _3 = MEM[(const uint8_t *)in_1(D) + 1B]; > _4 = {_3, _3, _3, _3,

Re: aarch64: Fix bitfield alignment in param passing [PR105549]

2022-06-07 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches writes: > While working on enabling DFP for AArch64, I noticed new failures in > gcc.dg/compat/struct-layout-1.exp (t028) which were not actually > caused by DFP types handling. These tests are generated during 'make > check' and enabling DFP made generation

Re: [1/2] PR96463 - aarch64 specific changes

2022-06-07 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Mon, 6 Jun 2022 at 16:29, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> >> > { >> >> >/* The pattern matching functions above are written to look for a >> >> > small >> >> > number to begin the sequence (0, 1, N/2). If we begin

Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-06-07 Thread Richard Sandiford via Gcc-patches
Joel Hutton writes: >> > Patches attached. They already incorporated the .cc rename, now >> > rebased to be after the change to tree.h >> >> @@ -1412,8 +1412,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo, >>2, oprnd, half_type, unprom, vectype); >> >>tree var =

Re: [1/2] PR96463 - aarch64 specific changes

2022-06-06 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: >> > { >> >/* The pattern matching functions above are written to look for a small >> > number to begin the sequence (0, 1, N/2). If we begin with an index >> > @@ -24084,6 +24112,12 @@ aarch64_expand_vec_perm_const_1 (struct >> > expand_vec_perm_d *d) >>

Re: [PATCH] Update document for VECTOR_MODES_WITH_PREFIX

2022-06-06 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi, > > r10-3912 updated the format of VECTOR_MODES_WITH_PREFIX by > adding one more parameter ORDER, the related document is out > of date. So update the document for ORDER. > > Is it ok for trunk? > > BR, > Kewen > - > > gcc/ChangeLog: > > * machmode.def

Re: [PATCH] Simplify vec_unpack of uniform_vector_p constructors in match.pd.

2022-06-06 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Sat, May 21, 2022 at 5:31 PM Roger Sayle > wrote: >> This patch simplifies vec_unpack_hi_expr/vec_unpack_lo_expr of a uniform >> constructor or vec_duplicate operand. The motivation is from PR 105621 >> where after optimization, we're left with: >> >> vect_cst__21

Re: [1/2] PR96463 - aarch64 specific changes

2022-06-01 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Thu, 12 May 2022 at 16:15, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Wed, 11 May 2022 at 12:44, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > On Fri, 6 May 2022 at 16:00, Richard Sandiford >> >> >

Re: [PATCH v4] DSE: Use the constant store source if possible

2022-06-01 Thread Richard Sandiford via Gcc-patches
"H.J. Lu" writes: > On Mon, May 30, 2022 at 09:35:43AM +0100, Richard Sandiford wrote: >> "H.J. Lu" writes: >> > --- >> > RTL DSE tracks redundant constant stores within a basic block. When RTL >> > loop invariant motion hoists a constant initialization out of the loop >> > into a separate

Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-31 Thread Richard Sandiford via Gcc-patches
Vladimir Makarov via Gcc-patches writes: > On 2022-05-29 23:05, Hongtao Liu wrote: >> On Fri, May 27, 2022 at 5:12 AM Vladimir Makarov via Gcc-patches >> wrote: >>> >>> On 2022-05-24 23:39, liuhongt wrote: Right now, mem_cost for separate mem alternative is 1 * frequency which is pretty

Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-30 Thread Richard Sandiford via Gcc-patches
"H.J. Lu" writes: > On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: >> "H.J. Lu" writes: >> > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford >> > wrote: >> >> >> >> "H.J. Lu via Gcc-patches" writes: >> >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:

Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-30 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 5/29/2022 3:43 PM, H.J. Lu wrote: >> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches >> wrote: >>> >>> >>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote: On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote: > "H.J. Lu"

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-30 Thread Richard Sandiford via Gcc-patches
(Sorry for the slow reply, was off on Friday) Richard Biener writes: > On Wed, May 25, 2022 at 10:24 PM Prathamesh Kulkarni > wrote: >> >> On Thu, 26 May 2022 at 00:37, Richard Biener >> wrote: > [...] >> > x86 now accepts V4SI V8SI permutes because we don’t ask it correctly and >> > thus my

Re: [PATCH] AArch64: Cleanup option processing code

2022-05-26 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Further cleanup option processing. Remove the duplication of global > variables for CPU and tune settings so that CPU option processing is > simplified even further. Move global variables that need save and > restore due to target option processing into aarch64.opt. This

Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-26 Thread Richard Sandiford via Gcc-patches
"H.J. Lu" writes: > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford > wrote: >> >> "H.J. Lu via Gcc-patches" writes: >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches >> >> wrote: >> >> > >> >> > When recording

Re: [PATCH] AArch64: Prioritise init_have_lse_atomics constructor [PR 105708]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > > I've added a comment - as usual it's just a number. A quick grep in gcc and > glibc showed that priorities 98-101 are used, so I just went a bit below so it > has a higher priority than typical initializations. Thanks. OK for trunk, and for backports

Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Tue, May 24, 2022 at 10:11 PM H.J. Lu wrote: >> >> On Mon, May 23, 2022 at 11:42 PM Richard Biener >> wrote: >> > >> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu wrote: >> > > >> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >> > >

Re: [PATCH] aarch64: Fix pac-ret with unusual dwarf in libgcc unwinder [PR104689]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > The 05/13/2022 16:35, Richard Sandiford wrote: >> Szabolcs Nagy via Gcc-patches writes: >> > The RA_SIGN_STATE dwarf pseudo-register is normally only set using the >> > DW_CFA_AARCH64_negate_ra_state (== DW_CFA_window_save) operation which >> > toggles the return address

Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches" writes: > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote: >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches >> wrote: >> > >> > When recording store for RTL dead store elimination, check if the source >> > register is set only once to a

Re: [PATCH] AArch64: Prioritise init_have_lse_atomics constructor [PR 105708]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Increase the priority of the init_have_lse_atomics constructor so it runs > before other constructors. This improves chances that rr works when LSE > atomics are supported. Can you add a comment above the function explaining why we chose 90 in particular? I see 100 was

Re: [PATCH] middle-end/105711 - properly handle CONST_INT when expanding bitfields

2022-05-24 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > This is another place where we fail to pass down the mode of a > CONST_INT. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? > > Thanks, > Richard. > > 2022-05-24 Richard Biener > > PR middle-end/105711 > * expmed.cc

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-24 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index c5006afc00d..0a3c733ada9 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -6088,14 +6088,18 @@ for the given scalar type @var{type}. > @var{is_packed} is false if the scalar > access using
