Move away from the pre-C++11 compatibility macro CONSTEXPR.
Tested on aarch64-linux-gnu & pushed.
Richard
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc: Replace CONSTEXPR
with constexpr throughout.
* config/aarch64/aarch64-sve-builtins-functions.h: Likewise.
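As a rough illustration of what this kind of replacement amounts to, here is a minimal sketch. COMPAT_CONSTEXPR and both struct names are hypothetical stand-ins invented for this example; they are not the definitions used in GCC.

```cpp
#include <cassert>

// Hypothetical compatibility macro of the kind being retired: it expands
// to constexpr on C++11 and later, and to nothing on older compilers.
#if __cplusplus >= 201103L
#define COMPAT_CONSTEXPR constexpr
#else
#define COMPAT_CONSTEXPR
#endif

// Before: the constructor is only "maybe constexpr", depending on the macro.
struct old_style_function
{
  COMPAT_CONSTEXPR old_style_function (int unspec) : m_unspec (unspec) {}
  int m_unspec;
};

// After: the keyword is spelled directly, so objects can be declared
// constexpr and checked at compile time.
struct new_style_function
{
  constexpr new_style_function (int unspec) : m_unspec (unspec) {}
  int m_unspec;
};

// A constexpr object is constant-initialized; no start-up code is needed.
constexpr new_style_function example_function (42);
static_assert (example_function.m_unspec == 42,
               "object is usable in constant expressions");
```

Once the code base requires C++11, the macro carries no information that the keyword does not, which is presumably why it can be dropped wholesale.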
Add an aarch64_sve::gimple_folder helper for folding calls
to integer constants. SME will make more use of this.
Tested on aarch64-linux-gnu & pushed.
Richard
gcc/
* config/aarch64/aarch64-sve-builtins.h
(gimple_folder::fold_to_cstu): New member function.
*
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
Tested on aarch64-linux-gnu & pushed to trunk so far. I'll backport
to release branches soon.
Richard
Unlike other flag-setting SVE instructions, BRKNS sets the flags
based on an all-true governing predicate, rather than the GP operand.
Tested on aarch64-linux-gnu & pushed to trunk so far. I'll backport
to release branches soon.
Richard
gcc/
* config/aarch64/iterators.md (SVE_BRKP):
Wilco Dijkstra writes:
> ping
>
>
>
> Hi Richard,
>
Sounds good, but could you put it before the mode version,
to avoid the forward declaration?
>>>
>>> I can swap them around but the forward declaration is still required as
>>> aarch64_check_bitmask is 5000 lines before
Jakub Jelinek writes:
> On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches
> wrote:
>> Lewis Hyatt via Gcc-patches writes:
>> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it
>> > contains
>> > (wh
Lewis Hyatt via Gcc-patches writes:
> When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> (whether they live in GC-controlled memory or not) will be marked for PCH
> output by the routine gt_pch_note_object in ggc-common.cc. This routine
> special-cases plain char*
Richard Biener writes:
> On Tue, 11 Oct 2022, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Mon, 10 Oct 2022, Andrew Stubbs wrote:
>> >> On 10/10/2022 12:03, Richard Biener wrote:
>> >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
>> >> > first order
Florian Weimer via Gcc-patches writes:
> On many architectures, there is a padding gap after the how array
> member, and cfa_how can be moved there. This reduces the size of the
> struct and the amount of memory that uw_frame_state_for has to clear.
>
> There is no measurable performance benefit
Martin Liška writes:
> On 10/13/22 12:03, Richard Sandiford wrote:
>> Martin Liška writes:
>>> I think we should add how Python scripts should be formatted. I noticed
>>> that while reading the Modula-2 patchset where it follows the C/C++ style
>>> when it comes to Python files.
>>>
>>> Ready to
Martin Liška writes:
> I think we should add how Python scripts should be formatted. I noticed
> that while reading the Modula-2 patchset where it follows the C/C++ style
> when it comes to Python files.
>
> Ready to be installed?
> Thanks,
> Martin
Did you consider requiring black formatting
Wilco Dijkstra writes:
> Hi Richard,
>
>> Realise this is awkward, but: CC_NZmode is for operations that set only
>> the N and Z flags to useful values. If we want to take advantage of V
>> being zero then I think we need a different mode.
>>
>> We can't go all the way to CCmode because the
Jakub Jelinek writes:
> On Wed, Oct 12, 2022 at 11:15:40AM +0100, Richard Sandiford wrote:
>> Looks good to me, just some minor comments below.
>
> Here is an updated patch.
>
>> How robust is the mechanism that guarantees HF comes before BF,
>> and so is the mode that appears in the (new) wider
Jakub Jelinek writes:
> On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
>> > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>> > > >{
>> > > > machine_mode optab_mode = mclass == MODE_CC ? CCmode :
>> > > > compare_mode;
>> > > > icode
Richard Biener writes:
> + /* First-order recurrence autovectorization needs to handle permutation
> + with indices = [nunits-1, nunits, nunits+1, ...]. */
> + vec_perm_builder sel (nunits, 1, 3);
> + for (int i = 0; i < 3; ++i)
> +sel.quick_push (nunits - dist + i);
> +
Richard Biener writes:
> On Mon, 10 Oct 2022, Andrew Stubbs wrote:
>> On 10/10/2022 12:03, Richard Biener wrote:
>> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
>> > first order recurrences. That solves two TSVC missed optimization PRs.
>> >
>> > There's a new scalar
Wilco Dijkstra writes:
> Hi Richard,
>
>>> Yes, with a more general search loop we can get that case too -
>>> it doesn't trigger much though. The code that checks for this is
>>> now refactored into a new function. Given there are now many
>>> more calls to aarch64_bitmask_imm, I added a
Wilco Dijkstra via Gcc-patches writes:
> Hi Richard,
>
>> Did you consider handling the case where the movks aren't for
>> consecutive bitranges? E.g. the patch handles:
>
>> but it looks like it would be fairly easy to extend it to:
>>
>> 0x12345678
>
> Yes, with a more general search
Philipp Tomsich writes:
> This brings the extensions detected by -mcpu=native on Ampere-1 systems
> in sync with the defaults generated for -mcpu=ampere1.
>
> Note that some early kernel versions on Ampere1 may misreport the
> presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
>
Philipp Tomsich writes:
> Fixes: 341573406b39
>
> Don't subtract one from the result of strnlen() when trying to point
> to the first character after the current string. This issue would
> cause individual characters (where the 128 byte buffers are stitched
> together) to be lost.
>
>
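The off-by-one described above can be reproduced in isolation. The sketch below is not the code from the patch: stitch() and its chunk data are hypothetical, and the buffer is much smaller than the 128-byte buffers mentioned in the message. It only shows why subtracting one from the strnlen() result loses a character at each join.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <string>

// Stitch two chunks together in one buffer.  With OFF_BY_ONE set, the
// write position is one character too early, as in the bug described.
static std::string
stitch (bool off_by_one)
{
  char buf[32] = "";
  const char *chunks[] = { "abc", "def" };
  for (const char *chunk : chunks)
    {
      // strnlen returns the index of the terminating NUL, which is
      // already the first position after the current string.
      size_t len = strnlen (buf, sizeof buf);
      size_t pos = (off_by_one && len > 0) ? len - 1 : len;
      snprintf (buf + pos, sizeof buf - pos, "%s", chunk);
    }
  return buf;
}
```

With the correct position the result is "abcdef"; with the extra subtraction the 'c' at the join is overwritten and the result is "abdef".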
Wilco Dijkstra writes:
> Since AArch64 sets all flags on logical operations, comparisons with zero
> can be combined into an AND even if the condition is LE or GT.
>
> Passes regress, OK for commit?
>
> gcc:
> PR target/105773
> * config/aarch64/aarch64.cc
Wilco Dijkstra writes:
> Improve immediate expansion of immediates which can be created from a
> bitmask immediate and 2 MOVKs. This reduces the number of 4-instruction
> immediates in SPECINT/FP by 10-15%.
>
> Passes regress, OK for commit?
>
> gcc/ChangeLog:
>
> PR target/106583
>
Philipp Tomsich writes:
> This brings the extensions detected by -mcpu=native on Ampere-1 systems
> in sync with the defaults generated for -mcpu=ampere1.
>
> Note that some kernel versions may misreport the presence of PAUTH and
> PREDRES (i.e., -mcpu=native will add 'nopauth' and 'nopredres').
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in
Jakub Jelinek writes:
> On Fri, Sep 30, 2022 at 09:54:49AM -0400, Jason Merrill wrote:
>> > Note, there is one further problem on aarch64/arm, types with HFmode
>> > (_Float16 and __fp16) are there mangled as Dh (which is standard
>> > Itanium mangling:
>> > ::= Dh # IEEE 754r
Torbjörn SVENSSON writes:
> Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
> enough to know if the execution will enter an endless loop, or if it
> will give a meaningful result. As the execution test only works when
> VMA and LMA are equal, make sure that this condition is
Torbjorn SVENSSON writes:
> Hi,
>
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601831.html
OK, thanks.
Richard
> Kind regards,
> Torbjörn
>
> On 2022-09-19 18:57, Torbjörn SVENSSON wrote:
>> The linker script should not be prefixed with "-Wl," - it's not an
>> input file
Sorry for the slow reply.
Torbjorn SVENSSON writes:
> Hi Richard,
>
> Thanks for your review.
> Comments below.
>
> On 2022-09-23 19:34, Richard Sandiford wrote:
>> Torbjörn SVENSSON via Gcc-patches writes:
>>> Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
>>> enough to
Richard Sandiford via Gcc-patches writes:
> Prathamesh Kulkarni writes:
>> Sorry to ask a silly question but in which case shall we select 2nd vector ?
>> For num_poly_int_coeffs == 2,
>> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
>> If a1/trunc n1 s
Prathamesh Kulkarni writes:
> On Tue, 27 Sept 2022 at 01:59, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Fri, 23 Sept 2022 at 21:33, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > On Tue, 20 Sept 2022 at 18:09, Richard Sandiford
>> >> >
Tamar Christina writes:
>> -Original Message-
>> From: Richard Biener
>> Sent: Friday, September 30, 2022 12:53 PM
>> To: Tamar Christina
>> Cc: Richard Sandiford ; Tamar Christina via
>> Gcc-patches ; nd ; Jeff Law
>>
>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of
Jeff Law writes:
> This is another minor improvement to coremark. I suspect this only
> improves code size as the load-immediate was likely issuing with the ret
> statement on multi-issue machines.
>
>
> Basically we're failing to utilize conditional equivalences during the
> post-reload CSE
Richard Biener writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu wrote:
>>
>> This patch implemented the optimization in PR 54346, which merges
>>
>> c = VEC_PERM_EXPR ;
>> d = VEC_PERM_EXPR ;
>> to
>> d = VEC_PERM_EXPR ;
>>
>>
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, September 30, 2022 9:29 AM
>> To: Tamar Christina
>> Cc: Richard Biener ; Tamar Christina via Gcc-patches
>> ; nd ; Jeff Law
>>
>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
Tamar Christina writes:
>> -Original Message-
>> From: Gcc-patches > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
>> Biener via Gcc-patches
>> Sent: Thursday, September 29, 2022 12:09 PM
>> To: Tamar Christina via Gcc-patches
>> Cc: Richard Sandiford ; nd
>>
Thanks for posting the patch.
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New function.
> (vect_analyze_scalar_cycles_1): Classify first-order recurrence phi.
>
dr_may_alias_p rightly used poly_int_tree_p to guard a use of
ranges_maybe_overlap_p, but used the non-poly extractors.
This caused a few failures in the SVE ACLE asm tests.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. Pushed as obvious.
Richard
gcc/
* tree-data-ref.cc
After previous patches, it's possible to remove TARGET_*
options that are redundant due to (IMO) obvious dependencies.
gcc/
* config/aarch64/aarch64.h (TARGET_CRYPTO, TARGET_SHA3, TARGET_SM4)
(TARGET_DOTPROD): Don't depend on TARGET_SIMD.
(TARGET_AES, TARGET_SHA2):
Some of the option structures have all-const member variables.
That doesn't seem necessary: we can just use const on the objects
that are supposed to be read-only.
Also, with the new, more C++-heavy option handling, it seems
better to use constexpr for the static data, to make sure that
we're not
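A minimal sketch of the distinction being drawn, assuming a made-up option structure (tuning_option_sketch and its fields are hypothetical, not taken from the aarch64 sources): the members stay non-const, and constness is applied to the static objects instead.

```cpp
#include <cassert>

// Hypothetical option structure.  The members are deliberately not
// const, so that temporary copies remain assignable and modifiable.
struct tuning_option_sketch
{
  const char *name;
  unsigned int flag;
};

// The read-only table is made constant at the object level.  constexpr
// additionally requires constant initialization, so the compiler rejects
// the definition if an initializer would need run-time code.
constexpr tuning_option_sketch tuning_options_sketch[] = {
  { "option_a", 1u },
  { "option_b", 2u },
};
static_assert (tuning_options_sketch[1].flag == 2u,
               "table is usable at compile time");

// A mutable copy can still be adjusted; with all-const members this
// function would not compile.
static tuning_option_sketch
with_extra_flag (tuning_option_sketch option, unsigned int extra)
{
  option.flag |= extra;
  return option;
}
```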
aarch64-common.cc has two arrays, one maintaining the original
definition order and one sorted by population count. Sorting
by population count was a way of ensuring topological ordering,
taking advantage of the fact that the entries are partially
ordered by the subset relation. However, the
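The reasoning behind the population-count trick can be checked with a small sketch. It assumes each entry stores its own bit plus the bits of everything it transitively depends on; the structure, the bit values and the helper names below are hypothetical and chosen only for illustration. If one closure is a strict subset of another, it must have fewer bits set, so sorting by bit count places dependencies first.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical entry: CLOSURE is the feature's own bit plus the bits of
// all of its transitive dependencies.
struct feature_entry
{
  std::string name;
  uint64_t closure;
};

static int
popcount64 (uint64_t x)
{
  return (int) std::bitset<64> (x).count ();
}

// Illustrative data, deliberately not in topological order.
static std::vector<feature_entry>
example_entries ()
{
  return {
    { "crypto", 0x1f },  // crypto + sha2 + aes + simd + fp
    { "aes", 0x07 },     // aes + simd + fp
    { "sha2", 0x0b },    // sha2 + simd + fp
    { "simd", 0x03 },    // simd + fp
    { "fp", 0x01 },
  };
}

static std::vector<feature_entry>
sort_by_popcount (std::vector<feature_entry> entries)
{
  std::stable_sort (entries.begin (), entries.end (),
                    [] (const feature_entry &a, const feature_entry &b)
                    { return popcount64 (a.closure) < popcount64 (b.closure); });
  return entries;
}

// True if no entry appears before one of its own dependencies, where a
// dependency is an entry whose closure is a strict subset.
static bool
is_topological (const std::vector<feature_entry> &entries)
{
  for (size_t i = 0; i < entries.size (); ++i)
    for (size_t j = i + 1; j < entries.size (); ++j)
      if (entries[j].closure != entries[i].closure
          && (entries[j].closure & ~entries[i].closure) == 0)
        return false;
  return true;
}
```

Entries with equal bit counts (aes and sha2 here) are unrelated by the subset relation, so their relative order does not matter.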
-mgeneral-regs-only is effectively "+nofp for the compiler without
changing the assembler's ISA flags". Currently that's implemented
by making TARGET_FLOAT, TARGET_SIMD and TARGET_SVE depend on
!TARGET_GENERAL_REGS_ONLY and then making any feature that needs FP
registers depend (directly or
A previous patch added an aarch64_feature_flags typedef, to abstract
the representation of the feature flags. This patch makes existing
code use the typedef too. Hope I've caught them all!
gcc/
* common/config/aarch64/aarch64-common.cc: Use aarch64_feature_flags
for feature flags
Currently the aarch64-option-extensions.def entries, the
aarch64-cores.def entries, and the AARCH64_FL_FOR_* macros
have a transitive closure of dependencies that is maintained by hand.
This is a bit error-prone and is becoming less tenable as more features
are added. The main point of this patch
After previous changes, it's more convenient if the flags_on and
flags_off fields of all_extensions include the feature flag itself.
gcc/
* common/config/aarch64/aarch64-common.cc (all_extensions):
Include the feature flag in flags_on and flags_off.
aarch64-option-extensions.def requires us to maintain the transitive
closure of options by hand. This patch fixes a few cases where a
flag was missed.
+noaes and +nosha2 now disable +crypto, which IMO makes more
sense and is consistent with the Clang behaviour.
gcc/
*
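To show what maintaining the closure involves, here is a hedged sketch that derives it from direct dependencies instead. The table, the FL_* constants and the function names are hypothetical and simplified; only the crypto/aes/sha2 relationship is taken from the descriptions in this thread, and the other edges are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits.
constexpr uint64_t FL_FP = 1u << 0;
constexpr uint64_t FL_SIMD = 1u << 1;
constexpr uint64_t FL_AES = 1u << 2;
constexpr uint64_t FL_SHA2 = 1u << 3;
constexpr uint64_t FL_CRYPTO = 1u << 4;

// Each extension lists only its direct dependencies.
struct extension_sketch
{
  uint64_t flag;
  uint64_t direct_deps;
};

static const extension_sketch extensions_sketch[] = {
  { FL_FP, 0 },
  { FL_SIMD, FL_FP },
  { FL_AES, FL_SIMD },
  { FL_SHA2, FL_SIMD },
  { FL_CRYPTO, FL_AES | FL_SHA2 },
};

// Flags to enable for +FLAG: the flag itself plus everything it
// transitively requires.  Iterates to a fixed point.
static uint64_t
flags_on (uint64_t flag)
{
  uint64_t result = flag;
  for (bool changed = true; changed;)
    {
      changed = false;
      for (const extension_sketch &ext : extensions_sketch)
        if ((result & ext.flag) && (ext.direct_deps & ~result))
          {
            result |= ext.direct_deps;
            changed = true;
          }
    }
  return result;
}

// Flags to disable for +noFLAG: the flag itself plus everything that
// transitively requires it.
static uint64_t
flags_off (uint64_t flag)
{
  uint64_t result = flag;
  for (bool changed = true; changed;)
    {
      changed = false;
      for (const extension_sketch &ext : extensions_sketch)
        if (!(result & ext.flag) && (ext.direct_deps & result))
          {
            result |= ext.flag;
            changed = true;
          }
    }
  return result;
}
```

Under these assumed edges, disabling aes also disables crypto, and enabling crypto pulls in aes, sha2, simd and fp, without any entry having to spell out the full set by hand.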
Just a minor patch to avoid having to construct std::strings
in static data.
gcc/
* common/config/aarch64/aarch64-common.cc (processor_name_to_arch)
(arch_to_arch_name): Use const char * instead of std::string.
---
gcc/common/config/aarch64/aarch64-common.cc | 4 ++--
1 file
aarch64-option-extensions.def was topologically sorted except
for one case: crypto came before its aes and sha2 dependencies.
This patch moves crypto after sha2 instead.
gcc/
* config/aarch64/aarch64-option-extensions.def: Move crypto
after sha2.
gcc/testsuite/
*
The flags fields of the aarch64-cores.def always start with
AARCH64_FL_FOR_<ARCH>. After previous changes, <ARCH> is always
identical to the previous field, so we can drop the explicit
AARCH64_FL_FOR_<ARCH> and derive it programmatically.
This isn't a big saving in itself, but it helps with later patches.
gcc/
AARCH64_FL_RCPC8_4 is an odd-one-out in that it has no associated
entry in aarch64-option-extensions.def. This means that, although
it is internally separated from AARCH64_FL_V8_4A, there is no
mechanism for turning it on and off individually, independently
of armv8.4-a.
The only place that the
The aarch64-option-extensions.def parsing in config.gcc had
some code left over from when it tried to parse the whole
macro definition. Also, config.gcc now only looks at the
first fields of the aarch64-arches.def entries.
gcc/
* config.gcc: Remove dead aarch64-option-extensions.def
This patch renames AARCH64_FL_FOR_ARCH* macros to follow the
same V names that we (now) use elsewhere.
The names are only temporary -- a later patch will move the
information to the .def file instead. However, it helps with
the sequencing to do this first.
gcc/
*
This patch completes the renaming of architecture-level related
things by adding "V" to the name of the architecture in
aarch64-arches.def. Since the "V" is predictable, we can easily
drop it when we don't need it (as when matching /proc/cpuinfo).
Having a valid C identifier is necessary for
Following on from the previous AARCH64_ISA patch, this one adds the
profile name directly to the end of architecture-level AARCH64_FL_*
macros.
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_V8_1, AARCH64_FL_V8_2)
(AARCH64_FL_V8_3, AARCH64_FL_V8_4, AARCH64_FL_V8_5, AARCH64_FL_V8_6)
All AARCH64_ISA_* architecture-level macros except AARCH64_ISA_V8_R
are for the A profile: they cause __ARM_ARCH_PROFILE to be set to
'A' and they are associated with architecture names like armv8.4-a.
It's convenient for later patches if we make this explicit
by adding an "A" to the name. Also,
This series of patches cleans up the definition of
the AArch64 ISA features. The main aims are:
- to make the naming more consistent
- to reduce the amount of boilerplate needed
- to avoid the need to maintain transitive closures by hand
- to enforce a sensible (topological) order on
Andrew Stubbs writes:
> On 29/09/2022 10:24, Richard Sandiford wrote:
>> Otherwise:
>>
>>operand0[0] = operand1 < operand2;
>>for (i = 1; i < operand3; i++)
>> operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>>
>> looks like a "length and mask" operation, which IIUC is
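One way to see why the quoted loop reads as a length-based mask: the recurrence produces a run of leading true lanes followed only by false lanes, so it is fully described by a single count. The sketch below is an illustration under that reading, not code from either patch; the function names are hypothetical and unsigned 64-bit operands are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Direct transcription of the quoted recurrence: lane I is active only
// if every earlier lane was active and operand1 + I < operand2.
static std::vector<bool>
mask_by_recurrence (uint64_t operand1, uint64_t operand2, unsigned nlanes)
{
  std::vector<bool> mask (nlanes);
  for (unsigned i = 0; i < nlanes; i++)
    mask[i] = (i == 0 || mask[i - 1]) && (operand1 + i < operand2);
  return mask;
}

// Equivalent "length" form: the first LENGTH lanes are active, where
// LENGTH is the distance from operand1 up to operand2 (or zero).
static std::vector<bool>
mask_by_length (uint64_t operand1, uint64_t operand2, unsigned nlanes)
{
  uint64_t length = operand1 < operand2 ? operand2 - operand1 : 0;
  std::vector<bool> mask (nlanes);
  for (unsigned i = 0; i < nlanes; i++)
    mask[i] = i < length;
  return mask;
}
```

The chaining through the previous lane is what keeps the two forms equal even if operand1 + i wraps around: once one lane is inactive, all later lanes stay inactive.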
Jeff Law writes:
> On 9/28/22 09:04, Richard Sandiford wrote:
>> Tamar Christina writes:
Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>>> But then I'd still need to change the expansion code. I suppose this could
>>> prevent the issue with changes to code on other
Richard Biener via Gcc-patches writes:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote:
>>
>> This patch is a prerequisite for some amdgcn patches I'm working on to
>> support shorter vector lengths (having fixed 64 lanes tends to miss
>> optimizations, and masking is not supported
Tamar Christina writes:
>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>
> But then I'd still need to change the expansion code. I suppose this could
> prevent the issue with changes to code on other targets.
>
>> > > We have undocumented addcc, negcc, etc. patterns,
I have a patch that adds a typedef to aarch64's <cpu>-opts.h.
The typedef is used for a TargetVariable in the .opt file,
which means that it is covered by PCH and so needs to be
visible to gengtype.
<cpu>-opts.h is not included directly in tm.h, but indirectly
by target headers (in this case aarch64.h).
Prathamesh Kulkarni writes:
> On Fri, 23 Sept 2022 at 21:33, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Tue, 20 Sept 2022 at 18:09, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
>> >> >
Torbjörn SVENSSON via Gcc-patches writes:
> Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
> enough to know if the execution will enter an endless loop, or if it
> will give a meaningful result. As the execution test only works when
> VMA and LMA are equal, make sure that
Torbjörn SVENSSON via Gcc-patches writes:
> In the test cases, it's clearly written that the intrinsics are not
> implemented on arm*. A simple xfail does not help since there are
> link errors, and those would cause an UNRESOLVED testcase rather than
> XFAIL.
> By changing to dg-skip-if, the entire test
Prathamesh Kulkarni writes:
> On Tue, 20 Sept 2022 at 18:09, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> >> The VLA encoding encodes the first N patterns
Tamar Christina writes:
> Hi All,
>
> Similar to the 1/2 patch but adds additional back-end specific folding for if
> the register sequence was created as a result of RTL optimizations.
>
> Concretely:
>
> #include <arm_neon.h>
>
> unsigned int foor (uint32x4_t x)
> {
> return x[1] >> 16;
> }
>
>
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, September 23, 2022 6:04 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov
>> Subject: Re: [PATCH 2/2]AArch64 Add support for
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, September 23, 2022 5:30 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov
>> Subject: Re: [PATCH 2/2]AArch64 Add support for
Tamar Christina writes:
> Hi All,
>
> This adds additional recognition of & 1 into the tbz/tbnz pattern.
>
> Concretely with the mid-end changes this changes
>
> void g1(bool x)
> {
> if (__builtin_expect (x, 0))
> h ();
> }
>
> from
>
> tst w0, 255
> bne .L7
>
> to
Tamar Christina writes:
> Hi All,
>
> Often times when a check_function_body check fails it can be quite hard to
> figure out why as no additional information is provided.
>
> This changes it so that on failures it prints out the regex expression it's
> using and the text it's comparing against
Tamar Christina writes:
> Hi All,
>
> This adds support for using scalar fneg on the V1DF type.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
Why just this one operation though? Couldn't we extend iterators
like GPF_F16 to include V1DF, avoiding the need
It turns out that GTY(()) markers in definitions like:
GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
are not effective and are silently ignored. The GTY(()) has
to come after an extern or static.
The externs associated with the SVE ACLE GTY variables are in
aarch64-sve-builtins.h. This file
Prathamesh Kulkarni writes:
> On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> >> The VLA encoding encodes the first N patterns explicitly. The
>> >> npatterns/nelts_per_pattern values then describe how to extend that
>> >> initial sequence to an
Jojo R via Gcc-patches writes:
> * gcc/genrecog.cc (print_nonbool_test): Fix type error of
> SUBREG_BYTE
We can't do this here. The code has done nothing to prove that the
subreg offset is a compile-time constant.
Thanks,
Richard
> ---
> gcc/genrecog.cc | 1 +
> 1 file changed, 1
Aldy Hernandez via Gcc-patches writes:
> On Thu, Sep 15, 2022 at 9:06 AM Richard Biener
> wrote:
>>
>> On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez wrote:
>> >
>> > Hi Richard. Hi all.
>> >
>> > The attached patch rewrites the NAN and sign handling, dropping both
>> > tristates in favor of
PR106794 shows that I'd forgotten about masked loads when
doing the SLP layout changes. These loads can't currently
be permuted independently of their mask input, so during
construction they never get a load permutation.
(If we did support permuting masked loads in future, the mask
would need to
While writing a testcase for PR106794, I noticed that we failed
to vectorise the testcase in the patch for SVE. The code that
recognises gather loads tries to optimise the point at which
the offset is calculated, to avoid unnecessary extensions or
truncations:
/* Don't include the
Robin Dapp via Gcc-patches writes:
> Hi,
>
> I have been working on making better use of s390's vzero instruction.
> Currently we rather zero a vector register once and load it into other
> registers via vlr instead of emitting multiple vzeros.
>
> At IRA/reload point we e.g. have
>
> (insn 8 5
Richard Biener via Gcc-patches writes:
> All frontends replicate this, so move it.
>
> Bootstrap and regtest running for all languages on
> x86_64-unknown-linux-gnu.
>
> OK if that succeeds?
LGTM FWIW.
Thanks,
Richard
> Thanks,
> Richard.
>
> gcc/
> * tree.cc (build_common_tree_nodes):
This patch fixes various issues around the handling of vectors
and (particularly) vector structures with +nosimd. Previously,
passing and returning structures would trigger an ICE, since:
* we didn't allow the structure modes to be stored in FPRs
* we didn't provide +nosimd move patterns
*
The ls64-related move expanders and splits required TARGET_SIMD.
That isn't necessary, since the 64-byte values are stored entirely
in GPRs. (The associated define_insn was already correct.)
I wondered about moving the patterns to aarch64.md, but it wasn't
clear-cut.
Tested on aarch64-linux-gnu
Yvan Roux writes:
> Hi Richard,
> On Mon, Sep 12, 2022 at 12:56:52PM +0100, Richard Sandiford via Gcc-patches
> wrote:
>> Torbjörn SVENSSON via Gcc-patches writes:
>> > PR/95720
>> > When the status wrapper is used, the gluefile needs to be prefixed with
>
"Roger Sayle" writes:
> Hi Richard,
>
>> "Roger Sayle" writes:
>> > This patch addresses PR rtl-optimization/106594, a significant
>> > performance regression affecting aarch64 recently introduced (exposed)
>> > by one of my recent RTL simplification improvements. Firstly many
>> > thanks to
Prathamesh Kulkarni writes:
> On Mon, 5 Sept 2022 at 15:51, Richard Sandiford
> wrote:
>>
>> Sorry for the slow reply. I wrote a response a couple of weeks ago
>> but I think it got lost in a machine outage.
>>
>> Prathamesh Kulkarni writes:
>> > Hi,
>> > The attached prototype patch extends
Torbjörn SVENSSON via Gcc-patches writes:
> PR/95720
> When the status wrapper is used, the gluefile needs to be prefixed with
> -Wl, in order for the test cases to have the dump files with the
> expected names.
>
> gcc/testsuite/ChangeLog:
>
> * gcc/testsuite/lib/g++.exp: Moved gluefile
"Roger Sayle" writes:
> This patch addresses PR rtl-optimization/106594, a significant performance
> regression affecting aarch64 recently introduced (exposed) by one of my
> recent RTL simplification improvements. Firstly many thanks to
> Tamar Christina for confirming that the core of this
Fix a stupid typo in my vect_optimize_slp_pass patch.
Tested on aarch64-linux-gnu, pushed as obvious.
Richard
gcc/
PR tree-optimization/106886
* tree-vect-slp.cc (vect_optimize_slp_pass::get_result_with_layout):
Fix copying of scalar stmts.
gcc/testsuite/
PR
Tom de Vries via Gcc-patches writes:
> On 7/12/22 15:42, Tom de Vries wrote:
>> [ dropped gdb-patches, since already applied there. ]
>>
>> On 6/27/22 15:38, Tom de Vries wrote:
>>> On 6/27/22 15:03, Tom de Vries wrote:
Hi,
When building gdb with --enable-shared, I run into:
Ulrich Drepper via Gcc writes:
> I talked to Jonathan the other day about adding all the C++ library APIs to
> the name hint file now that the size of the table is not really a concern
> anymore.
>
> Jonathan mentioned that he has to create and maintain a similar file for
> the module support.
8-bit and 16-bit FPR moves would ICE for +nosimd+fp, and some other
moves would handle FPR<-zero inefficiently. This is very much a
niche case at the moment, but something like it becomes more
important with SME streaming mode.
The si, di and vector tests already passed, they're just included
+nofp disabled the automatic allocation of FPRs, but it didn't stop
users from explicitly putting register variables in FPRs. We'd then
either report an ICE or generate unsupported instructions.
It's still possible (and deliberately redundant) to specify FPRs in
clobber lists.
Tested on
At one time the aarch64 port registered the Advanced SIMD builtins
lazily, when we first encountered a set of target flags that includes
+simd. These days we always initialise them at start-up, temporarily
forcing a conducive set of flags if necessary.
This patch removes some vestiges of the old
Sorry for the slow reply. I wrote a response a couple of weeks ago
but I think it got lost in a machine outage.
Prathamesh Kulkarni writes:
> Hi,
> The attached prototype patch extends fold_vec_perm to fold VEC_PERM_EXPR
> in VLA manner, and currently handles the following cases:
> (a) fixed
Keith Packard via Gcc-patches writes:
> Picolibc is a C library for embedded systems based on code from newlib
> and avr libc. To connect some system-dependent picolibc functions
> (like stdio) to an underlying platform, the platform may provide an OS
> library.
>
> This OS library must follow
vect_optimize_slp_pass always treats the starting layout as valid,
to avoid having to "optimise" when every possible choice is invalid.
But it gives the starting layout a high cost if it seems like the
target might reject it, in the hope that this will encourage other
(valid) layouts.
The
In the PR we have two REDUC_PLUS SLP instances that share a common
load of stride 4. Each instance also has a unique contiguous load.
Initially all three loads are out of order, so have a nontrivial
load permutation. The layout pass puts them in order instead.
For the two contiguous loads it is
This patch extends the SLP layout optimisation pass so that it
tries to remove layout changes that are brought about by permutes
of existing vectors. This fixes the bb-slp-pr54400.c regression on
x86_64 and also means that we can remove the permutes in cases like:
typedef float v4sf
The tests for sizeless SVE types include checks that the types
are handled for initialisation purposes in the same way as scalars.
GNU C and C2x now allow scalars to be initialised using empty braces,
so this patch updates the SVE tests to match.
Tested on aarch64-linux-gnu & pushed.
Richard
One call to dump_printf_loc had a stray left-over argument
from an earlier version of the patch. This went unnoticed
on aarch64-linux-gnu and x86_64-linux-gnu since the parameters
that actually mattered were passed in FPRs rather than GPRs,
but I assume this is the reason for the i686-linux-gnu