https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101169
--- Comment #6 from Kewen Lin ---
PR111850 reminded me of this bug. The sub-optimal issue described in comment #4
has been fixed on latest trunk; I think it's r14-4664-g04c9cf5c786b94.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111753
--- Comment #3 from Haochen Jiang ---
It seems to be caused by my change to the behavior when trying to use x/ymm16+ w/o
avx512vl specified.
I am working on a solution for that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111872
Bug ID: 111872
Summary: GCC rejects out of class definition of inner private
class template
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100532
--- Comment #8 from Andrew Pinski ---
Maybe the simple fix:
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6e044b4afbc..8f8562936dc 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3367,7 +3367,7 @@ convert_argument
libstdc++: [_Hashtable] Do not reuse untrusted cached hash code
On merge reuse merged node cached hash code only if we are on the same type of
hash and this hash is stateless. Usage of function pointers or std::function as
hash functor will prevent this optimization.
libstdc++-v3/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110986
--- Comment #23 from Andrew Pinski ---
Final patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633517.html
The Canonicalization between the 2 forms or doing it in isel will wait until
next I think.
After r14-3110-g7fb65f10285, the canonical form for
`a ? ~b : b` changed to be `-(a) ^ b` that means
for aarch64 we need to add a few new insn patterns
to be able to catch this and change it to be
what is the canonical form for the aarch64 backend.
A secondary pattern was needed to support a
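The equivalence behind that canonical form can be checked with a small C sketch. This is only an illustration of why `a ? ~b : b` and `-(a) ^ b` agree when `a` is 0 or 1; the function names below are hypothetical and are not taken from the patch or from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level form: pick ~b when a is true, b otherwise. */
uint32_t select_form(bool a, uint32_t b)
{
    return a ? ~b : b;
}

/* -(a) is all ones when a is 1 and zero when a is 0, so the XOR
   either flips every bit of b or leaves b unchanged. */
uint32_t xor_form(bool a, uint32_t b)
{
    return (0u - (uint32_t)a) ^ b;
}

/* Self-check over a few sample values (hypothetical helper). */
void check_forms_agree(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x0F0F0F0Fu, 0x80000000u, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(select_form(false, samples[i]) == xor_form(false, samples[i]));
        assert(select_form(true, samples[i]) == xor_form(true, samples[i]));
    }
}
```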
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860
--- Comment #10 from Andrew Pinski ---
Just an FYI, I do get a similar ICE on:
libgomp/testsuite/libgomp.fortran/simd3.f90
Testcase on aarch64-linux-gnu now too.
Maybe since it was in the libgomp testsuite you missed it when you tested your
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104822
--- Comment #4 from Andrew Pinski ---
Created attachment 56147
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56147&action=edit
Patch which I am testing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #7 from JuzheZhong ---
I don't think this is a popcount vectorization issue.
This code should not be vectorized. It's true this code won't be vectorized if we
use the default cost model.
So this is not an issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111871
Bug ID: 111871
Summary: invoking gm2 with -pipe and -v does not work
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
--- Comment #15 from JuzheZhong ---
After investigation:
It seems to be an issue with variable-length vectors:
https://godbolt.org/z/6Wrjz9ofE
void fn (char * restrict out, int x)
{
[local count: 1073741824]:
MEM[(int8x16_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111739
Andrew Pinski changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111738
Andrew Pinski changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED
This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the
DMR register.
This patch only adds the new 1,024 bit register support. It does not add
support for any instructions that need 1,024 bit
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the
same bits for either spelling.
The patches have been tested on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110733
Andrew Pinski changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA
instruction names are used.
A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
The patches have been tested on both little and big
The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
the traditional floating point registers 0..31, but logically the accumulator
registers were separate from the FPR registers. In ISA 3.1, it was
This patch re-enables generating load and store vector pair instructions when
doing certain memory copy operations when -mcpu=future is used.
During power10 development, it was determined that using store vector pair
instructions was problematic in a few cases, so we disabled generating load
This patch implements support for a potential future PowerPC cpu. Features
added with -mcpu=future may or may not be added to new PowerPC processors.
This patch adds support for the -mcpu=future option. If you use -mcpu=future,
the macro __ARCH_PWR_FUTURE__ is defined, and the assembler
This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture. This feature may or
may not be present in any specific future PowerPC processor.
In the current MMA subsystem for Power10, there are 8 512-bit accumulator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111799
Andrew Pinski changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110986
Andrew Pinski changed:
What|Removed |Added
Attachment #56134|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #6 from Andrew Pinski ---
(In reply to Vineet Gupta from comment #5)
> (In reply to Robin Dapp from comment #4)
>
> > Analyzing loop at pr111791.c:8
> > pr111791.c:8:25: note: === analyze_loop_nest ===
> > pr111791.c:8:25: note:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #5 from Vineet Gupta ---
(In reply to Robin Dapp from comment #4)
> Analyzing loop at pr111791.c:8
> pr111791.c:8:25: note: === analyze_loop_nest ===
> pr111791.c:8:25: note: === vect_analyze_loop_form ===
> pr111791.c:8:25:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
Andrew Pinski changed:
What|Removed |Added
Resolution|--- |INVALID
Status|WAITING
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101364
Andrew Pinski changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Target Milestone|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111863
Andrew Pinski changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285
Andrew Pinski changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285
Andrew Pinski changed:
What|Removed |Added
Target Milestone|11.5|14.0
--- Comment #8 from Andrew Pinski
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111863
--- Comment #11 from CVS Commits ---
The trunk branch has been updated by Andrew Pinski :
https://gcc.gnu.org/g:b20dbddcc41120144e700c4e3ef1ec396b1c56ab
commit r14-4729-gb20dbddcc41120144e700c4e3ef1ec396b1c56ab
Author: Andrew Pinski
Date:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101364
--- Comment #5 from CVS Commits ---
The trunk branch has been updated by Andrew Pinski :
https://gcc.gnu.org/g:879c91fcccf93681bd7e13290bfbb384cadcd268
commit r14-4728-g879c91fcccf93681bd7e13290bfbb384cadcd268
Author: Andrew Pinski
Date:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101285
--- Comment #7 from CVS Commits ---
The trunk branch has been updated by Andrew Pinski :
https://gcc.gnu.org/g:11e6bcedb41359c69ee790f38b04033d236336a8
commit r14-4727-g11e6bcedb41359c69ee790f38b04033d236336a8
Author: Andrew Pinski
Date:
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As far as I know this can only
Hello-
The PR points out that my fix for PR53431 was incomplete and did not handle
-Wunknown-pragmas. This is a one-line fix to correct that, is it OK for
trunk and for GCC 13 backport please? bootstrap + regtest all languages on
x86-64 Linux. Thanks!
-Lewis
-- >8 --
As noted on the PR, commit
Victor Do Nascimento writes:
> Add a build-time test to check whether system register data, as
> imported from `aarch64-sys-reg.def' has any duplicate entries.
>
> Duplicate entries are defined as any two SYSREG entries in the .def
> file which share the same encoding values (as specified by its
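A duplicate-encoding check of this kind can be sketched in C. This is a minimal sketch under stated assumptions, not the patch's build-time test: the quoted description is cut off before it says how an encoding is specified, so `struct sysreg_entry` and its single `encoding` field are hypothetical stand-ins, and the check runs at run time rather than at build time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one SYSREG entry; the real layout of
   aarch64-sys-reg.def is not shown in the quoted text. */
struct sysreg_entry {
    const char *name;
    unsigned encoding;
};

/* Order entries by encoding so that duplicates become adjacent. */
static int cmp_encoding(const void *a, const void *b)
{
    const struct sysreg_entry *x = a;
    const struct sysreg_entry *y = b;
    return (x->encoding > y->encoding) - (x->encoding < y->encoding);
}

/* Sorts regs in place and returns true if any two entries share an
   encoding value. */
bool has_duplicate_encoding(struct sysreg_entry *regs, size_t n)
{
    qsort(regs, n, sizeof regs[0], cmp_encoding);
    for (size_t i = 1; i < n; i++)
        if (regs[i - 1].encoding == regs[i].encoding)
            return true;
    return false;
}

/* Self-check with made-up entries (hypothetical names and values). */
void check_duplicate_detection(void)
{
    struct sysreg_entry with_dup[] = { {"REG_A", 1}, {"REG_B", 2}, {"REG_C", 1} };
    struct sysreg_entry unique[] = { {"REG_A", 1}, {"REG_B", 2} };
    assert(has_duplicate_encoding(with_dup, 3));
    assert(!has_duplicate_encoding(unique, 2));
}
```

Sorting first keeps the check at O(n log n) instead of comparing every pair of entries.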
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601
--- Comment #6 from Peter Bergner ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Peter Bergner from comment #4)
> > CCing richi and jakub to see if they've seen anything like this before?
>
> I suspect we are miscompiling the
Victor Do Nascimento writes:
> In implementing the ACLE read/write system register builtins it was
> observed that leaving argument type checking to be done at expand-time
> meant that poorly-formed function calls were being "fixed" by certain
> optimization passes, meaning bad code wasn't being
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398
Mark Esler changed:
What|Removed |Added
CC||mark.esler at canonical dot com
---
> On Oct 5, 2023, at 4:08 PM, Siddhesh Poyarekar wrote:
>
> On 2023-08-25 11:24, Qing Zhao wrote:
>> This is the 3rd version of the patch, per our discussion based on the
>> review comments for the 1st and 2nd version, the major changes in this
>> version are:
>
> Hi Qing,
>
> I hope the
Victor Do Nascimento writes:
> Motivated by the need to print system register names in output
> assembly, this patch adds the required logic to
> `aarch64_print_operand' to accept rtxs of type CONST_STRING and
> process these accordingly.
>
> Consequently, an rtx such as:
>
> (set (reg/i:DI 0
Victor Do Nascimento writes:
> This patch defines the structure of a new .def file used for
> representing the aarch64 system registers, what information it should
> hold and the basic framework in GCC to process this file.
>
> Entries in the aarch64-system-regs.def file should be as follows:
>
>
May I please ping this one, and/or, is it something straightforward
enough I can just commit it as obvious? Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631814.html
-Lewis
On Mon, Oct 2, 2023 at 6:23 PM Lewis Hyatt wrote:
>
> Hello-
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601
--- Comment #5 from Andrew Pinski ---
(In reply to Peter Bergner from comment #4)
> CCing richi and jakub to see if they've seen anything like this before?
I suspect we are miscompiling the final compiler somehow. I linked 2 other
reports
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #9 from Andrew Pinski ---
If I change your testcase to be:
uint64_t huh2 (_Atomic(uint64_t)* map, int t) {
return atomic_fetch_or_explicit(map, t, memory_order_relaxed);
}
You will see that it does the `lock cmpxchg` loop too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #8 from Andrew Pinski ---
On aarch64, ldset does both a load and an ior. That is unlike the `lock or` on
x86.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #7 from Andrew Pinski ---
That is, not using the fetch part is optimized to just `lock or`.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #6 from Andrew Pinski ---
If you don't use the return value of atomic_fetch_or_explicit, there is no need
for a compare-and-exchange (swap) loop. If you need the fetch part, the
compare-and-exchange loop needs to be used as `lock
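The two cases described in this comment can be written side by side in C. This is a minimal sketch based on the `huh2` testcase quoted in comment #9; the function names and the self-check are hypothetical, and the instruction selection mentioned in the code comments is what the thread reports for x86, not something the code itself guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Result discarded: per the thread, x86 can use a plain `lock or`
   here because the old value is never needed. */
void set_bits(_Atomic(uint64_t) *map, uint64_t bits)
{
    atomic_fetch_or_explicit(map, bits, memory_order_relaxed);
}

/* Result used: the old value must be returned, which is the case
   the thread says requires a compare-and-exchange loop on x86. */
uint64_t set_bits_fetch(_Atomic(uint64_t) *map, uint64_t bits)
{
    return atomic_fetch_or_explicit(map, bits, memory_order_relaxed);
}

/* Self-check: both variants leave the same bits set in memory. */
void check_fetch_or(void)
{
    _Atomic(uint64_t) word = 0x1;
    set_bits(&word, 0x2);
    uint64_t old = set_bits_fetch(&word, 0x4);
    assert(old == 0x3);
    assert(atomic_load_explicit(&word, memory_order_relaxed) == 0x7);
}
```

Both functions have the same effect on memory. They differ only in whether the caller observes the previous value.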
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601
Peter Bergner changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org,
Hi, Sid,
Thanks a lot for the detailed comments.
See my responses embedded below.
Qing
> On Oct 5, 2023, at 4:01 PM, Siddhesh Poyarekar wrote:
>
>
>
> On 2023-08-25 11:24, Qing Zhao wrote:
>> Use the counted_by attribute info in builtin object size to compute the
>> subobject size for
Victor Do Nascimento writes:
> Implement the aarch64 intrinsics for reading and writing system
> registers with the following signatures:
>
> uint32_t __arm_rsr(const char *special_register);
> uint64_t __arm_rsr64(const char *special_register);
> void* __arm_rsrp(const char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #5 from isoosqa ---
Please forgive me. I typed stuff wrong in the original link
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #4 from isoosqa ---
Oops, I sent wrong code. This is the one https://godbolt.org/z/GxdvMdP76
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #3 from Andrew Pinski ---
Or maybe the issue is you don't understand the cmpxchg instruction and how it
gives back the original value too.
The RTL form for the "lock;cmpxchg " is:
(insn:TI 14 10 17 5 (parallel [
(set
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
Andrew Pinski changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
> By the way, could you mention this PR:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
> and add the test from this PR?
Commented in that PR. This patch does not help there.
Regards
Robin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
--- Comment #1 from Andrew Pinski ---
Created attachment 56145
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56145&action=edit
testcase
Next time, please attach the testcase or place it inline rather than just giving a
link.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111870
Bug ID: 111870
Summary: Miscompile of atomic rmw or on x86 (not aarch, though)
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Marek,
Sorry for the late comment (I was just back from a long vacation immediately
after Cauldron).
One question:
Is the option “-fhardened” for a production build or for a development build?
If it’s for a development build, then adding -ftrivial-auto-var-init=pattern is
reasonable since the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111857
Andrew Pinski changed:
What|Removed |Added
Severity|normal |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111869
Andrew Pinski changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860
Andrew Pinski changed:
What|Removed |Added
CC||shaohua.li at inf dot ethz.ch
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860
Andrew Pinski changed:
What|Removed |Added
Status|NEW |ASSIGNED
--- Comment #8 from Andrew
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #4 from Robin Dapp ---
This is a scalar popcount and as Kito already noted we will just emit
cpop a0, a0
once the zbb extension is present.
As to the question what is actually being vectorized here, I'm not so sure :D
It looks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111869
Bug ID: 111869
Summary: ICE: verify_ssa failed since r14-4710-g60c231cb658
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111601
Peter Bergner changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
> On Oct 6, 2023, at 4:01 PM, Martin Uecker wrote:
>
> Am Freitag, dem 06.10.2023 um 06:50 -0400 schrieb Siddhesh Poyarekar:
>> On 2023-10-06 01:11, Martin Uecker wrote:
>>> Am Donnerstag, dem 05.10.2023 um 15:35 -0700 schrieb Kees Cook:
On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111851
kargl at gcc dot gnu.org changed:
What|Removed |Added
CC||kargl at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
Roger Sayle changed:
What|Removed |Added
CC||roger at nextmovesoftware dot
com
---
Generally looks really good. Some comments below.
Victor Do Nascimento writes:
> Given the implementation of a mechanism of encoding system registers
> into GCC, this patch provides the mechanism of validating their use by
> the compiler. In particular, this involves:
>
> 1. Ensuring a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648
prathamesh3492 at gcc dot gnu.org changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
On Wed, 18 Oct 2023 at 23:22, Richard Sandiford
wrote:
>
> Prathamesh Kulkarni writes:
> > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
> > wrote:
> >> Prathamesh Kulkarni writes:
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 4f8561509ff..55a6a68c16c 100644
> >> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648
--- Comment #5 from CVS Commits ---
The master branch has been updated by Prathamesh Kulkarni
:
https://gcc.gnu.org/g:3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f
commit r14-4726-g3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f
Author: Prathamesh
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110500
--- Comment #3 from Andrew Pinski ---
*** Bug 111862 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111862
Andrew Pinski changed:
What|Removed |Added
Resolution|--- |DUPLICATE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96347
Iain Buclaw changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111866
Andrew Pinski changed:
What|Removed |Added
Last reconfirmed||2023-10-18
Keywords|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61192
--- Comment #7 from Andrew Pinski ---
Created attachment 56144
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56144&action=edit
Another testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61192
Andrew Pinski changed:
What|Removed |Added
CC||141242068 at smail dot
nju.edu.cn
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865
Andrew Pinski changed:
What|Removed |Added
Resolution|--- |DUPLICATE
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865
Andrew Pinski changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865
--- Comment #2 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #1)
> Created attachment 56143 [details]
> testcase that could go into the testsuite with more targets supported
Add:
```
#elif defined __aarch64__
# define ASM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865
--- Comment #1 from Andrew Pinski ---
Created attachment 56143
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56143&action=edit
testcase that could go into the testsuite with more targets supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111865
Andrew Pinski changed:
What|Removed |Added
Target Milestone|--- |11.5
Summary|GCC: 14:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867
--- Comment #4 from Iain Sandoe ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > Maybe something like:
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111868
Tamar Christina changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111860
Tamar Christina changed:
What|Removed |Added
CC||seurer at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867
--- Comment #3 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #2)
> Maybe something like:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 62b1ae0652f..db2dde84329 100644
> ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111868
Bug ID: 111868
Summary: [14 regression] many ICEs after r14-4710
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867
--- Comment #2 from Andrew Pinski ---
Maybe something like:
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 62b1ae0652f..db2dde84329 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867
Andrew Pinski changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Alex Coplan writes:
> This patch generalises the TFmode load/store pair patterns to TImode and
> TDmode. This brings them in line with the DXmode patterns, and uses the
> same technique with separate mode iterators (TX and TX2) to allow for
> distinct modes in each arm of the load/store pair.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111867
Bug ID: 111867
Summary: aarch64: Wrong code for bf16 literal load when the
arch support +fp16
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Alex Coplan writes:
> The test is trying to check that we don't use q-register stores with
> -mstrict-align, so actually check specifically for that.
>
> This is a prerequisite to avoid regressing:
>
> scan-assembler-not "add\tx0, x0, :"
>
> with the upcoming ldp fusion pass, as we change where
On 10/18/23 13:28, waffl3x wrote:
I will try to get something done today, but I was struggling with
writing some of the tests, there's also a lot more of them now. I also
wrote a bunch of musings in comments that I would like feedback on.
My most concrete question is, how exactly should I be
Alex Coplan writes:
> With the new ldp/stp pass enabled, there is a change in the codegen for
> this test as follows:
>
> add x8, sp, 16
> ptrue p3.h, mul3
> str p3, [x8]
> - str x8, [sp, 8]
> - str x9, [sp]
> + stp x9, x8, [sp]
>
Alex Coplan writes:
> The test is looking for individual stores which are able to be merged
> into stp instructions. The test currently passes -fno-schedule-fusion
> -fno-peephole2, presumably to prevent these stores from being turned
> into stps, but this is no longer sufficient with the new
Alex Coplan writes:
> Currently, rtl_ssa::change_insns requires all new uses and defs to be
> specified explicitly. This turns out to be rather inconvenient for
> forming load pairs in the new aarch64 load pair pass, as the pass has to
> determine which mem def the final load pair consumes, and
Alex Coplan writes:
> This is needed by the upcoming aarch64 load pair pass, as it can
> re-order stores (when alias analysis determines this is safe) and thus
> change which mem def a given use consumes (in the RTL-SSA view, there is
> no alias disambiguation of memory).
>
>