26 Oct 2022 at 21:07, Richard Sandiford
>> > > wrote:
>> > >>
>> > >> Sorry for the slow response. I wanted to find some time to think
>> > >> about this a bit more.
>> > >>
>> > >> Prathamesh Kulkarni writes:
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, December 6, 2022 10:28 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov
>> Subject: Re: [PATCH 5/8]AArch64 aarch64: Make
Tamar Christina writes:
> Hi,
>
>
>> This name might cause confusion with the SVE iterators, where FULL means
>> "every bit of the register is used". How about something like VMOVE
>> instead?
>>
>> With this change, I guess VALL_F16 represents "The set of all modes for
>> which the vld1
Richard Biener via Gcc-patches writes:
> On Thu, Nov 24, 2022 at 8:25 AM HAO CHEN GUI wrote:
>>
>> Hi Richard,
>>
>>
>> On 2022/11/24 4:06, Richard Biener wrote:
>> > Wouldn't we usually either add an optab or try to recog a canonical
>> > RTL form instead of adding a new target hook for things like
Prathamesh Kulkarni writes:
> On Tue, 6 Dec 2022 at 00:08, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi,
>> > The following test:
>> >
>> > #include "arm_sve.h"
>> >
>> > svint8_t
>> > test_s8(int8_t *x)
>> > {
>> > return svld1rq_s8 (svptrue_b8 (), &x[0]);
>> > }
>> >
Wilco Dijkstra writes:
> A recent change only initializes the regs.how[] during Dwarf unwinding
> which resulted in an uninitialized offset used in return address signing
> and random failures during unwinding. The fix is to use REG_SAVED_OFFSET
> as the state where the return address signing
Prathamesh Kulkarni writes:
> Hi,
> The following test:
>
> #include "arm_sve.h"
>
> svint8_t
> test_s8(int8_t *x)
> {
> return svld1rq_s8 (svptrue_b8 (), &x[0]);
> }
>
> ICE's with -march=armv8.2-a+sve -O1 -fno-tree-ccp -fno-tree-forwprop:
> during GIMPLE pass: fre
> pr107920.c: In function
Tamar Christina writes:
> Hi,
>
> I hadn't received any reply so I had implemented various ways to do this
> (about 8 of them in fact).
>
> The conclusion is that no, we cannot emit one big RTL for the final
> instruction immediately.
> The reason that all comparisons in the AArch64 backend
Wilco Dijkstra writes:
> Hi Richard,
>
>> - scalar_int_mode imode = (mode == HFmode
>> - ? SImode
>> - : int_mode_for_mode (mode).require ());
>> + machine_mode imode = (mode == DFmode) ? DImode : SImode;
>
>> It looks like this
Richard Sandiford via Gcc-patches writes:
> Tamar Christina via Gcc-patches writes:
>>> > +/* Check to see if the supplied comparison in PTEST can be performed as a
>>> > + bit-test-and-branch instead. VAL must contain the original tree
>>> > + e
Tamar Christina via Gcc-patches writes:
>> > +/* Check to see if the supplied comparison in PTEST can be performed as a
>> > + bit-test-and-branch instead. VAL must contain the original tree
>> > + expression of the non-zero operand which will be used to rewrite the
>> > + comparison in
>> Subject: Re: [PATCH]AArch64 Fix vector re-interpretation between partial
>> SIMD modes
>>
>> Richard Sandiford via Gcc-patches writes:
>> > Tamar Christina writes:
>> >> Hi All,
>> >>
>> >> While writing a patch serie
"Pop, Sebastian" writes:
> Hi,
>
> Currently patchable area is at the wrong place on AArch64. It is placed
> immediately after function label, before .cfi_startproc. This patch
> adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
> modifies
Richard Sandiford via Gcc-patches writes:
> Prathamesh Kulkarni writes:
>> Hi,
>> For the following test-case:
>>
>> int16x8_t foo(int16_t x, int16_t y)
>> {
>> return (int16x8_t) { x, y, x, y, x, y, x, y };
>> }
>>
>> Code gen at -O3:
Prathamesh Kulkarni writes:
> Hi,
> For the following test-case:
>
> int16x8_t foo(int16_t x, int16_t y)
> {
> return (int16x8_t) { x, y, x, y, x, y, x, y };
> }
>
> Code gen at -O3:
> foo:
> dup v0.8h, w0
> ins v0.h[1], w1
> ins v0.h[3], w1
> ins
Alexandre Oliva via Gcc-patches writes:
> On arm-eabi, and possibly on other platforms, -fshort-enums is enabled
> by default, which breaks some tests' expectations as to enum sizes
> with DEFERRED_INIT. Disable short enums so that the expectations are
> met.
>
> Regstrapped on x86_64-linux-gnu,
"Kewen.Lin" writes:
> Hi,
>
> I'm working to find one solution for PR106736, which requires us to
> make some built-in types only valid for some target features, and
> emit error messages for the types when the condition isn't satisfied.
> A straightforward idea is to guard the registry of
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Monday, November 14, 2022 9:59 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov
>> Subject: Re: [PATCH 2/2]AArch64 Perform more
Alex Coplan via Gcc-patches writes:
> Hi,
>
> This patch fixes a type confusion bug in varasm.cc:assemble_variable.
> The problem is that the current code calls:
>
> sect = get_variable_section (decl, false);
>
> and then accesses sect->named.name without checking whether the section
> is in
"Kewen.Lin" writes:
> Hi Richard,
>
> on 2022/11/24 17:24, Richard Sandiford wrote:
>> "Kewen.Lin" writes:
>>> Hi,
>>>
>>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>>> STORE} into normal vector load/store if the given length is known
>>> to be equal to the length of the
Wilco Dijkstra writes:
> Hi Richard,
>
>> Just to make sure I understand: isn't it really just MOVN? I would have
>> expected a 32-bit MOVZ to be equivalent to (and add no capabilities over)
>> a 64-bit MOVZ.
>
> The 32-bit MOVZ immediates are equivalent, MOVN never overlaps, and
> MOVI has some
Richard Earnshaw via Gcc-patches writes:
> This pulls in the toplevel portion of this binutils-gdb commit:
> 19e559f1c91bfaedbd2f91d85ee161f3f03fda3c libsframe: add the SFrame library
>
> ChangeLog:
> * Makefile.def: Add libsframe as new module with its dependencies.
> *
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Wednesday, November 23, 2022 4:18 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64 sve2: Fix
Sorry for the very long delay in reviewing this.
Wilco Dijkstra writes:
> Hi Richard,
>
> Here is the immediate cleanup splitoff from the previous patch:
>
> Simplify, refactor and improve various move immediate functions.
> Allow 32-bit MOVZ/N as a valid 64-bit immediate which removes special
>
"Kewen.Lin" writes:
> Hi,
>
> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
> STORE} into normal vector load/store if the given length is known
> to be equal to the length of the whole vector. It would help to
> improve overall cycles as normally the latency of vector access
>
Wilco Dijkstra writes:
> Hi Richard,
>
>>> A smart reassociation pass could form more FMAs while also increasing
>>> parallelism, but the way it currently works always results in fewer FMAs.
>>
>> Yeah, as Richard said, that seems the right long-term fix.
>> It would also avoid the hack of
Tamar Christina writes:
> Hi All,
>
> SVE has an actual division optab, and when using -Os we don't
> optimize the division away. This means that we need to distinguish
> between a div which we can optimize and one we cannot even during
> expansion.
>
> Bootstrapped Regtested on
"Kewen.Lin" writes:
> Hi Richard,
>
> Many thanks for your review comments!
>
on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As discussed in PR98125, -fpatchable-function-entry with
> SECTION_LINK_ORDER support doesn't work well on powerpc64
> ELFv1
Wilco Dijkstra writes:
> Hi Richard,
>
>> I guess an obvious question is: if 1 (rather than 2) was the right value
>> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
>> pipes? It would be good to clarify how, conceptually, the core property
>> should map to the
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 15, 2022 11:34 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Marcus Shawcroft
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>>
Sam James via Gcc-patches writes:
>> On 8 Nov 2022, at 07:14, Sam James wrote:
>>
>> 1. This should speed up decompression for folks, as parallel xz
>> creates a different archive which can be decompressed in parallel.
>>
>> Note that this different method is enabled by default in a new
>>
Richard Earnshaw via Gcc-patches writes:
> On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
>> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
>> big-endian, because the _Decimal32 on-stack argument is not padded in
>> the same direction depending on endianness.
>>
>>
Tamar Christina writes:
>> -Original Message-
>> From: Richard Biener
>> Sent: Tuesday, November 22, 2022 10:59 AM
>> To: Richard Sandiford
>> Cc: Tamar Christina via Gcc-patches; Tamar Christina; Richard Biener;
>> nd
>> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar
Christophe Lyon via Gcc-patches writes:
> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
> big-endian, because the _Decimal32 on-stack argument is not padded in
> the same direction depending on endianness.
>
> This patch fixes the testcase so that it expects the argument in the
>
Tamar Christina via Gcc-patches writes:
>> So it's not easily possible within the current infrastructure. But it does
>> look
>> like ARM might eventually benefit from something like STV on x86?
>>
>
> I'm not sure. The problem with trying to do this in RTL is that you'd have
> to be
> able
Wilco Dijkstra writes:
> Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
> existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/
>
"Kewen.Lin" writes:
> Hi,
>
> Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html
>
> Any comments are highly appreciated.
>
> BR,
> Kewen
>
> on 2022/9/28 13:41, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> Gentle ping:
>>
"Andre Vieira (lists)" writes:
> Sorry for the late reply on this. I was wondering though why the check
> made sense. The way I see it, SI -> SI mode is either wrong or useless.
> So why not:
> if it is wrong, error (gcc_assert?) so we know it was generated wrongly
> somehow and fix it;
> if
[I posted this late in stage 4 as an RFC, but it wasn't suitable for
GCC 12 at that point. I kind-of dropped the ball after that, sorry.]
Various parts of the omp code checked whether the size of a decl
was an INTEGER_CST in order to determine whether the decl was
variable-sized or not. If it
Jeff Law via Gcc-patches writes:
> On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote:
>> Arm's SME adds a new processor mode called streaming mode.
>> This mode enables some new (matrix-oriented) instructions and
>> disables several existing groups of instr
Richard Sandiford via Gcc-patches writes:
> Tamar Christina writes:
>> Hi All,
>>
>> While writing a patch series I started getting incorrect codegen out from
>> VEC_PERM on partial struct types.
>>
>> It turns out that this was happening b
Hongtao Liu writes:
> On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford
> wrote:
>>
>> Hongtao Liu writes:
>> > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Hongtao Liu writes:
>> >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford
>> >> > wrote:
>> >> >>
>>
Tamar Christina writes:
> Hi All,
>
> While writing a patch series I started getting incorrect codegen out from
> VEC_PERM on partial struct types.
>
> It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS
> implementation has a slight bug in it. The hook only checked for
Wilco Dijkstra writes:
> Hi Richard,
>
>> Can you go into more detail about:
>>
>> Use :option:`-mdirect-extern-access` either in shared libraries or in
>> executables, but not in both. Protected symbols used both in a shared
>> library and executable may cause linker errors or fail to
Hongtao Liu writes:
> On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford
> wrote:
>>
>> Hongtao Liu writes:
>> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford
>> > wrote:
>> >>
>> >> Tamar Christina writes:
>> >> >> -Original Message-
>> >> >> From: Hongtao Liu
>> >> >> Sent:
Hongtao Liu writes:
> On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford
> wrote:
>>
>> Tamar Christina writes:
>> >> -Original Message-
>> >> From: Hongtao Liu
>> >> Sent: Tuesday, November 15, 2022 9:37 AM
>> >> To: Tamar Christina
>> >> Cc: Richard Sandiford; Tamar Christina via
>>
Wilco Dijkstra writes:
> Add a new option -mdirect-extern-access similar to other targets. This
> removes
> GOT indirections on external symbols with -fPIE, resulting in significantly
> better code quality. With -fPIC it only affects protected symbols, allowing
> for more efficient shared
"Andre Vieira (lists)" writes:
> On 07/11/2022 11:05, Richard Biener wrote:
>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>
>>> Sorry for the delay, just been reminded I still had this patch outstanding
>>> from last stage 1. Hopefully since it has been mostly reviewed it could go
>>> in
"Andre Vieira (lists)" writes:
> Updated version of the patch to account for the testsuite changes in the
> first patch.
>
> On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> This patch adds support for the widening LDAPR instructions.
>>
>> Bootstrapped and
Tamar Christina writes:
>> -Original Message-
>> From: Hongtao Liu
>> Sent: Tuesday, November 15, 2022 9:37 AM
>> To: Tamar Christina
>> Cc: Richard Sandiford; Tamar Christina via Gcc-patches; nd;
>> rguent...@suse.de
>> Subject: Re: [PATCH 3/8]middle-end: Support extractions of
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 15, 2022 11:59 AM
>> To: Tamar Christina via Gcc-patches
>> Cc: Tamar Christina; nd; rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH]middle-end: replace
Tamar Christina via Gcc-patches writes:
> Hi All,
>
> After the fix to the addsub patch yesterday for bootstrap I had only
> regtested on x86.
> While looking today it seemed the new tests were failing, this was caused
> by a change in the behavior of the GET_MODE_WIDER_MODE macro on trunk.
>
>
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 15, 2022 11:15 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Marcus Shawcroft
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>>
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 15, 2022 10:51 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Marcus Shawcroft
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>>
Tamar Christina writes:
> Hi,
>
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -4259,7 +4259,7 @@ (define_insn
>> "*aarch64_get_lane_zero_extend"
>> > ;; Extracting lane zero is split into a simple move when it is
>> > between SIMD ;;
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 15, 2022 10:36 AM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Marcus Shawcroft
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>>
Philipp Tomsich writes:
> Richard,
>
> is this OK for backport to GCC-12 and GCC-11?
The fusion part seems potentially risky for a stable branch, but since
it's conditional on the new flag (and thus new CPU), I think it should
be OK.
So yeah, OK for both, thanks.
Richard
> Thanks,
> Philipp.
Tamar Christina writes:
> Hello,
>
> Ping and updated patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (*tb1): Rename to...
> (*tb1): ... this.
> (tbranch4):
(Sorry, immediately following up to myself for a second time recently.)
Richard Sandiford writes:
> Tamar Christina writes:
>>>
>>> The same thing ought to work for smov, so it would be good to do both.
>>> That would also make the split between the original and new patterns more
>>> obvious:
Tamar Christina writes:
>>
>> The same thing ought to work for smov, so it would be good to do both.
>> That would also make the split between the original and new patterns more
>> obvious: left shift for the old pattern, right shift for the new pattern.
>>
>
> Done, though because umov can do
Wilco Dijkstra via Gcc-patches writes:
> Add support for AArch64 LSE and LSE2 to libatomic. Disable outline atomics,
> and use LSE ifuncs for 1-8 byte atomics and LSE2 ifuncs for 16-byte atomics.
> On Neoverse V1, 16-byte atomics are ~4x faster due to avoiding locks.
>
> Note this is safe since
Philipp Tomsich writes:
> This patch adds support for Ampere-1A CPU:
> - recognize the name of the core and provide detection for -mcpu=native,
> - updated extra_costs,
> - adds a new fusion pair for (A+B+1 and A-B-1).
>
> Ampere-1A and Ampere-1 have more timing difference than the extra
>
This patch adds a new interface to function_resolver::resolve_to
in which the mode suffix stays the same (which is the common case).
It then moves the handling of explicit first type suffixes from
function_resolver::resolve_unary to this new function.
This makes things slightly simpler for
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.
A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
(The callee's function
So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code. It's therefore necessary to make
the implicit SVE requirement explicit.
gcc/
*
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.
That rule began to break down in SVE2, and it continues to do
so
SME has an array called ZA that can be enabled and disabled separately
from streaming mode. A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.
In C and C++, the state of PSTATE.ZA is controlled using function
attributes. If a function's type has an arm_shared_za
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").
Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function.
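The tail-call rule in the message above can be sketched as a small predicate. This is only an illustrative model, not code from the patch: the names za_state, sm_state, fn_mode_sketch and can_tail_call_sketch are hypothetical, and the sketch assumes that only the two ZA states and two streaming states named in the text exist (streaming-compatible functions are not modelled).

```cpp
// Hypothetical model of the rule quoted above; none of these names
// come from the GCC sources.
enum class za_state { private_za, shared_za };
enum class sm_state { non_streaming, streaming };

// The two properties of a function that the rule talks about.
struct fn_mode_sketch
{
  za_state za;
  sm_state sm;
};

// A tail call is allowed only when caller and callee agree on both
// the PSTATE.ZA setting and the streaming mode.
constexpr bool
can_tail_call_sketch (fn_mode_sketch caller, fn_mode_sketch callee)
{
  return caller.za == callee.za && caller.sm == callee.sm;
}
```

Under this model, a shared-ZA streaming function may tail-call another shared-ZA streaming function, but not a private-ZA or non-streaming one.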
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
gcc/
*
This patch adds support for the arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI. The attribute is valid but redundant for
arm_streaming functions.
gcc/
* config/aarch64/aarch64.cc (aarch64_attribute_table): Add
This patch adds support for switching to the appropriate SME mode
for each call. Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction. If the call is being made from streaming-compatible code,
these switches are
Some SME instructions use w12-w15 to index ZA. This patch
adds a register class for that range.
gcc/
* config/aarch64/aarch64.h (ZA_INDEX_REGNUM_P): New macro.
(ZA_INDEX_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
*
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others. It's purely an RTL
convenience and isn't (yet) a valid storage mode.
gcc/
* config/aarch64/aarch64-modes.def: Add
The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are. This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.
The vector-to-vector move instructions are not
This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code. (arm_streaming_compatible code
does not necessarily assume the presence of SME. It just has to
work when SME is present and streaming mode is enabled.)
gcc/
*
This patch adds support for recognising the SME arm_streaming
and arm_streaming_compatible attributes. These attributes
respectively describe whether the processor is definitely in
"streaming mode" (PSTATE.SM==1), whether the processor is
definitely not in streaming mode (PSTATE.SM==0), or
This series adds support for the Armv9-A Scalable Matrix Extension (SME).
Details about the extension are available here:
https://developer.arm.com/documentation/ddi0616/aa/?lang=en
The ABI and ACLE documentation is available on github:
expand_builtin_longjmp and expand_builtin_nonlocal_goto both
emit nonlocal gotos. They first try to use a target-provided
pattern and fall back to generic code otherwise. These pieces
of generic code are almost identical, and having them inline
like this makes it difficult to define a
Richard Sandiford writes:
> Tamar Christina writes:
>> Hi All,
>>
>> This adds an RTL pattern for when two NARROWB instructions are being combined
>> with a PACK. The second NARROWB is then transformed into a NARROWT.
>>
>> For the example:
>>
>> void draw_bitmap1(uint8_t* restrict pixel,
Tamar Christina writes:
> Hi All,
>
> This adds an RTL pattern for when two NARROWB instructions are being combined
> with a PACK. The second NARROWB is then transformed into a NARROWT.
>
> For the example:
>
> void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
> {
> for (int i =
Sorry for the slow review, been snowed under with stage1 stuff.
Tamar Christina writes:
> Hi All,
>
> In plenty of image and video processing code it's common to modify pixel
> values
> by a widening operation and then scale them back into range by dividing by
> 255.
>
> This patch adds an
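For readers unfamiliar with the idiom the message describes (widen, operate, then divide by 255 to get back into range), here is a minimal scalar sketch. The function name scale_pixel_sketch is hypothetical and is not taken from the patch; it only illustrates the arithmetic pattern, not how the patch recognises or optimises it.

```cpp
#include <cstdint>

// Hypothetical helper: scale an 8-bit pixel by an 8-bit level.
// The product is computed in a wider type (at most 255 * 255 = 65025),
// then divided by 255 so the result fits in 8 bits again.
constexpr std::uint8_t
scale_pixel_sketch (std::uint8_t pixel, std::uint8_t level)
{
  return static_cast<std::uint8_t> (
    (static_cast<std::uint16_t> (pixel) * level) / 255);
}
```

A loop applying this to every element of a uint8_t array is the kind of code the message has in mind.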
We didn't previously use SVE's RDVL instruction, since the CNT*
forms are preferred and provide most of the range. However,
there are some cases that RDVL can handle and CNT* can't,
and using RDVL-like instructions becomes important for SME.
Tested on aarch64-linux-gnu. I plan to apply this
Arm's SME has an array called ZA that for inline asm purposes
is effectively a form of special-purpose memory. It doesn't
have an associated storage type and so can't be passed and
returned in normal C/C++ objects.
We'd therefore like "za" in a clobber list to mean that an inline
asm can read
We have the following two hooks into the call expansion code:
- TARGET_CALL_ARGS is called for each argument before arguments
are moved into hard registers.
- TARGET_END_CALL_ARGS is called after the end of the call
sequence (specifically, after any return value has been
moved to a
Epilogues for sibling calls are generated using the
sibcall_epilogue pattern. One disadvantage of this approach
is that the target doesn't know which call the epilogue is for,
even though the code that generates the pattern has the call
to hand.
Although call instructions are currently rtxes,
Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions. It can also change the current
The prologue/epilogue pass allows the prologue sequence
to contain jumps. The sequence is then partitioned into
basic blocks using find_many_sub_basic_blocks.
This patch treats epilogues in the same way. It's needed for
a follow-on aarch64 patch that adds conditional code to both
the prologue
Tamar Christina writes:
> Hi All,
>
> Currently we often times generate an r -> r add even if it means we need two
> reloads to perform it, i.e. in the case that the values are on the SIMD side.
>
> The pairwise operations expose these more now and so we get suboptimal
> codegen.
>
> Normally I
Tamar Christina writes:
> Hi All,
>
> The backend has an existing V2HFmode that is used by pairwise operations.
> This mode was however never made fully functional. Amongst other things it
> was
> never declared as a vector type which made it unusable from the mid-end.
>
> It's also lacking an
Tamar Christina writes:
> Hi All,
>
> This implements the new widening reduction optab in the backend.
> Instead of introducing a duplicate definition for the same thing I have
> renamed the intrinsics definitions to use the same optab.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no
Tamar Christina via Gcc-patches writes:
> Hi All,
>
> The current vector extract pattern can only extract from a vector when the
> position to extract is a multiple of the vector bitsize as a whole.
>
> That means extracting something like a V2SI from a V4SI vector from position 32
> isn't possible
Tamar Christina writes:
> Hi All,
>
> Our zero and sign extend and extract patterns are currently very limited and
> only work for the original register size of the instructions. i.e. limited by
> GPI patterns. However these instructions extract bits and extend. This means
> that any register
2022 at 21:38, Richard Sandiford
>> > wrote:
>> >>
>> >> Richard Sandiford via Gcc-patches writes:
>> >> > Prathamesh Kulkarni writes:
>> >> >> Sorry to ask a silly question but in which case shall we select 2nd
>> >
In 9482a5e4eac8d696129ec2854b331e1bb5dbab42 I'd replaced uses
of CONSTEXPR with direct uses of constexpr. However, it turns
out that we still have CONSTEXPR for a reason: GCC 4.8 doesn't
implement constexpr properly, and for example rejects things like:
extern const int x;
constexpr int x =
Sorry for the slow response. I wanted to find some time to think
about this a bit more.
Prathamesh Kulkarni writes:
> On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
> wrote:
>>
>> Richard Sandiford via Gcc-patches writes:
>> > Prathamesh Kulkarni writes:
>>
Eric Botcazou via Gcc-patches writes:
> Hi,
>
> we have been using an Ada compiler for the AArch64 architecture configured
> with SJLJ exceptions as for the other architectures for some time, and have
> not run into any problems so far so the setting looks obsolete now.
>
> OK for the mainline?
Segher Boessenkool writes:
> On Thu, Oct 20, 2022 at 07:34:13AM +, Jiang, Haochen wrote:
>> > > + /* Argument 3 must be either zero or one. */
>> > > + if (INTVAL (op3) != 0 && INTVAL (op3) != 1)
>> > > +{
>> > > + warning (0, "invalid fourth argument to %<__builtin_prefetch%>;"
Wilco Dijkstra writes:
> Hi Richard,
>
>> Can you do the aarch64_mov_imm changes as a separate patch? It's difficult
>> to review the two changes folded together like this.
>
> Sure, I'll send a separate patch. So here is version 2 again:
I still think we should move the functions to avoid the
Richard Sandiford writes:
> Philipp Tomsich writes:
>> This brings the extensions detected by -mcpu=native on Ampere-1 systems
>> in sync with the defaults generated for -mcpu=ampere1.
>>
>> Note that some early kernel versions on Ampere1 may misreport the
>> presence of PAUTH and PREDRES (i.e.,
Now that the codebase is C++11, we can use using directives
to inherit constructors from base classes.
Tested on aarch64-linux-gnu & pushed.
Richard
gcc/
* config/aarch64/aarch64-sve-builtins-functions.h (quiet)
(rtx_code_function, rtx_code_function_rotated,
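A minimal sketch of the C++11 feature the message above refers to (strictly speaking, a using-declaration that inherits constructors). The class names function_base_sketch and rotated_function_sketch are hypothetical stand-ins and are not the classes in aarch64-sve-builtins-functions.h.

```cpp
// Hypothetical base class with a two-argument constructor.
class function_base_sketch
{
public:
  constexpr function_base_sketch (int code_for_sint, int code_for_uint)
    : m_code_for_sint (code_for_sint), m_code_for_uint (code_for_uint) {}

  constexpr int code_for_sint () const { return m_code_for_sint; }
  constexpr int code_for_uint () const { return m_code_for_uint; }

private:
  int m_code_for_sint;
  int m_code_for_uint;
};

// Hypothetical derived class.  Before C++11 it would need a
// hand-written constructor that forwards both arguments to the base.
class rotated_function_sketch : public function_base_sketch
{
public:
  // Inherit the base-class constructors instead of repeating them.
  using function_base_sketch::function_base_sketch;
};
```

With this, rotated_function_sketch (1, 2) constructs the base part directly, and the derived class no longer carries a boilerplate forwarding constructor.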