Richard Sandiford writes:
> "yanzhang.wang--- via Gcc-patches" writes:
>> From: Yanzhang Wang
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1
>> to not.
>>
>> Signed-off-by: Yanzhang Wang
>> ---
>>
>> Tested on my local arm
Thiago Jung Bauermann via Gcc-patches writes:
> Since commit e7a36e4715c7 "[PATCH] RISC-V: Support simplify (-1-x) for
> vector." these tests fail on aarch64-linux:
>
> === g++ tests ===
>
> Running g++:g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
> FAIL:
Marc Poulhiès via Gcc-patches writes:
> Richard Sandiford via Gcc-patches writes:
>>> +# this regex matches the first line of the "end" in the initial commit
>>> message
>>> +FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by|co-authored-by|#):
>
Christophe Lyon via Gcc-patches writes:
> Tests under gcc.dg/vect use check_vect_support_and_set_flags to set
> compilation flags as appropriate for the target, but they also set
> dg-do-what-default to 'run' or 'compile', depending on the actual
> target hardware (or simulator) capabilities.
>
>
"yanzhang.wang--- via Gcc-patches" writes:
> From: Yanzhang Wang
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1
> to not.
>
> Signed-off-by: Yanzhang Wang
> ---
>
> Tested on my local arm environment and passed. Thanks Andrew Pinski's
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, September 1, 2023 2:36 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64 xorsign: Fix scalar
Tamar Christina writes:
> Hi All,
>
> In GCC-9 our scalar xorsign pattern broke and we didn't notice it because the
> testcase was not strong enough. With this commit
>
> 8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit
> commit 8d2d39587d941a40f25ea0144cceb677df115040
> Author:
Uros Bizjak via Gcc-patches writes:
> On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
> wrote:
>>
>> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
>> > From: Kong Lingling
>> >
>> > In inline asm, we do not know if the insn can use EGPR, so disable
Robin Dapp via Gcc-patches writes:
>> It's not just a question of which byte though. It's also a question
>> of which bit.
>>
>> One option would be to code-generate for even X and for odd X, and select
>> between them at runtime. But that doesn't scale well to 2+2X and 1+1X.
>>
>> Otherwise
While backporting another patch to an earlier release, I hit a
situation in which lra_eliminate_regs_1 would eliminate an address to:
(plus (reg:P R) (const_int 0))
This address compared not-equal to plain:
(reg:P R)
which caused an ICE in a later peephole2. (The ICE showed up in
While working on another patch, I hit a problem with the aarch64
expansion of untyped_call. The expander emits the usual:
(set (mem ...) (reg resN))
instructions to store the result registers to memory, but it didn't
say in RTL where those resN results came from. This eventually led
to a
"juzhe.zh...@rivai.ai" writes:
> Thanks Richi.
>
> I am trying to figure out how to adjust finish_cost to lower the LMUL
>
> For example:
>
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
> for (int i = 0; i < n; i++)
> a[i] = a[i] + b[i];
> }
>
> preferred_simd_mode
Robin Dapp writes:
>> But in the VLA case, doesn't it instead have precision 4+4X?
>> The problem then is that we can't tell at compile time which
>> byte that corresponds to. So...
>
> Yes 4 + 4x. I keep getting confused with poly modes :)
> In this case we want to extract the bitnum [3 4] = 3
[Sorry for any weird MUA issues, don't have access to my usual set-up.]
> when looking at a riscv ICE in vect-live-6.c I noticed that we
> assume that the variable part (coeffs[1] * x1) of the to-be-extracted
> bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT.
>
> This means that
excl_hash_traits can be defined more simply by reusing existing traits.
Tested on aarch64-linux-gnu. OK to install?
Richard
gcc/
* attribs.cc (excl_hash_traits): Delete.
(test_attribute_exclusions): Use pair_hash and nofree_string_hash
instead.
---
gcc/attribs.cc | 45
Jeff Law writes:
> On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote:
>> Ping. I refined the code and some comments a bit and added a test
>> case.
>>
>> My question in general would still be: Is this something we want
>> given that we potentially move some of combine's work a bit towards
>>
Just some off-the-cuff thoughts. Might think differently when
I've had more time...
Richard Biener writes:
> On Mon, 28 Aug 2023, Jakub Jelinek wrote:
>
>> Hi!
>>
>> While the _BitInt series isn't committed yet, I had a quick look at
>> lifting the current lowest limitation on maximum _BitInt
Juzhe-Zhong writes:
> Hi, Richard and Richi.
>
> Currently, GCC supports COND_LEN_FMA for floating-point **NO** -ffast-math.
> It's supported in tree-ssa-math-opts.cc. However, GCC fails to support
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider the following case:
> #define
Jeff Law writes:
> On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote:
>> Yes, I agree long-term we want everything to be optimized as early as
>> possible.
>>
>> However, IMHO, it's impossible for us to support every conditional pattern
>> in the middle-end (match.pd).
>> It's a really big number.
>>
Richard Sandiford writes:
> Rather than hiding this in target code, perhaps we should add a
> target-independent concept of an "eh_return taken" flag, say
> EH_RETURN_TAKEN_RTX.
>
> We could define it so that, on targets that define EH_RETURN_TAKEN_RTX,
> a register EH_RETURN_STACKADJ_RTX and a
Richard Biener writes:
> The following adds the capability to do SLP on .MASK_STORE, I do not
> plan to add interleaving support.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
LGTM, thanks.
Richard
> Thanks,
> Richard.
>
> PR tree-optimization/15
> gcc/
> *
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a
Wilco Dijkstra writes:
> Hi Richard,
>
> (that's quick!)
>
>> + if (size > max_copy_size || size > max_mops_size)
>> +return aarch64_expand_cpymem_mops (operands, is_memmove);
>>
>> Could you explain this a bit more? If I've followed the logic correctly,
>> max_copy_size will always be 0
"Richard Earnshaw (lists)" writes:
> On 23/08/2023 16:49, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw via Gcc-patches writes:
>>> Now that we require C++ 11, we can safely forward declare rtx_code
>>> so that we can use it i
Richard Earnshaw via Gcc-patches writes:
> Note, this patch is dependent on the patch I posted yesterday to
> forward declare rtx_code in coretypes.h.
>
> --
> Now that we have a forward declaration of rtx_code in coretypes.h, we
> can adjust these hooks to take rtx_code arguments rather than
Richard Earnshaw via Gcc-patches writes:
> Now that we require C++ 11, we can safely forward declare rtx_code
> so that we can use it in target hooks.
>
> gcc/ChangeLog
> * coretypes.h (rtx_code): Add forward declaration.
> * rtl.h (rtx_code): Make compatible with forward declaration.
Wilco Dijkstra writes:
> A MOPS memmove may corrupt registers since there is no copy of the input
> operands to temporary
> registers. Fix this by calling aarch64_expand_cpymem which does this. Also
> fix an issue with
> STRICT_ALIGNMENT being ignored if TARGET_MOPS is true, and avoid
Szabolcs Nagy writes:
> The expected way to handle eh_return is to pass the stack adjustment
> offset and landing pad address via
>
> EH_RETURN_STACKADJ_RTX
> EH_RETURN_HANDLER_RTX
>
> to the epilogue that is shared between normal return paths and the
> eh_return paths. EH_RETURN_HANDLER_RTX
Marc Poulhiès via Gcc-patches writes:
> Consider Signed-Off-By lines as part of the ending of the initial
> commit to avoid having these in the middle of the log when the
> changelog part is injected after.
>
> This is particularly useful with:
>
> $ git gcc-commit-mklog --amend -s
>
> that can
Thiago Jung Bauermann via Gcc-patches writes:
> This test passes since commit e41103081bfa "Fix undefined behaviour in
> profile_count::differs_from_p", so remove the xfail annotation.
>
> Tested on aarch64-linux-gnu, armv8l-linux-gnueabihf and x86_64-linux-gnu.
>
> gcc/testsuite/ChangeLog:
>
Richard Biener writes:
> On Wed, 16 Aug 2023, Juzhe-Zhong wrote:
>
>> Hi, Richard and Richi.
>>
>> Currently, GCC supports COND_LEN_FMA for floating-point **NO** -ffast-math.
>> It's supported in tree-ssa-math-opts.cc. However, GCC fails to support
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
Juzhe-Zhong writes:
> Hi, Richard and Richi.
>
> Currently, GCC supports COND_LEN_FMA for floating-point **NO** -ffast-math.
> It's supported in tree-ssa-math-opts.cc. However, GCC fails to support
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider the following case:
> #define
Prathamesh Kulkarni writes:
> On Mon, 21 Aug 2023 at 12:26, Richard Biener wrote:
>>
>> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>>
>> > On Fri, 18 Aug 2023 at 14:52, Richard Biener wrote:
>> > >
>> > > On Fri, 18 Aug 2023, Richard Sandiford wrote:
>> > >
>> > > > Richard Biener writes:
Richard Sandiford writes:
> Joseph Myers writes:
>> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote:
>>
>>> Would it be OK to add support for:
>>>
>>> [[__extension__ ...]]
>>>
>>> to suppress the pedwarn about
Richard Biener writes:
> The following avoids running into somehow flawed logic in fold_vec_perm
> for non-VLA vectors.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> PR tree-optimization/111048
> * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
>
Richard Biener writes:
>> Am 17.08.2023 um 13:25 schrieb Richard Sandiford via Gcc-patches
>> :
>>
>> Joseph Myers writes:
>>>> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote:
>>>>
>>>> Would it be OK to add support f
Joseph Myers writes:
> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote:
>
>> Would it be OK to add support for:
>>
>> [[__extension__ ...]]
>>
>> to suppress the pedwarn about using [[]] prior to C2X? Then we can
>
> That seem
Alex Coplan writes:
> Hi,
>
> This patch fixes up the code examples in the RTL-SSA documentation (the
> sections on making insn changes) to reflect the current API.
>
> The main issues are as follows:
> - rtl_ssa::recog takes an obstack_watermark & as the first parameter.
>Presumably this is
Joseph Myers writes:
> On Mon, 17 Jul 2023, Michael Matz via Gcc-patches wrote:
>
>> So, essentially you want unignorable attributes, right? Then implement
>> exactly that: add one new keyword "__known_attribute__" (invent a better
>> name, maybe :) ), semantics exactly as with __attribute__
Richard Ball writes:
> v2: Add missing PROFILE feature flag.
>
> This patch adds support for the Cortex-A720 CPU to GCC.
>
> No regressions on aarch64-none-elf.
>
> Ok for master?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-
> A720 CPU.
>
Robin Dapp writes:
>> However:
>>
>> | #define vec_extract_direct { 3, 3, false }
>>
>> This looks wrong. The numbers are argument numbers (or -1 for a return
>> value). vec_extract only takes 2 arguments, so 3 looks to be out-of-range.
>>
>> | #define direct_vec_extract_optab_supported_p
Prathamesh Kulkarni writes:
>> Unfortunately, the patch regressed following tests on ppc64le and
>> armhf respectively:
>> gcc.target/powerpc/vec-perm-ctor.c scan-tree-dump-not optimized
>> "VIEW_CONVERT_EXPR"
>> gcc.dg/tree-ssa/forwprop-20.c scan-tree-dump-not forwprop1 "VEC_PERM_EXPR"
>>
>>
"juzhe.zh...@rivai.ai" writes:
> Hi, Robin, Richard and Richi.
>
> I am wondering whether we can just simply replace the VEC_EXTRACT expander
> with binary?
>
> Like this :?
>
> DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> - vec_extract, vec_extract)
> +
Richard Biener writes:
> The following changes the gate to perform vectorization of BB reductions
> to use needs_fold_left_reduction_p which in turn requires handling
> TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
> promoting any operations generated there to use unsigned
Richard Biener writes:
>> OK, fair enough. So the idea is: see where we end up and then try to
>> improve/factor the APIs in a less peephole way?
>
> Yeah, I think that's the only good way forward.
OK, no objection from me. Sorry for holding the patch up.
Richard
Richard Biener writes:
> On Tue, 15 Aug 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Tue, 15 Aug 2023, Kewen.Lin wrote:
>> >
>> >> Hi Stefan,
>> >>
>> >> on 2023/8/15 02:51, Stefan Schulze Frielinghaus wrote:
>> >> > Hi everyone,
>> >> >
>> >> > I have bootstrapped and
Richard Biener writes:
> On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
>> On Mon, 7 Aug 2023 at 13:19, Richard Biener
>> wrote:
>> > It doesn't seem to make a difference for x86. That said, the "fix" is
>> > probably sticking the correct target on the dump-check, it seems
>> > that
Richard Biener writes:
> On Tue, 15 Aug 2023, Kewen.Lin wrote:
>
>> Hi Stefan,
>>
>> on 2023/8/15 02:51, Stefan Schulze Frielinghaus wrote:
>> > Hi everyone,
>> >
>> > I have bootstrapped and regtested the patch below on s390. For the
>> > 64-bit target I do not see any changes regarding the
Richard Biener writes:
> On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin wrote:
>>
>> on 2023/8/14 22:16, Richard Sandiford wrote:
>> > No, it was more that 219-142=77, so it seems like a lot of lines
>> > are being duplicated rather than simply being moved. (Unlike for
>> > VMAT_LOAD_STORE_LANES,
Andrew Pinski via Gcc-patches writes:
> Like the support for conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
> this just adds conditional not too.
> Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
> not.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu and
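
The equivalence behind the `(a ? -1 : 0) ^ b` remark can be checked with a small scalar sketch. This is only an illustration in plain C, not the patch itself; the function names are made up for the example and it assumes two's-complement integers, where XOR with -1 inverts every bit.

```c
#include <assert.h>
#include <stdint.h>

/* Form mentioned in the message: XOR with an all-ones or all-zeros mask. */
int32_t xor_mask_form(int a, int32_t b)
{
    return (a ? -1 : 0) ^ b;
}

/* Conditional-not form: invert b only when a is nonzero. */
int32_t cond_not_form(int a, int32_t b)
{
    return a ? ~b : b;
}

/* Hypothetical self-check: both forms agree on a handful of sample values. */
void check_forms_agree(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        for (int a = 0; a <= 1; ++a)
            assert(xor_mask_form(a, samples[i]) == cond_not_form(a, samples[i]));
}
```

Calling `check_forms_agree ()` from a test driver exercises both the taken and not-taken condition for each sample.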
I think it would help to clarify what the aim of the security policy is.
Specifically:
(1) What service do we want to provide to users by classifying one thing
as a security bug and another thing as not a security bug?
(2) What service do we want to provide to the GNU community by the same
"Kewen.Lin" writes:
> Hi Richard,
>
> on 2023/8/14 20:20, Richard Sandiford wrote:
>> Thanks for the clean-ups. But...
>>
>> "Kewen.Lin" writes:
>>> Hi,
>>>
>>> Following Richi's suggestion [1], this patch is to move the
>>> handlings on VMAT_GATHER_SCATTER in the final loop nest
>>> of
Prathamesh Kulkarni writes:
> On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> >> static bool
>> >> is_simple_vla_size (poly_uint64 size)
>> >> {
>> >> if (size.is_constant ())
>> >> return false;
>> >> for (int i = 1; i < ARRAY_SIZE
Thanks for the clean-ups. But...
"Kewen.Lin" writes:
> Hi,
>
> Following Richi's suggestion [1], this patch is to move the
> handlings on VMAT_GATHER_SCATTER in the final loop nest
> of function vectorizable_load to its own loop. Basically
> it duplicates the final loop nest, clean up some
Juzhe-Zhong writes:
> Hi, there is a genrecog issue in the RISC-V backend.
>
> This is the ICE info:
>
> 0xfa3ba4 poly_int_pod<2u, unsigned short>::to_constant() const
> ../../../riscv-gcc/gcc/poly-int.h:504
> 0x28eaa91 recog_5
>
Richard Biener writes:
> On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, Richi.
>>
>> > 1. Target is using loop MASK as the partial vector loop control.
>> >> I don't think it checks for this?
>>
>> I am not sure whether I understand EXTRACT_LAST correctly.
>> But if target doesn't
Richard Biener writes:
> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero. But that doesn't correctly preserve
> the sign of zero. The following patch disables partial vector
> support
Siddhesh Poyarekar writes:
> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
>>> Do you have a suggestion for the language to address libgcc,
>>> libstdc++, etc. and libiberty, libbacktrace, etc.?
>>
>> I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for
Prathamesh Kulkarni writes:
>> static bool
>> is_simple_vla_size (poly_uint64 size)
>> {
>> if (size.is_constant ())
>> return false;
>> for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>> if (size[i] != (i <= 1 ? size[0] : 0))
> Just wondering if this should be (i == 1 ? size[0] :
Richard Biener writes:
> On Thu, Aug 10, 2023 at 3:44 PM Richard Sandiford
> wrote:
>>
>> Richard Biener via Gcc-patches writes:
>> > On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
>> > wrote:
>> >>
>> >> If `A` has a range of `[0,0][100,INF]` and the comparison
>> >> is `A <
Richard Biener via Gcc-patches writes:
> On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
> wrote:
>>
>> If `A` has a range of `[0,0][100,INF]` and the comparison
>> is `A < 50`, this should be optimized to `A <= 0` (which will
>> then be optimized to just `A == 0`).
>> This patch
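
The range argument quoted above can be sanity-checked with a brute-force sketch. It assumes `A` is a non-negative integer restricted to `[0,0]` or `[100,INF]`; the helper names are hypothetical and the loop only samples values below 1000.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if a lies in [0,0] or [100, INF). */
bool in_example_range(int a)
{
    return a == 0 || a >= 100;
}

/* For every sampled value inside the range, A < 50, A <= 0 and A == 0
   must all give the same answer. */
void check_range_comparison(void)
{
    for (int a = 0; a < 1000; ++a) {
        if (!in_example_range(a))
            continue;
        assert((a < 50) == (a <= 0));
        assert((a <= 0) == (a == 0));
    }
}
```

Values between 1 and 99 are skipped because the range says they cannot occur, which is exactly what lets the constant 50 be replaced.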
Jakub Jelinek writes:
> On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote:
>> Jakub Jelinek writes:
>> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> >> Jakub: do you remember what the reason was? I don't mind dropping
>> >> "function", but it feels
Jakub Jelinek writes:
> On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> Jakub: do you remember what the reason was? I don't mind dropping
>> "function", but it feels weird to drop the quotes around "simd".
>> Seems like, if we do that, there'll one day be a patch to add
>>
"Andre Vieira (lists)" writes:
> Here is my new version, see inline response to your comments.
>
> New cover letter:
>
> This patch enables the use of mixed-types for simd clones for AArch64,
> adds aarch64 as a target_vect_simd_clones and corrects the way the
> simdlen is chosen for
"juzhe.zh...@rivai.ai" writes:
> Hi, Richi.
>
>>> that should be
>
>>> || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>>> && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>
>>> I think. It seems to imply that SLP isn't supported with
>>> masking/lengthing.
>
> Oh, yes. At first glance, the
Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
>
Richard Ball writes:
> This patch adds support for the Cortex-A520 CPU to GCC.
>
> No regressions on aarch64-none-elf.
>
> Ok for master?
>
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add
> Cortex-A520 CPU.
> * config/aarch64/aarch64-tune.md:
"Andre Vieira (lists)" writes:
> Hi,
>
> This patch enables the use of mixed-types for simd clones for AArch64
> and adds aarch64 as a target_vect_simd_clones.
>
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc
Prathamesh Kulkarni writes:
> On Fri, 4 Aug 2023 at 20:36, Richard Sandiford
> wrote:
>>
>> Full review this time, sorry for skipping the tests earlier.
> Thanks for the detailed review! Please find my responses inline below.
>>
>> Prathamesh Kulkarni writes:
>> > diff --git
Full review this time, sorry for skipping the tests earlier.
Prathamesh Kulkarni writes:
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 7e5494dfd39..680d0e54fd4 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -85,6 +85,10 @@ along with GCC; see the file
Richard Biener writes:
> The following fixes a problem with my last attempt of avoiding
> out-of-bound shift values for vectorized right shifts of widened
> operands. Instead of truncating the shift amount with a bitwise
> and we actually need to saturate it to the target precision.
>
> The
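
A scalar sketch of why masking the shift amount is not enough, assuming a 16-bit lane narrowed from a 32-bit operation. The function names are invented for the example, "saturate to the target precision" is modelled here as clamping to 15 (precision minus one), and the code relies on arithmetic right shift of negative values, which is implementation-defined in C but is what GCC provides.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: do the shift in 32 bits, then narrow. n must be in 0..31. */
int16_t wide_shift(int16_t x, unsigned n)
{
    return (int16_t)((int32_t)x >> n);
}

/* Narrow shift with the amount truncated by a bitwise AND. */
int16_t narrow_shift_masked(int16_t x, unsigned n)
{
    return (int16_t)(x >> (n & 15));
}

/* Narrow shift with the amount saturated (assumed clamp: 15). */
int16_t narrow_shift_saturated(int16_t x, unsigned n)
{
    unsigned s = n > 15 ? 15 : n;
    return (int16_t)(x >> s);
}

/* The saturated form matches the wide reference for every amount 0..31. */
void check_shift_amounts(void)
{
    static const int16_t samples[] = { 0, 1, -1, 1234, -1234, INT16_MAX, INT16_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        for (unsigned n = 0; n < 32; ++n)
            assert(narrow_shift_saturated(samples[i], n) == wide_shift(samples[i], n));
}
```

With an amount of 16, the masked form shifts by 0 and returns the input unchanged, while the wide reference and the saturated form both return only the sign fill.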
YunQiang Su writes:
> PR #104914
>
> On platforms where TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) == true,
> zero_extract (SI, SI) can be sign-extended. So a zero_extract (DI, DI)
> followed by a sign_extend (SI, DI) can be merged into a single
> zero_extract (SI, SI).
>
> gcc/ChangeLog:
>
Tamar Christina writes:
>> >> Do you see vect_constant_defs in practice, or is this just for
>> >> completeness?
>> >> I would expect any constants to appear as direct operands. I don't
>> >> mind keeping it if it's just a belt-and-braces thing though.
>> >
>> > In the latency case where I had
Richard Sandiford writes:
> Prathamesh Kulkarni writes:
>> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
>> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the rework and sorry for the slow review.
>> Hi Richard,
>> Thanks for the suggestions! Please find my responses inline below.
>>>
>>> Prathamesh
Prathamesh Kulkarni writes:
> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
> wrote:
>>
>> Hi,
>>
>> Thanks for the rework and sorry for the slow review.
> Hi Richard,
> Thanks for the suggestions! Please find my responses inline below.
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> >
Tamar Christina writes:
>> > +
>> > +(define_constraint "D3"
>> > + "@internal
>> > + A constraint that matches vector of immediates that is with 0 to
>> > +(bits(mode)/2)-1."
>> > + (and (match_code "const,const_vector")
>> > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0,
>>
can_div_trunc_p (a, b, , ) tries to compute a Q and r that
satisfy the usual conditions for truncating division:
(1) a = b * Q + r
(2) |b * Q| <= |a|
(3) |r| < |b|
We can compute Q using the constant component (the case when
all indeterminates are zero). Since |r| < |b| for the
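
For plain integers (no indeterminates), the three conditions listed above are the ones C's `/` and `%` already satisfy. The sketch below checks that; it is a scalar illustration only and does not model poly_int or can_div_trunc_p itself, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if (q, r) satisfy conditions (1)-(3) for truncating
   division of a by b, where b is nonzero. */
int is_trunc_div_result(long a, long b, long q, long r)
{
    return a == b * q + r
        && labs(b * q) <= labs(a)
        && labs(r) < labs(b);
}

/* C99 division truncates toward zero, so a / b and a % b should pass. */
void check_trunc_div(void)
{
    static const long vals[] = { -7, -3, -1, 0, 1, 2, 5, 9 };
    const unsigned n = sizeof vals / sizeof vals[0];
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j) {
            if (vals[j] == 0)
                continue; /* division by zero is excluded */
            assert(is_trunc_div_result(vals[i], vals[j],
                                       vals[i] / vals[j], vals[i] % vals[j]));
        }
}
```

A floor-division pair such as q = -4, r = 1 for -7 / 2 satisfies (1) and (3) but fails (2), which is what distinguishes truncating division from the other rounding modes.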
Hao Liu OS writes:
> Hi Richard,
>
> Updated the patch with a simple case (see below case and comments). It shows
> a live stmt may not have a reduction def, which introduces the ICE.
>
> Is it OK for trunk?
OK, thanks.
Richard
>
> Fix the assertion failure on empty reduction define in
Richard Biener writes:
> [...]
>> >> in vect_determine_precisions_from_range. Maybe we should drop
>> >> the shift handling from there and instead rely on
>> >> vect_determine_precisions_from_users, extending:
>> >>
>> >> if (TREE_CODE (shift) != INTEGER_CST
>> >> || !wi::ltu_p
Tamar Christina writes:
> Hi All,
>
> Currently we segfault when len == 0 for an attribute list.
>
> essentially [cons: =0, 1, 2, 3; attrs: ] segfaults but should be equivalent to
> [cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:]. This fixes it by just
> returning early and leaving it to the
Tamar Christina writes:
> Hi All,
>
> In GCC 11 we implemented the vectorizer optab for widening left shifts,
> however this optab is only supported for uniform shift constants.
>
> At the moment GCC still has two loop vectorization strategies (classical loop
> and
> SLP based loop vec) and the
Richard Biener writes:
> On Tue, 1 Aug 2023, Richard Sandiford wrote:
>
>> Richard Sandiford writes:
>> > Richard Biener via Gcc-patches writes:
>> >> The following makes sure to limit the shift operand when vectorizing
>> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
Tamar Christina writes:
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > When determining issue rates we currently discount non-constant MLA
>> > accumulators for Advanced SIMD but don't do it for the latency.
>> >
>> > This means the costs for Advanced SIMD with a constant accumulator are
>> >
Tamar Christina writes:
> Hi All,
>
> boolean comparisons have different cost depending on the mode. e.g.
> a && b when predicated doesn't require an addition instruction, the AND is
> free
Nit (for the commit msg): additional
Maybe:
for SVE, a && b doesn't require an additional instruction
Tamar Christina writes:
> Hi All,
>
> When determining issue rates we currently discount non-constant MLA
> accumulators
> for Advanced SIMD but don't do it for the latency.
>
> This means the costs for Advanced SIMD with a constant accumulator are wrong
> and
> results in us costing SVE and
Jeff Law via Gcc-patches writes:
> On 8/1/23 05:18, Richard Sandiford wrote:
>>
>> Where were you seeing the requirement for pointer equality? genrecog.cc
>> at least uses rtx_equal_p, and I think it has to. E.g. some patterns
>> use (match_dup ...) to match output and input mems, and mem
Jeff Law via Gcc-patches writes:
> On 7/19/23 04:11, Xiao Zeng wrote:
>> This patch completes the recognition of the basic semantics
>> defined in the spec, namely:
>>
>> Conditional zero, if condition is equal to zero
>>rd = (rs2 == 0) ? 0 : rs1
>> Conditional zero, if condition is non zero
Richard Sandiford writes:
> Richard Biener via Gcc-patches writes:
>> The following makes sure to limit the shift operand when vectorizing
>> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> operand otherwise invokes undefined behavior. When we determine
>> whether we can
Richard Biener via Gcc-patches writes:
> The following makes sure to limit the shift operand when vectorizing
> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> operand otherwise invokes undefined behavior. When we determine
> whether we can demote the operand we know we at
Juzhe-Zhong writes:
> Hi, Richard and Richi.
>
> Based on the suggestions from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
>
> This patch chooses approach (1) that Richard provided, meaning:
>
> RVV implements cond_* optabs as expanders. RVV therefore supports
> both
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
>
> Based on previous discussions, we should make COND_* and COND_LEN_*
> consistent.
>
> So, this patch defines these internal functions together using these 2
> wrappers:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define
Richard Ball writes:
> Add POLY_INT_CST support to code within
> fold_ctor_reference. This code previously
> only supported INTEGER_CST which caused a
> bug when using VEC_PERM_EXPR with SVE vectors.
Just to add for others: this is a prerequisite for a follow-on patch,
so the change will be
Hao Liu OS writes:
>> Which test case do you see this for? The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests. It is still the existing case causing the
> previous ICE (i.e. assertion problem):
Sorry for the slow response.
Hao Liu OS writes:
>> Ah, thanks. In that case, Hao, I think we can avoid the ICE by changing:
>>
>> if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>> && vect_is_reduction (stmt_info))
>>
>> to:
>>
>> if ((kind == scalar_stmt ||
Richard Biener writes:
> On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>> > wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that
>> >> > we're not
Richard Biener writes:
> On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
> wrote:
>>
>> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that we're
>> > not papering over an issue elsewhere.
>>
>> Yes, I also wonder if this is an issue in vectorizable_reduction.
Was leaving a bit of time in case Richi had any comments, but:
Matthew Malcomson writes:
> Our checks for whether the vectorization of a given loop would make an
> out of bounds access miss the case when the vector we load is so large
> as to span multiple iterations worth of data (while only
Hi,
Thanks for the rework and sorry for the slow review.
Prathamesh Kulkarni writes:
> Hi Richard,
> This is a reworking of the patch to extend fold_vec_perm to handle VLA vectors.
> The attached patch unifies handling of VLS and VLA vector_csts, while
> using fallback code
> for ctors.
>
> For VLS
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard.
>>> I think we should have an internal-fn helper that returns IFN_COND_LEN_*
>>> for a given IFN_COND_*. It could handle IFN_MASK_LOAD -> IFN_MASK_LEN_LOAD
>>> etc. too.
> Could you name this helper function for me? Does it call
>
"juzhe.zh...@rivai.ai" writes:
> Thanks Richard.
>
> Do you suggest we should add a macro like this first:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
> DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##optab, cond_##TYPE)
> DEF_INTERNAL_OPTAB_FN