Richard Biener via Gcc-patches writes:
> On Sun, Jun 25, 2023 at 7:39 AM Roger Sayle
> wrote:
>>
>>
>> On Tue, 13 June 2023 12:02, Richard Biener wrote:
>> > On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle
>> > wrote:
>> > > The following simple test case, from PR 104610, shows that memcmp ()
>> >
Philipp Tomsich writes:
> Richard,
>
> OK for backport to GCC-13?
Yeah, OK for GCC 13 too.
Thanks,
Richard
> Thanks,
> Philipp.
>
> On Thu, 22 Jun 2023 at 16:18, Richard Sandiford via Gcc-patches
> wrote:
>>
>> Di Zhao OS via Gcc-patches writes:
Richard Biener writes:
> The following fixes a bug that manifests itself during fold-left
> reduction transform in picking not the last scalar def to replace
> and thus double-counting some elements. But the underlying issue
> is that we merge a load permutation into the in-order reduction
>
g:6f19cf7526168f8 extended N-vector to N-vector conversions
to handle cases where an intermediate integer extension or
truncation is needed. This patch adjusts the cost to account
for these intermediate conversions.
Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
Richard
gcc/
liuhongt writes:
> The new assembly looks better than the original one, so I adjusted those testcases.
The new loops are shorter, but they process only half the amount of data
per iteration.
The problem is that the new vectoriser code generates multiple statements
but only costs one. I'll post a fix
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Address comments from Richard and Bernhard from V5 patch.
> V6 fixed all issues according to their comments.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for
> LEN_MASK_STORE.
>
Bernhard Reutner-Fischer writes:
> On 23 June 2023 01:51:12 CEST, juzhe.zh...@rivai.ai wrote:
>>From: Ju-Zhe Zhong
>
> I am sorry but I somehow overlooked a trivial spot in V5.
> Nit which does not warrant an immediate next version, but please consider it
> before pushing iff approved:
>
>>+
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_partial_store_optab_fn): Adapt for
> LEN_MASK_STORE.
> (internal_load_fn_p): Add LEN_MASK_LOAD.
> (internal_store_fn_p): Add LEN_MASK_STORE.
>
Di Zhao OS via Gcc-patches writes:
> This patch enables reassociation of floating-point additions on ampere1.
> This brings about 1% overall benefit on spec2017 fprate cases. (There
> are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
>
Richard Biener writes:
> On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford
> wrote:
>>
>> Richard Sandiford writes:
>> > Richard Biener via Gcc-patches writes:
>> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>> >> wrote:
>> >>>
>> >>> We have already used an intermediate type in
Richard Biener writes:
> The issue in the PR the change is fixing is that we end up with
> an expression that overflows but uses signed arithmetic and so
> we miscompile it later. IIRC the fixes to split_constant_offset
> always were that the sum of the base + offset wasn't equal to
> the
Richard Sandiford writes:
> Richard Biener via Gcc-patches writes:
>> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
>> wrote:
>>>
>>> We have already used an intermediate type in case WIDEN, but not for NONE;
>>> this patch extends that.
>>>
>>> I didn't do that in pattern recog since we
Richard Biener via Gcc-patches writes:
> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches
> wrote:
>>
>> We have already used an intermediate type in case WIDEN, but not for NONE;
>> this patch extends that.
>>
>> I didn't do that in pattern recog since we need to know whether the
>> stmt
Several gcc.target/aarch64/sve/pcs tests started failing after
6a2e8dcbbd4, because the tests weren't robust against whether
an indirect argument register or the stack pointer was used as
the base for stores.
The patch allows either base register when there is only one
indirect argument. It
The SVE handling of stack clash protection copied the stack
pointer to X11 before the probe and set up X11 as the CFA
for unwind purposes:
/* This is done to provide unwinding information for the stack
adjustments we're about to do, however to prevent the optimizers
from
Richard Biener writes:
> On Mon, 19 Jun 2023, Richard Sandiford wrote:
>
>> Jeff Law writes:
>> > On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
>> >> IVOPTs has strip_offset which suffers from the same issues regarding
>> >> integer overflow that split_constant_offset did but the
Tamar Christina writes:
> Hi All,
>
> define_cond_exec does not support the special @@ syntax
> and so can't support {@. As such just remove support
> for it.
>
> Bootstrapped and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR bootstrap/110324
> *
Jeff Law writes:
> On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
>> IVOPTs has strip_offset which suffers from the same issues regarding
>> integer overflow that split_constant_offset did but the latter was
>> fixed quite some time ago. The following implements strip_offset
>> in terms
Spot-tested on aarch64-linux-gnu, pushed as obvious.
Richard
gcc/
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Handle null niters_skip.
---
gcc/tree-vect-loop-manip.cc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git
David Malcolm via Gcc-patches writes:
> Quoting "How a computer should talk to people" (as quoted
> in "Concepts Error Messages for Humans"):
>
> "Various negative tones or actions are unfriendly: being manipulative,
> not giving a second chance, talking down, using fashionable slang,
> blaming.
Andrew Stubbs writes:
> One
> comment: building a vector constant {0, 1, 2, 3, ..., 63} results in a
> very large entry in the constant pool and an unnecessary memory load (it
> literally has to use this sequence to generate the addresses to load the
> constant!) Generating the sequence via
Oluwatamilore Adebayo writes:
> From: oluade01
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
> * doc/md.texi (sabd, uabd): Document them.
> * internal-fn.def (ABD): Use new optab.
> * optabs.def (sabd_optab,
Richard Sandiford writes:
>> +
>> + /* Skip any newlines or whitespaces needed. */
>> + while (ISSPACE(*templ))
>> + templ++;
>> + continue;
>> + }
>> + else if (templ[0] == '/' && templ[1] == '*')
>> + {
>> + templ += 2;
>> + /*
Richard Biener writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener via Gcc-patches writes:
>> > The function is only meaningful for LOOP_VINFO_MASKS processing so
>> > inline it into the single use.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>> >
Tamar Christina writes:
> +The syntax rules are as follows:
> +@itemize @bullet
> +@item
> +Templates must start with @samp{@{@@} to use the new syntax.
> +
> +@item
> +@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}
s/parentheses/square brackets/
> +followed by a
Richard Biener via Gcc-patches writes:
> This implemens fully masked vectorization or a masked epilog for
> AVX512 style masks which single themselves out by representing
> each lane with a single bit and by using integer modes for the mask
> (both is much like GCN).
>
> AVX512 is also special in
Richard Biener via Gcc-patches writes:
> The function is only meaningful for LOOP_VINFO_MASKS processing so
> inline it into the single use.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
> * tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
> into ...
>
Richard Biener via Gcc-patches writes:
> Currently vect_determine_partial_vectors_and_peeling will decide
> to apply fully masking to the main loop despite
> --param vect-partial-vector-usage=1 when the currently analyzed
> vector mode results in a vectorization factor that's bigger
> than the
Oluwatamilore Adebayo writes:
> From: oluade01
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
> * doc/md.texi (sabd, uabd): Document them.
> * internal-fn.def (ABD): Use new optab.
> * optabs.def (sabd_optab,
Richard Biener writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Wed, 14 Jun 2023, Richard Sandiford wrote:
>> >
>> >> Richard Biener writes:
>> >> > AFAIU this special instruction is only supposed to prevent
>> >> > code motion (of stack memory
Richard Biener writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > AFAIU this special instruction is only supposed to prevent
>> > code motion (of stack memory accesses?) across this instruction?
>> > I'd say a
>> >
>> > (may_clobber (mem:BLK (reg:DI 1 1)))
Richard Biener writes:
> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?
> I'd say a
>
> (may_clobber (mem:BLK (reg:DI 1 1)))
>
> might be more to the point? I've used "may_clobber" which doesn't
> exist since I'm
Jeff Law via Gcc-patches writes:
> On 6/9/23 04:41, juzhe.zh...@rivai.ai wrote:
>> @@ -4342,135 +4510,81 @@ pass_vsetvl::cleanup_insns (void) const
>> }
>> }
>>
>> +/* Return true if the SET result is not used by any instructions. */
>> +static bool
>> +has_no_uses (basic_block
Tejas Belagod writes:
> From: Tejas Belagod
>
> This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
> a scalar is selected based on a constant predicate and a variable vector.
> This sequence is optimized to return the corresponding element of a
Kyrylo Tkachov via Gcc-patches writes:
> Hi all,
>
> This patch implements RTL constant-folding for the SS_TRUNCATE and
> US_TRUNCATE codes.
> The semantics are a clamping operation on the argument with the min and max
> of the narrow mode,
> followed by a truncation. The signedness of the
"juzhe.zh...@rivai.ai" writes:
> Thanks, Richi.
>
> Should I wait for Richard ACK gain ?
> Since the last email on this patch, he just asked me to adjust a comment, no
> code changes.
> I am not sure whether he is ok.
Yeah, OK from my POV too, thanks.
Richard
Richard Biener writes:
> On Fri, Jun 9, 2023 at 11:45 AM Andrew Stubbs wrote:
>>
>> On 09/06/2023 10:02, Richard Sandiford wrote:
>> > Andrew Stubbs writes:
>> >> On 07/06/2023 20:42, Richard Sandiford wrote:
>> >>> I don't know if this helps (probably not), but we have a similar
>> >>>
Andrew Stubbs writes:
> On 07/06/2023 20:42, Richard Sandiford wrote:
>> I don't know if this helps (probably not), but we have a similar
>> situation on AArch64: a 64-bit mode like V8QI can be doubled to a
>> 128-bit vector or to a pair of 64-bit vectors. We used V16QI for
>> the former and
guojiufu writes:
> Hi,
>
> On 2023-06-09 16:00, Richard Biener wrote:
>> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>>
>>> Hi,
>>>
>>> As checking the code, there is a "gcc_assert (SCALAR_INT_MODE_P
>>> (mode))"
>>> in "try_const_anchors".
>>> This assert seems correct because the function
In addition to Andreas's and Richard's comments:
Tamar Christina writes:
> +@item
> +@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}
> followed by
> +a list of @code{match_operand}/@code{match_scratch} comma operand numbers,
> then a
How about:
a comma-separated
Andrew Stubbs writes:
> On 30/05/2023 07:26, Richard Biener wrote:
>> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
>>>
>>> Hi all,
>>>
>>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>>> do it because the GCC middle-end models DIVMOD's return value as
>>>
"Andre Vieira (lists)" writes:
> Hi,
>
> This patch fixes an issue introduced by
> g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0, where a subtype was being
> passed to vect_widened_op_tree, when no subtype was to be used. This
> led to an erroneous use of IFN_VEC_WIDEN_MINUS.
>
>
Alex Coplan writes:
> Hi,
>
> This patch series fixes various defects with the FEAT_LS64 ACLE
> implementation in the AArch64 backend.
>
> The series is organised as follows:
>
> - Patch 1/3 fixes whitespace errors in the existing code.
> - Patch 2/3 fixes PR110100 where we generate wrong code
Oluwatamilore Adebayo writes:
>> It would be good to mark all of these functions with __attribute__((noipa)),
>> since I think interprocedural optimisations might otherwise defeat the
>> runtime test in abd_run_1.c (in the sense that we might end up folding
>> things at compile time and not
Oluwatamilore Adebayo writes:
> From: oluade01
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
> * doc/md.texi (sabd, uabd): Document them.
> * internal-fn.def (ABD): Use new optab.
> * optabs.def (sabd_optab,
Tamar Christina writes:
>> >int operand_number; /* Operand index in the big array. */
>> >int output_format; /* INSN_OUTPUT_FORMAT_*. */
>> > + bool compact_syntax_p;
>> >struct operand_data operand[MAX_MAX_OPERANDS]; };
>> >
>> > @@ -700,12 +702,57 @@
Richard Sandiford writes:
>> diff --git a/gcc/gensupport.h b/gcc/gensupport.h
>> index
>> a1edfbd71908b6244b40f801c6c01074de56777e..7925e22ed418767576567cad583bddf83c0846b1
>> 100644
>> --- a/gcc/gensupport.h
>> +++ b/gcc/gensupport.h
>> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.
Looks good! Just some minor comments:
Tamar Christina writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index
> 6a435eb44610960513e9739ac9ac1e8a27182c10..1437ab55b260ab5c876e92d59ba39d24bffc6276
> 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -27,6 +27,7 @@ See the next
"Roger Sayle" writes:
> This patch provides a wide-int implementation of bitreverse, that
> implements both of Richard Sandiford's suggestions from the review at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html of an
> improved API (as a stand-alone function matching the bswap
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible. The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me
Richard Sandiford writes:
> "juzhe.zh...@rivai.ai" writes:
>> Hi, Richard. Thanks for the comments.
>>
If we use SELECT_VL to refer only to the target-independent ifn, I don't
see why this last bit is true.
>> Could you give me more details and information about this since I am not
>>
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. Thanks for the comments.
>
>>> If we use SELECT_VL to refer only to the target-independent ifn, I don't
>>> see why this last bit is true.
> Could you give me more details and information about this since I am not sure
> whether I catch up with you.
juzhe.zh...@rivai.ai writes:
> + /* If we're using decrement IV approach in loop control, we can use output
> of
> + SELECT_VL to adjust IV of loop control and data reference when it
> satisfies
> + the following checks:
> +
> + (a) SELECT_VL is supported by the target.
> + (b)
Sorry for the slow review.
I don't know the IV-related parts well enough to review those properly,
but they looked reasonable to me. Hopefully Richi can comment.
I'm curious though. For:
> + tree step = vect_dr_behavior (vinfo, dr_info)->step;
> +
> + [...]
> + poly_uint64 bytesize =
Just some very minor things.
"Andre Vieira (lists)" writes:
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index
> 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb
> 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -90,6 +90,71 @@
"juzhe.zh...@rivai.ai" writes:
> Thanks Richi. I am gonna merge it after Richard's final approve.
Thanks for checking, but no need to wait for a second ack from me!
Please go ahead and commit.
Richard
Christophe Lyon writes:
> After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
> pattern to match the new GIMPLE form.
>
> With this patch, gcc.target/aarch64/rev16_2.c passes again.
>
> 2023-05-31 Christophe Lyon
>
> PR target/110039
> gcc/
> *
Richard Biener via Gcc-patches writes:
> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
>>
>> Hi all,
>>
>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>> do it because the GCC middle-end models DIVMOD's return value as
>> "complex int" type, and there are no
Christophe Lyon writes:
> On Wed, 31 May 2023 at 11:49, Richard Sandiford
> wrote:
>
>> Christophe Lyon writes:
>> > After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
>> > pattern to match the new GIMPLE form.
>> >
>> > With this patch, gcc.target/aarch64/rev16_2.c passes
Christophe Lyon writes:
> After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
> pattern to match the new GIMPLE form.
>
> With this patch, gcc.target/aarch64/rev16_2.c passes again.
>
> 2023-05-31 Christophe Lyon
>
> PR target/110039
> gcc/
> *
Richard Biener via Gcc-patches writes:
> On Wed, 31 May 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>> >
>> >> Hi, all. I have posted several investigations:
>> >>
Richard Biener writes:
> On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, all. I have posted several investigations:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
>>
"Roger Sayle" writes:
> This patch implements Richard Sandiford's suggestion from
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618215.html
> that wi::bswap (and a new wi::bitreverse) should be functions,
> and ideally only accessors are member functions. This patch
> implements the first
Prathamesh Kulkarni writes:
> Hi Richard,
> The s32 case for single constant patch doesn't regress now after the
> above commit.
> Bootstrapped+tested on aarch64-linux-gnu, and verified that the new
> tests pass for aarch64_be-linux-gnu.
> Is it OK to commit ?
>
> Thanks,
> Prathamesh
>
>
"juzhe.zhong" writes:
> Maybe we can include rgroup number into select vl pattern? So that, I always
> use select vl pattern. In my backend, if it is single rgroup, we gen vsetvl,
> otherwise we gen min.
That just seems to be a way of hiding an “is the target RVV?” test though.
IMO targets
"juzhe.zh...@rivai.ai" writes:
> Before this patch:
> foo:
> ble a2,zero,.L5
> csrr a3,vlenb
> srli a4,a3,2
> .L3:
> minu a5,a2,a4
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a0)
> vsetvli t1,zero,e32,m1,ta,ma
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Following Richi's suggestion, I changed the current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while
Richard Biener writes:
>> But how easy would it be to extend SCEV analysis, via a pattern match?
>> The evolution of the IV phi wrt the inner loop is still a normal SCEV.
>
> No, the IV isn't a normal SCEV, the final value is different.
Which part of the IV though? Won't all executions of the
My understanding was that we went into this knowing that the IVs
would defeat SCEV analysis. Apparently that wasn't a problem for RVV,
but it's not surprising that it is a problem in general.
This isn't just about SELECT_VL though. We use the same type of IV
for cases that aren't going to use
Kyrylo Tkachov via Gcc-patches writes:
> Hi all,
>
> This patch expresses the intrinsics for the SRA and RSRA instructions with
> standard RTL codes rather than relying on UNSPECs.
> These instructions perform a vector shift right plus accumulate with an
> optional rounding constant addition for
This looks good to me. Just a couple of very minor cosmetic things:
juzhe.zh...@rivai.ai writes:
> @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop
> *loop,
> continue;
> }
>
> - /* See whether zero-based IV would ever generate all-false masks
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. Thanks for the comments.
>
>>> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>>> || !iv_rgc
>>> || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>> != rgc->max_nscalars_per_iter * rgc->factor))
>>> {
> >> /* See
Jin Ma writes:
> When the last insn1 of BB1 and the first insn2 of BB2 are fused, insn2 will
> clear all dependencies in the function chain_to_prev_insn, with the result that
> insn2 may move to any BB, and the program's computed result is wrong.
>
> gcc/ChangeLog:
>
> * sched-deps.cc
"Jin Ma" writes:
>> > On 5/17/23 03:03, Jin Ma wrote:
>> >> For example:
>> >> (define_insn "mov_lowpart_sidi2"
>> >>[(set (match_operand:SI 0 "register_operand" "=r")
>> >> (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
>> >>"TARGET_64BIT"
>> >>
Thanks, this looks functionally correct to me. And I agree it handles
the cases that previously needed multiplication.
But I think it regresses code quality when no multiplication was needed.
We can now generate duplicate IVs. Perhaps ivopts would remove the
duplicates, but it might be hard,
LGTM, just a couple of comment tweaks:
Prathamesh Kulkarni writes:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d6fc94015fa..db7ca4c28c3 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22332,6 +22332,46 @@
I'll look at the samples tomorrow, but just to address one thing:
钟居哲 writes:
>>> What gives the best code in these cases? Is emitting a multiplication
>>> better? Or is using a new IV better?
> Could you give me more detail information about "new refresh IV" approach.
> I'd like to try that.
Prathamesh Kulkarni writes:
> On Wed, 24 May 2023 at 15:40, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > Hi Richard,
>> >> > Thanks for the suggestions. Does the
钟居哲 writes:
> Oh. I see. Thank you so much for pointing this.
> Could you tell me what I should do in the codes?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> muliply by some factor ? Is this correct multiply by max_nscalars_per_iter
> ?
max_nscalars_per_iter *
钟居哲 writes:
> Hi, Richard. I still don't understand it. Sorry about that.
>
>>> loop_len_48 = MIN_EXPR ;
> >> _74 = loop_len_34 * 2 - loop_len_48;
>
> I have the tests already tested.
> We have a MIN_EXPR to calculate the total elements:
> loop_len_34 = MIN_EXPR ;
> I think "8" is already
钟居哲 writes:
> Hi, the .optimized dump is like this:
>
>[local count: 21045336]:
> ivtmp.26_36 = (unsigned long)
> ivtmp.27_3 = (unsigned long)
> ivtmp.30_6 = (unsigned long) [(void *) + 16B];
> ivtmp.31_10 = (unsigned long) [(void *) + 32B];
> ivtmp.32_14 = (unsigned long)
Thanks for trying it. I'm still surprised that no multiplication
is needed though. Does the patch work for:
short x[100];
int y[200];
void f() {
for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
x[i + 0] += 1;
x[i + 1] += 2;
y[j + 0] += 1;
y[j + 1] += 2;
y[j + 2] += 3;
钟居哲 writes:
>>> Both approaches are fine. I'm not against one or the other.
>
>>> What I didn't understand was why your patch only reuses existing IVs
>>> for max_nscalars_per_iter == 1. Was it to avoid having to do a
>>> multiplication (well, really a shift left) when moving from one
>>>
钟居哲 writes:
>>> In other words, why is this different from what
>>>vect_set_loop_controls_directly would do?
> Oh, I see. You are confused about why I do not make multiple-rgroup vec_trunk
> handling inside "vect_set_loop_controls_directly".
>
> Well. Frankly, I just replicate the handling of ARM
Sorry, I realised later that I had an implicit assumption here:
if there are multiple rgroups, it's better to have a single IV
for the smallest rgroup and scale that up to bigger rgroups.
E.g. if the loop control IV is taken from an N-control rgroup
and has a step S, an N*M-control rgroup would
Sorry for the slow review. I needed some time to go through this
patch and surrounding code to understand it, and to understand
why it wasn't structured the way I was expecting.
I've got some specific comments below, and then a general comment
about how I think we should structure this.
Prathamesh Kulkarni writes:
> On Mon, 22 May 2023 at 14:18, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> > Thanks for the suggestions. Does the attached patch look OK ?
>> > Boostrap+test in progress on aarch64-linux-gnu.
>>
>> Like I say, please wait for the
Richard Biener writes:
> On Tue, May 23, 2023 at 5:05 PM wrote:
>>
>> From: Juzhe-Zhong
>>
>> This patch enable RVV auto-vectorization including floating-point
>> unorder and order comparison.
>>
>> The testcases are leveraged from Richard.
>> So include Richard as co-author.
>>
>>
Thanks for the update. Mostly LGTM, just some minor things left below.
Oluwatamilore Adebayo writes:
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index
> a49b09539776c0056e77f99b10365d0a8747fbc5..3a2248263cf67834a1cb41167a1783a3b6400014
> 100644
> ---
When I wrote early-remat, the DF_FORWARD block order was a postorder
of a reverse/backward walk (i.e. of the inverted cfg), rather than a
reverse postorder of a forward walk. A postorder of a backward walk
lacked the important property that dominators come before the blocks
they dominate; instead
Richard Biener writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that. But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP
Jeff Law via Gcc-patches writes:
> On 5/17/23 03:03, Jin Ma wrote:
>> For example:
>> (define_insn "mov_lowpart_sidi2"
>>[(set (match_operand:SI 0 "register_operand" "=r")
>> (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
>>"TARGET_64BIT"
>>
"juzhe.zh...@rivai.ai" writes:
> Yeah. I know.
> Like ARM does everywhere:
> (define_expand "vcond"
> [(set (match_operand:SVE_ALL 0 "register_operand")
> (if_then_else:SVE_ALL
> (match_operator 3 "comparison_operator"
> [(match_operand:SVE_I 4 "register_operand")
>
Richard Biener writes:
> On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> At -O2, and so with SLP vectorisation enabled:
>>
>> struct complx_t { float re, im; };
>> complx_t add(complx_t a, complx_t b) {
>&
In a follow-up patch, I wanted to use an int iterator to iterate
over various possible values of a const_int. But one problem
with int iterators was that there was no way of referring to the
current value of the iterator. This is unlike modes and codes,
which provide automatic "mode", "MODE",
At -O2, and so with SLP vectorisation enabled:
struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
return {a.re + b.re, a.im + b.im};
}
generates:
fmov    w3, s1
fmov    x0, d0
fmov    x1, d2
fmov    w2, s3
bfi
Prathamesh Kulkarni writes:
> Hi Richard,
> Thanks for the suggestions. Does the attached patch look OK ?
> Bootstrap+test in progress on aarch64-linux-gnu.
Like I say, please wait for the tests to complete before sending an RFA.
It saves a review cycle if the tests don't in fact pass.
> diff
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Address comments from Richard that splits the patch of fixing multiple-rgroup
> handling of length counting elements.
>
> This patch fixes the issue of handling multiple rgroups when the length is
> counting elements
>
> Before this patch, multiple
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Address comments from Richard that splits the patch of fixing multiple-rgroup
> handling of length counting elements.
>
> This patch fixes the issue of handling multiple rgroups when the length is
> counting elements
>
> Before this patch, multiple
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. Thanks for the comments.
>
> Would you mind telling me whether it is possible that we can make decrement
> IV support into GCC middle-end ?
>
> If yes, could you tell what I should do next for the patches since I am
> confused that it seems the