On Thu, Jun 27, 2024 at 5:57 AM liuhongt wrote:
>
> > But rtx_cost invokes targetm.rtx_cost which allows to avoid that
> > recursive processing at any level. You're dealing with MEM [addr]
> > here, so why's rtx_cost (addr, Pmode, MEM, 0, speed) not always
> > the best way to deal with this?
On Thu, Jun 27, 2024 at 3:31 AM wrote:
>
> From: Pan Li
OK
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
> unsigned a = 0;
> register uint16_t *p = x;
>
> do {
> a = *--p;
> *p =
The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
PR tree-optimization/115652
* tree-vect-slp.cc
The following fixes the 2nd occurrence of new_temp missed by the
previous fix.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
first scalar result.
---
On Wed, Jun 26, 2024 at 4:58 PM Feng Xue OS wrote:
>
> Allow shift-by-induction for an SLP node when it is single-lane, which is
> aligned with the original loop-based handling.
OK.
Did you try whether we handle multiple lanes correctly? The simplest
case would be a loop
body with say
a[2*i]
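For concreteness, a hypothetical loop of the kind suggested, with two SLP lanes (a[2*i] and a[2*i+1]) that both shift by the induction variable. The function name and element types are illustrative assumptions, not a testcase from the thread.

```cpp
#include <cstdint>

// Hypothetical testcase shape: two lanes per iteration, both shifted by
// the induction variable i. Requires n <= 32 so every shift count stays
// below the 32-bit width.
void shift_by_iv_two_lanes (uint32_t *__restrict a,
                            const uint32_t *__restrict b, unsigned n)
{
  for (unsigned i = 0; i < n; ++i)
    {
      a[2 * i] = b[2 * i] << i;         // lane 0
      a[2 * i + 1] = b[2 * i + 1] << i; // lane 1
    }
}
```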
On Wed, 26 Jun 2024, Tamar Christina wrote:
> > -Original Message-
> > From: Richard Biener
> > Sent: Wednesday, June 26, 2024 2:23 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH]mi
On Mon, Jun 24, 2024 at 3:55 PM wrote:
>
> From: Pan Li
>
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
> unsigned a = 0;
> register uint16_t *p = x;
>
> do {
> a = *--p;
> *p =
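A scalar sketch of the computation this pattern performs: a saturating unsigned subtraction in 32 bits whose result is stored back into a 16-bit element. The helper names are hypothetical, and this is not the elided remainder of the quoted testcase. Because the minuend is zero-extended from uint16_t and the result never exceeds it, the final truncation loses no bits.

```cpp
#include <cstdint>

// Saturating unsigned subtraction: clamps at 0 instead of wrapping.
static inline unsigned sat_sub_u32 (unsigned a, unsigned b)
{
  return a >= b ? a - b : 0;
}

// Hypothetical helper: widen each 16-bit element, saturating-subtract b
// in 32 bits, then truncate the result back to 16 bits.
void sat_sub_then_truncate (uint16_t *dst, const uint16_t *src,
                            unsigned b, unsigned n)
{
  for (unsigned i = 0; i < n; ++i)
    {
      unsigned a = src[i];                     // zero-extend to 32 bits
      dst[i] = (uint16_t) sat_sub_u32 (a, b);  // result <= a, so lossless
    }
}
```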
On Wed, Jun 26, 2024 at 3:46 AM wrote:
>
> From: Pan Li
>
> This patch adds the middle-end representation for
> saturating truncation, i.e. setting the truncated result to
> the max value on overflow. It matches patterns similar
> to the one below.
>
> Form 1:
> #define
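Since Form 1 is cut off above, here is a hedged illustration of what unsigned saturating truncation computes, using a hypothetical function for the 32-to-16-bit case: values that do not fit are clamped to the destination maximum rather than wrapped.

```cpp
#include <cstdint>

// Unsigned saturating truncation from 32 to 16 bits: out-of-range
// values become UINT16_MAX instead of keeping only the low 16 bits.
uint16_t sat_trunc_u32_to_u16 (uint32_t x)
{
  return x > UINT16_MAX ? UINT16_MAX : (uint16_t) x;
}
```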
ion pattern
> + won't hit on the pattern statement. */
> + cmp_ls = build_mask_conversion (vinfo, var, gs_vectype, stmt_vinfo);
Isn't this somewhat redundant with the below call?
I fear of bad [non-]interactions with bool pattern recognition btw.
> +}
> +
> + tree mask = vect_conve
On Wed, Jun 26, 2024 at 2:28 PM Aleksandar Rakic
wrote:
>
> Hi!
>
> I'd like to ping the following patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647966.html
> a patch for the computation of the complexity for the unsupported
> addressing modes in ivopts
The thread starting
The following fixes wrong-code when using outer loop vectorization
and an inner loop SLP access with permutation. A wrong adjustment
to the IV increment is then applied on GCN.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
PR tree-optimization/115640
*
The following adjusts how SLP computes the insertion location. In
particular it advanced the insert iterator of the found last_stmt.
The vectorizer will later insert stmts _before_ it. But we also
have the constraint that possibly masked ops may not be scheduled
outside of the loop and as we do
On Wed, Jun 26, 2024 at 9:14 AM Hongtao Liu wrote:
>
> On Wed, Jun 26, 2024 at 2:52 PM Richard Biener
> wrote:
> >
> > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote:
> > >
> > > 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote:
>
> 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
> The commit adjusted rtx_cost of mem to reduce the cost of (add op0 disp).
> But the cost of ADDR could be cheaper than XEXP (addr, 0) when it's a lea.
> It is the case in the PR, the
The following makes analysis and transform agree on constraints.
Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
PR tree-optimization/115646
* tree-call-cdce.cc (check_pow): Check for bit_sz values
as allowed by transform.
* gcc.dg/pr115646.c: New
The following replaces conditional is_export_p calls as is_export_p
handles a NULL bb itself.
Bootstrap running on x86_64-unknown-linux-gnu, OK?
Thanks,
Richard.
* gimple-range-gori.cc (gori_compute::may_recompute_p):
Call is_export_p with NULL bb.
---
gcc/gimple-range-gori.cc
The following fixes a missed tail-merging observed for the testcase
in PR115629. The issue is that when deps_ok_for_redirect does not
determine that both would be valid prevailing blocks, it rejects the merge.
The following instead makes sure to record the working block as
prevailing. Also stmt
mode))
> + return false;
> + }
> + code1 = float_expr_p ? (tree_code) code : NOP_EXPR;
> + code2 = float_expr_p ? NOP_EXPR : (tree_code) code;
> + opt_scalar_mode mode_iter;
> + FOR_EACH_2XWIDER_MODE (mode_iter, intermediate_mode)
> + {
> +
On Mon, 24 Jun 2024, Tamar Christina wrote:
>
>
> > -Original Message-
> > From: Richard Biener
> > Sent: Thursday, June 20, 2024 8:49 AM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject:
On Mon, Jun 24, 2024 at 1:28 AM liuhongt wrote:
>
> > I think the check for TYPE_UNSIGNED should be of TREE_TYPE (@0) rather
> > than type here.
>
> Changed
>
> > Or maybe you need `types_match (type, TREE_TYPE (@0))` too.
> And use tree_nop_conversion_p (type, TREE_TYPE (@0)) and add
On Tue, 25 Jun 2024, Thomas Schwinge wrote:
> Hi!
>
> On 2024-06-14T11:08:15+0200, Richard Biener wrote:
> > We can at least mimic single def-use cycle optimization when doing
> > single-lane SLP reductions and that's required to avoid regressing
> > compared to no
On Mon, Jun 24, 2024 at 9:38 PM Segher Boessenkool
wrote:
>
> I didn't see this before. Sigh.
>
> On Tue, Jan 02, 2024 at 09:47:11AM +, Richard Sandiford wrote:
> > Segher Boessenkool writes:
> > > On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote:
> > >> This patch adds a
On Tue, Jun 25, 2024 at 11:32 AM Feng Xue OS
wrote:
>
> >>
> >> >> - if (slp_node)
> >> >> + if (slp_node && SLP_TREE_LANES (slp_node) > 1)
> >> >
> >> > Hmm, that looks wrong. It looks like SLP_TREE_NUMBER_OF_VEC_STMTS is off
> >> > instead, which is bad.
> >> >
> >> >>
t; >
> >> + for (unsigned i = 0; i < op.num_ops - 1; i++)
> >> + {
> >> + gcc_assert (vec_oprnds[i].length () == using_ncopies);
> >> + vec_oprnds[i].safe_grow_cleared (reduc_ncopies);
> >> + }
> >> +}
> >
On Thu, 20 Jun 2024, Hu, Lin1 wrote:
> > >else if (ret_elt_bits > arg_elt_bits)
> > > modifier = WIDEN;
> > >
> > > + if (supportable_convert_operation (code, ret_type, arg_type, ))
> > > +{
> > > + g = gimple_build_assign (lhs, code1, arg);
> > > + gsi_replace (gsi, g,
On Sat, Jun 22, 2024 at 12:26 AM David Malcolm wrote:
>
> PR analyzer/115564 reports a missing warning from the analyzer
> on this infinite loop at -O2 and above:
>
> void test (unsigned b)
> {
>for (unsigned i = b; i >= 0; --i) {}
> }
>
> The issue is that there are no useful location_t
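Why the loop is infinite: for an unsigned i the condition i >= 0 is always true, and decrementing 0 wraps to UINT_MAX. The sketch below demonstrates this with hypothetical helpers; count_iterations caps the loop artificially so the program terminates.

```cpp
#include <climits>

// Decrementing an unsigned 0 wraps around to UINT_MAX.
unsigned wrap_after_decrement (unsigned i)
{
  --i;
  return i;
}

// Same loop shape as the testcase, but stops after `limit` iterations so
// we can observe that the condition "i >= 0" never became false.
unsigned long count_iterations (unsigned b, unsigned long limit)
{
  unsigned long iters = 0;
  for (unsigned i = b; i >= 0; --i)  // always true for unsigned i
    if (++iters == limit)
      break;
  return iters;
}
```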
ed from outside
> this file, and guaranteeing that it is dominated by stmt_can_throw_internal
> checking.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
OK.
Th
On Mon, Jun 24, 2024 at 1:34 PM Richard Sandiford
wrote:
>
> Richard Biener writes:
> > On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford
> > wrote:
> >>
> >> Richard Biener writes:
> >> > On Sat, Jun 22, 2024 at 6:50 PM Richard Sandifo
The following prevents SLP CSE from creating new cycles, which happened
because of a 1:1 permute node being present whose child was then
CSEd to the permute node. Fixed by making a node available to
CSE only after recursing.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
On Mon, Jun 24, 2024 at 1:18 PM Malladi, Rama wrote:
>
> From: Rama Malladi
Hmm, if we offer the ability to set -O3 inline limits why wouldn't we
offer a way to set -O2 inline limits for example with -O3? So ... wouldn't
a -finline-limit={default,O2,O3} option be a more generic and
extensible
On Mon, Jun 24, 2024 at 10:03 AM Richard Sandiford
wrote:
>
> Richard Biener writes:
> > On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford
> >> The traditional (and IMO correct) way to handle this is to make the
> >> pattern reserve the temporary
On Mon, Jun 24, 2024 at 3:39 AM HAO CHEN GUI wrote:
>
> Hi,
> Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html
OK
> Thanks
> Gui Haochen
>
> 在 2024/6/17 13:30, HAO CHEN GUI 写道:
> > Hi,
> > Gently ping it.
> >
On Mon, Jun 24, 2024 at 3:38 AM HAO CHEN GUI wrote:
>
> Hi,
> Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html
OK
> Thanks
> Gui Haochen
>
> 在 2024/6/17 13:29, HAO CHEN GUI 写道:
> > Hi,
> > Gently ping it.
> >
The compare_repeat_factors comparator eventually fails qsort checking
because it uses rf2->rank - rf1->rank to compare unsigned numbers,
which gives the wrong sign whenever the wrapped difference is negative
when interpreted as signed.
Fixed by re-writing the obvious way. I've also fixed the count
comparison which suffers from
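A minimal illustration of the failure mode, using a hypothetical stand-in struct (the pass's real repeat-factor type and its other fields are not shown here). The subtraction-based comparator returns the wrong sign once the unsigned difference exceeds INT_MAX, while the explicit version keeps the intended descending-rank order.

```cpp
// Hypothetical stand-in for the repeat-factor record; only the field
// used for ordering is modeled.
struct repeat_factor_sketch
{
  unsigned rank;
};

// Subtraction-based comparator: the unsigned difference wraps, and its
// conversion to int can have the wrong sign.
int cmp_by_subtraction (const void *p1, const void *p2)
{
  const repeat_factor_sketch *rf1 = static_cast<const repeat_factor_sketch *> (p1);
  const repeat_factor_sketch *rf2 = static_cast<const repeat_factor_sketch *> (p2);
  return rf2->rank - rf1->rank;
}

// Explicit comparator with the same intended order (descending rank).
int cmp_explicit (const void *p1, const void *p2)
{
  const repeat_factor_sketch *rf1 = static_cast<const repeat_factor_sketch *> (p1);
  const repeat_factor_sketch *rf2 = static_cast<const repeat_factor_sketch *> (p2);
  if (rf1->rank != rf2->rank)
    return rf1->rank > rf2->rank ? -1 : 1;
  return 0;
}
```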
The following makes sure to always CSE when there are SLP_TREE_SCALAR_STMTS,
as otherwise a chain of two-operator node operations can result in
exponential behavior of the CSE process, as was likely seen when building
510.parest on aarch64.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
On Sat, Jun 22, 2024 at 6:50 PM Richard Sandiford
wrote:
>
> Takayuki 'January June' Suwa writes:
> > On 2024/06/20 22:34, Richard Sandiford wrote:
> >> This patch adds a combine pass that runs late in the pipeline.
> >> There are two instances: one between combine and split1, and one
> >> after
The recent change to relax store motion for variables that cannot have
store data races broke the optimization to share flag vars for stores
that all happen in the same single BB. The following fixes this.
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
Richard.
PR
0x1919f69 execute_function_todo
>
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> 0x1918b46 do_per_function
>
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> 0x191a116 execute_todo
>
>
On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod wrote:
>
> This patch adds
>
> --param=vrp-block-limit=N
>
> When the basic block count for a function exceeds 'N', VRP is
> invoked with the new fast_vrp algorithm instead. This algorithm uses a
> lot less memory and processing power,
For outer loop vectorization of a data reference in the inner loop
we have to look at both steps to see if they preserve alignment.
What is special about this testcase is that the outer loop step is
one element but the inner loop step is four, and that we now use SLP
and the vectorization factor is
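The underlying arithmetic, as a hedged sketch with hypothetical helper names: adding a step s to an address leaves its misalignment modulo the alignment A unchanged exactly when s is a multiple of A, so a reference advanced by two loops needs both steps to qualify. With 4-byte elements and a 16-byte alignment (assumed values), an outer step of one element (4 bytes) fails while an inner step of four elements (16 bytes) passes. This simplified model ignores how the vectorization factor scales the steps.

```cpp
#include <cstddef>

// Adding step_bytes to an address leaves (address % align_bytes)
// unchanged exactly when step_bytes is a multiple of align_bytes.
bool step_preserves_alignment (std::size_t step_bytes, std::size_t align_bytes)
{
  return step_bytes % align_bytes == 0;
}

// A reference in the inner loop is advanced by both loops, so its
// misalignment is only invariant if both steps preserve it.
bool both_steps_preserve_alignment (std::size_t outer_step_bytes,
                                    std::size_t inner_step_bytes,
                                    std::size_t align_bytes)
{
  return step_preserves_alignment (outer_step_bytes, align_bytes)
         && step_preserves_alignment (inner_step_bytes, align_bytes);
}
```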
On Thu, 20 Jun 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Mon, 17 Jun 2024, Richard Sandiford wrote:
> >
> >> Richard Biener writes:
> >> > On Fri, 14 Jun 2024, Richard Biener wrote:
> >> >
> >> >> On Fri, 1
to do is simply to add a conversion stmt to the pattern sequence in case
the types differ?
But maybe I'm missing something.
Richard.
> Pan
>
> -Original Message-
> From: Richard Biener
> Sent: Friday, June 21, 2024 3:00 PM
> To: Li, Pan2
> Cc: gcc-patches@gcc.
On Fri, Jun 21, 2024 at 10:21 AM Richard Sandiford
wrote:
>
> Richard Biener writes:
> > [...]
> > I wonder if you can amend doc/passes.texi, specifically noting differences
> > between fwprop, combine and late-combine?
>
> Ooh, we have a doc/passes.texi? :) Some
On Fri, Jun 21, 2024 at 9:12 AM Eikansh Gupta wrote:
>
> We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
> result of (vec CMP vec). The optimization is based on the
> observation that in vec_cond, (-1 != 0) is true. So, we can
> generate vec_cond of xor of vec resulting in a single
On Fri, Jun 21, 2024 at 5:53 AM wrote:
>
> From: Pan Li
>
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but
> truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
> unsigned a = 0;
> register uint16_t *p = x;
>
> do {
> a = *--p;
> *p =
On Fri, Jun 21, 2024 at 5:11 AM Andrew Pinski wrote:
>
> On Thu, Jun 20, 2024 at 7:56 PM liuhongt wrote:
> >
> > Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
> > and x < 0 ? 1 : 0 into (unsigned) x >> 31.
> >
> > Move the optimization done in ix86_expand_int_vcond to match.pd
> >
> >
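A scalar check of the two identities for 32-bit int (function names are hypothetical; the patch itself applies them to vector conditionals in match.pd). The signed form relies on right shifts of negative values being arithmetic, which GCC guarantees and C++20 requires.

```cpp
#include <cstdint>

// x < 0 ? -1 : 0 and its shift form: an arithmetic right shift by 31
// replicates the sign bit into every bit position.
int32_t sign_mask_select (int32_t x) { return x < 0 ? -1 : 0; }
int32_t sign_mask_shift (int32_t x) { return x >> 31; }

// x < 0 ? 1 : 0 and its shift form: a logical right shift of the
// unsigned reinterpretation leaves only the sign bit.
uint32_t sign_bit_select (int32_t x) { return x < 0 ? 1 : 0; }
uint32_t sign_bit_shift (int32_t x) { return (uint32_t) x >> 31; }
```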
This applies some maintenance to passes.texi by removing references
to passes that no longer exist. It also fixes a few minor things but
doesn't fill the gaps that now exist.
make pdf tested, pushed.
* doc/passes.texi: Remove references to no longer existing
passes.
---
On Thu, Jun 20, 2024 at 3:37 PM Richard Sandiford
wrote:
>
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.
>
> The pass currently has a single objective: remove definitions by
> substituting
> Am 21.06.2024 um 04:35 schrieb Andrew Pinski :
>
> When PAREN_EXPR tree code was added in r0-85884-gdedd42d511b6e4,
> a simplified handling was added to complex lowering, which means
> we would get:
> ```
> _9 = COMPLEX_EXPR <_15, _14>;
> _11 = ((_9));
> _19 = REALPART_EXPR <_11>;
> _20
> Am 20.06.2024 um 17:40 schrieb Stefan Schulze Frielinghaus
> :
>
> On Thu, Jun 20, 2024 at 09:06:11AM +0200, Juergen Christ wrote:
>> Some casts were missing, leading to missed or bad vectorizations where
>> casting was done on scalars followed by a vector creation from the
>> individual
> Am 20.06.2024 um 16:05 schrieb Andrew MacLeod :
>
>
>> On 6/20/24 05:31, Richard Biener wrote:
>>> On Thu, 20 Jun 2024, Aldy Hernandez wrote:
>>>
>>> Hi.
>>>
>>> I came around to this, and whipped up the proposed patch. Howeve
On Thu, Jun 20, 2024 at 1:32 PM Georg-Johann Lay wrote:
>
> cc0 was removed long ago; remove the remaining mentions.
OK
> Johann
>
> diff --git a/htdocs/simtest-howto.html b/htdocs/simtest-howto.html
> index ea69c9ed..f18a78f6 100644
> --- a/htdocs/simtest-howto.html
> +++ b/htdocs/simtest-howto.html
>
te lock. No other lock can be held on this lockfile.
> + Blocking call. */
> + int lock_write ();
> +
> + /* Unique write lock. No other lock can be held on this lockfile.
> + Only locks if this filelock is not locked by any other process.
> + Return whether locking was successful. */
> + int try_lock_write ();
> +
> + /* Shared read lock. Only read lock can be held concurrently.
> + If write lock is already held by this process, it will be
> + changed to read lock.
> + Blocking call. */
> + int lock_read ();
> +
> + /* Unlock all previously placed locks. */
> + void unlock ();
> +
> + /* Returns whether any lock is held. */
> + bool
> + locked ()
> + {
> +    return fd < 0;
> + }
> +
> + /* Are lockfiles supported? */
> + static bool lockfile_supported ();
> +private:
> + std::string filename;
> + int fd;
> +};
> +
> +#endif
>
--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
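To make the documented semantics of the quoted interface concrete, here is a minimal sketch backed by flock(2) on Linux/BSD. This is an assumption for illustration only: the patch's implementation file is not part of this excerpt, and the class name lockfile_sketch and its acquire helper are hypothetical. In this sketch a lock is held exactly while the descriptor is open, so locked () tests fd >= 0.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <string>
#include <utility>

// Hypothetical flock(2)-based sketch; not the patch's implementation.
class lockfile_sketch
{
public:
  explicit lockfile_sketch (std::string name)
    : filename (std::move (name)), fd (-1) {}
  ~lockfile_sketch () { unlock (); }
  lockfile_sketch (const lockfile_sketch &) = delete;
  lockfile_sketch &operator= (const lockfile_sketch &) = delete;

  // Exclusive lock; blocks until acquired. Returns 0 on success.
  int lock_write () { return acquire (LOCK_EX); }

  // Exclusive lock only if no other lock is held; never blocks.
  int try_lock_write () { return acquire (LOCK_EX | LOCK_NB); }

  // Shared lock; flock converts an exclusive lock held through the
  // same descriptor into a shared one.
  int lock_read () { return acquire (LOCK_SH); }

  // Closing our only descriptor releases any flock lock on it.
  void unlock ()
  {
    if (fd >= 0)
      {
        close (fd);
        fd = -1;
      }
  }

  // A lock is held exactly while the descriptor is open.
  bool locked () const { return fd >= 0; }

private:
  int acquire (int op)
  {
    if (fd < 0)
      fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
    if (fd < 0)
      return -1;
    if (flock (fd, op) != 0)
      {
        unlock ();  // keep the "open iff locked" invariant
        return -1;
      }
    return 0;
  }

  std::string filename;
  int fd;
};
```

Two objects opening the same path get independent open file descriptions, so flock makes their locks conflict even within one process.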
On Sun, Jun 16, 2024 at 9:31 AM Feng Xue OS wrote:
>
> For lane-reducing operations (dot-prod/widen-sum/sad) in loop reductions, the current
> vectorizer can only handle the pattern if the reduction chain contains
> no other operation, whether normal or lane-reducing.
>
> Actually,
short>(short_c0_lo, short_c1_lo, sum_v0);
> sum_v1 = dot_prod<8 * short>(short_c0_hi, short_c1_hi, sum_v1);
> }
>
> For this purpose, we need to track the vectype_in that results in
> the most ncopies, for this case, the type is <8 * short>.
So the VF
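To make the lane-reducing semantics concrete, a hypothetical scalar model of one dot_prod<8 * short> step accumulating into four 32-bit lanes. Which products land in which lane is an assumption of this sketch (adjacent elements are paired); only the total across lanes matters once the partial sums are reduced at the end.

```cpp
#include <cstdint>

// Hypothetical model: eight 16-bit products are widened to 32 bits and
// added into four accumulator lanes, two products per lane.
void dot_prod_8xi16 (int32_t acc[4], const int16_t a[8], const int16_t b[8])
{
  for (int lane = 0; lane < 4; ++lane)
    {
      acc[lane] += (int32_t) a[2 * lane] * b[2 * lane];
      acc[lane] += (int32_t) a[2 * lane + 1] * b[2 * lane + 1];
    }
}
```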
On Thu, Jun 20, 2024 at 12:57 PM Maciej W. Rozycki wrote:
>
> On Thu, 20 Jun 2024, YunQiang Su wrote:
>
> > The DIV instructions of MIPS don't trap by themselves if the divisor
> > is zero. The compiler emits a conditional trap instruction for it.
> > So the signal will be SIGTRAP instead of
Status
==
The gcc-11 branch nears its retirement with the last release from it,
GCC 11.5, on the horizon.
Please look through bugzilla and see which of your regression fixes
for GCC 12 are also applicable for the GCC 11 branch and do the
necessary backporting. Please err on the safe side
obstack is released after each pass.
But ranger instances are also not expected to be created multiple
times each pass, right?
I don't have a strong opinion.
Richard.
> Aldy
>
> On Mon, Apr 8, 2024 at 7:47 PM Richard Biener
> wrote:
> >
> >
> >
> > > Am 08.04.2
Status
==
GCC 12.4 has been released and the branch is again open for regression
and documentation fixes.
Quality Data
Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              588   - 31
P3               76   -  1
On Wed, 19 Jun 2024, Tamar Christina wrote:
> > -Original Message-
> > From: Richard Biener
> > Sent: Wednesday, June 19, 2024 1:14 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject: Re:
On Wed, 19 Jun 2024, Tamar Christina wrote:
> > -Original Message-
> > From: Richard Biener
> > Sent: Wednesday, June 19, 2024 12:55 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject: Re: [
th this pattern, it
> requires vectors and it fails only on targets where there is no vector
> support enabled.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906
>
> Thanks,
> Andrew Pinski
>
> >
> >
> > Kind regards,
> > Vasee
> >
> >
> Am 19.06.2024 um 20:44 schrieb Jakub Jelinek :
>
> Hi!
>
> We don't really support _Complex _BitInt(N), the only place we use
> bitint complex types is for the .{ADD,SUB,MUL}_OVERFLOW internal function
> results and COMPLEX_EXPR in the usual case should be either not present
> yet because
> Am 19.06.2024 um 20:25 schrieb Toon Moene :
>
> On 6/17/24 16:05, Richard Biener wrote:
>
>> Automatic arrays that are not address-taken should not be subject to
>> store data races. This applies to OMP SIMD in-branch lowered
>> functions result array wh
On Wed, 19 Jun 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > We currently fail to re-CSE SLP nodes after optimizing permutes
> > which results in off cost estimates. For gcc.dg/vect/bb-slp-32.c
> > this shows in not re-using the SLP node with the load and ar
On Sun, Jun 16, 2024 at 9:27 AM Feng Xue OS wrote:
>
> It's better to place the 3 related independent variables into an array, since we
> need to access them via an index in the following patch. At the
> same time, this change makes some duplicated code more compact.
OK. I might have
On Sun, Jun 16, 2024 at 9:28 AM Feng Xue OS wrote:
>
> According to the logic of the code near the assertion, no lane-reducing operation
> should appear, not just DOT_PROD_EXPR. Since "use_mask_by_cond_expr_p"
> treats SAD_EXPR the same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR should not be
> allowed
>
On Sun, Jun 16, 2024 at 9:25 AM Feng Xue OS wrote:
>
> The input vectype of the reduction PHI statement must be determined before
> the vect cost computation for the reduction. Since a lane-reducing operation has
> a different input vectype from a normal one, we need to traverse all reduction
> statements
On Sun, Jun 16, 2024 at 9:23 AM Feng Xue OS wrote:
>
> Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
> it is better to keep only one.
OK.
Richard.
> Thanks,
> Feng
>
> ---
> gcc/
> * tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and
>
On Sun, Jun 16, 2024 at 9:22 AM Feng Xue OS wrote:
>
> In vectorizable_reduction, one check on a reduction operand via an index is
> subsumed by another check via a pointer, so remove the former.
OK.
Thanks,
Richard.
> Thanks,
> Feng
>
> ---
> gcc/
> * tree-vect-loop.cc
On Sun, Jun 16, 2024 at 9:21 AM Feng Xue OS wrote:
>
> This series of patches is meant to support multiple lane-reducing reduction
> statements. Since the original ones conflicted with the new single-lane SLP
> node patches, I have reworked most of the patches, and split them as small as
>
We currently fail to re-CSE SLP nodes after optimizing permutes
which results in off cost estimates. For gcc.dg/vect/bb-slp-32.c
this shows in not re-using the SLP node with the load and arithmetic
for both the store and the reduction. The following implements
CSE by re-bst-mapping nodes as
t;original" iv->base
to be used for code generation (and there only the unexpanded form)
and a variant used for the various sorts of canonicalization/compare
(I see we eventually add/subtract step and then compare against
sth else). And then apply this normalization always to the not
"original" form.
The above STRIP_NOPS (expr) + expand might turn an unsigned
affine combination into a signed one which might be problematic.
So what happens if you change the above to simply always
unsigned expand?
Richard.
>iv->base = base;
>iv->base_object = determine_base_object (data, base);
>iv->step = step;
>
>
>
>
>
iv_step, use->iv->step)
> + && affine_compare_eq (iv_addr_base, use->addr_base))
There's only this use of addr_base so I think the opportunity is to
turn iv_use->addr_base into aff_tree (even though that's a quite big
representation).
For the testcase, what are the two IVs we are comparing? I wonder
why you need the affine compare for iv->step?
> break;
> }
>if (i == data->vgroups.length ())
> @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int
> *mul)
>return true;
> }
>
> + aff_tree aff_top, aff_bot;
> + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> + poly_widest_int poly_mul;
> + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> + && poly_mul.is_constant (mul))
> +    return true;
> +
So why does stripping nops not work here?
>code = TREE_CODE (top);
>switch (code)
> {
>
>
>
>
>
The following adds a correctness check to the combined store/reduce
vectorization.
Tested on x86_64-unknown-linux-gnu, pushed.
* gcc.dg/vect/bb-slp-32.c: Add check for correctness.
---
gcc/testsuite/gcc.dg/vect/bb-slp-32.c | 26 --
1 file changed, 20
vectorize_conversion?).
> Would like to get some hint from you before the next step, thanks a lot.
>
> patt_34 = .SAT_SUB (m_11, wsize_12(D));
> patt_35 = (vector([8,8]) short unsigned int) patt_34;
>
> Pan
>
> -Original Message-
> From: Richard Biener
>
On Wed, 19 Jun 2024, Martin Uecker wrote:
> Am Mittwoch, dem 19.06.2024 um 08:57 +0200 schrieb Richard Biener:
> > On Wed, 19 Jun 2024, Martin Uecker wrote:
> >
> > > Am Mittwoch, dem 19.06.2024 um 08:04 +0200 schrieb Richard Biener:
> > > >
> > &
On Wed, 19 Jun 2024, Jakub Jelinek wrote:
> On Wed, Jun 19, 2024 at 08:04:55AM +0200, Richard Biener wrote:
> > >> Note a canonical type should always be unqualified (for
> > >> classical qualifiers, not address space or atomic qualification)
> > >
>
On Wed, 19 Jun 2024, Martin Uecker wrote:
> Am Mittwoch, dem 19.06.2024 um 08:04 +0200 schrieb Richard Biener:
> >
> > > Am 18.06.2024 um 20:18 schrieb Martin Uecker :
> > >
> > > Am Dienstag, dem 18.06.2024 um 17:27 +0200 schrieb Richard Biener:
> &g
> Am 18.06.2024 um 20:18 schrieb Martin Uecker :
>
> Am Dienstag, dem 18.06.2024 um 17:27 +0200 schrieb Richard Biener:
>>
>>>> Am 18.06.2024 um 17:20 schrieb Martin Uecker :
>>>
>>>
>>> As discussed this replaces the use of check
> Am 18.06.2024 um 17:20 schrieb Martin Uecker :
>
>
> As discussed this replaces the use of check_qualified_type with
> a simple check for qualifiers as suggested by Jakub in
> c_update_type_canonical.
Note a canonical type should always be unqualified (for classical qualifiers,
not
The condition rejecting "multiple-type" SLP condition reduction lacks
handling of EXTRACT_LAST reductions.
Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
Richard.
PR tree-optimization/115537
* tree-vect-loop.cc (vectorizable_reduction): Also reject
SLP
diate_mode) > target_size)
> + break;
> +
> + scalar_mode cvt_mode;
> + if (!int_mode_for_size
> + (GET_MODE_BITSIZE (intermediate_mode), 0).exists (&cvt_mode))
> + break;
> +
> + cvt_type = build_nonstandard_integer_type
On Mon, Jun 17, 2024 at 11:55 AM Richard Sandiford
wrote:
>
> This series expands on the fix for PR115464 by using force_subreg
> in more places. It also adds some convenience wrappers for lowpart
> and highpart subregs.
>
> A part of this will need to be backported after a grace period,
> but
On Mon, Jun 17, 2024 at 9:07 AM wrote:
>
> From: Pan Li
>
> We missed one match pattern for the unsigned scalar .SAT_SUB, aka
> form 11.
>
> Form 11:
> #define SAT_SUB_U_11(T) \
> T sat_sub_u_11_##T (T x, T y) \
> { \
> T ret; \
> bool overflow = __builtin_sub_overflow (x, y, &ret); \
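Since the macro above is cut off after the overflow check, here is a self-contained function of the same shape for uint8_t, assuming unsigned saturation clamps underflow to 0. The function name is hypothetical; __builtin_sub_overflow is the GCC/Clang builtin the form uses.

```cpp
#include <cstdint>

// Unsigned saturating subtraction via the overflow builtin: if x - y
// does not fit in uint8_t (i.e. y > x), return 0 instead of the
// wrapped difference.
uint8_t sat_sub_u8 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  bool overflow = __builtin_sub_overflow (x, y, &ret);
  return overflow ? 0 : ret;
}
```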
On Mon, Jun 17, 2024 at 3:41 AM wrote:
>
> From: Pan Li
>
> When investigating the vectorization of .SAT_ADD, we noticed there
> are two additional forms for .SAT_ADD, namely forms 7 and 8.
>
> Form 7:
> #define DEF_SAT_U_ADD_FMT_7(T) \
> T __attribute__((noinline)) \
>
On Tue, Jun 18, 2024 at 10:35 AM Sam James wrote:
>
> YunQiang Su writes:
>
> > OK for trunk?
>
> It looks good to me, but I can't approve. (I'd dare say it's obvious,
> even.)
>
> Richard, any chance you could give it a quick ack?
OK
On Tue, Jun 18, 2024 at 2:11 AM David Malcolm wrote:
>
> Be explicit when we use "cfun".
>
> No functional change intended.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
> * dominance.cc (compute_dom_fast_query): Replace uses of
>
On Mon, 17 Jun 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Fri, 14 Jun 2024, Richard Biener wrote:
> >
> >> On Fri, 14 Jun 2024, Richard Sandiford wrote:
> >>
> >> > Richard Biener writes:
> >> > > On Fri, 14 J
> > --git a/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > > b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > > new file mode 100644
> > > index 000..57cc00913a3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > > @@ -0,0 +1,14 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +void
> > > +f (short *__restrict a, float *__restrict b) {
> > > + a[0] = b[0];
> > > + a[1] = b[1];
> > > + a[2] = b[2];
> > > + a[3] = b[3];
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times {fcvtzs\tv[0-9]+.4s, v[0-9]+.4s} 1 } } */
> > > +/* { dg-final { scan-assembler-times {xtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */
>
Automatic arrays that are not address-taken should not be subject to
store data races. This applies to the result array of OMP SIMD
in-branch lowered functions, which for the testcase otherwise prevents
vectorization with SSE and for AVX and AVX512 ends up with spurious
.MASK_STORE to the stack
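A single-threaded sketch of the transformation at stake (function names hypothetical): if-conversion turns a conditional store into an unconditional one that re-stores the old value where the condition is false. On shared memory that introduces stores another thread could observe; for an automatic array whose address is never taken, no other thread can see it, so the rewrite is safe. The two functions below compute the same result.

```cpp
// Original shape: element i is written only when c[i] is set.
void cond_store (int *r, const int *c, int x, int n)
{
  for (int i = 0; i < n; ++i)
    if (c[i])
      r[i] = x;
}

// If-converted shape: every element is written, re-storing r[i] where
// c[i] is clear. Equivalent only if no other thread can observe r.
void unconditional_store (int *r, const int *c, int x, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = c[i] ? x : r[i];
}
```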
The following fixes a bad final value being used when doing single-lane
SLP integer induction cond reduction vectorization.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
PR tree-optimization/115493
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use
When there's a permute after an extern vector we can run into a case
that didn't consider the scheduled node being a permute which lacks
a representative.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/115508
* tree-vect-slp.cc
On Fri, Jun 14, 2024 at 9:20 PM Andrew MacLeod wrote:
>
> gimple_range_fold assumes that if there is an LHS on a call,
> it is an ssa_name. Especially later in compilation that may not be
> true.
It's always true if the LHS is of register type (is_gimple_reg_type) and
never true
On Mon, 17 Jun 2024, Kewen.Lin wrote:
> Hi Richi,
>
> on 2024/6/14 18:31, Richard Biener wrote:
> > The following retires vcond{,u,eq} optabs by no longer using them
> > from the middle-end. Targets instead (should) implement vcond_mask
> > and vec_cmp{,u,eq} optabs.
On Fri, 14 Jun 2024, Andrew Pinski wrote:
> On Fri, Jun 14, 2024 at 5:54 AM Richard Biener wrote:
> >
> > Automatic arrays that are not address-taken should not be subject to
> > store data races.
>
> That seems conservative enough. Though I would think if the array
&
nts here?
It definitely looks like a latent issue being triggered. Either in LRA
or in how the target presents itself.
Richard.
> Pan
>
> -----Original Message-
> From: Richard Biener
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PAT
On Fri, 14 Jun 2024, Richard Biener wrote:
> On Fri, 14 Jun 2024, Richard Sandiford wrote:
>
> > Richard Biener writes:
> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
> > >
> > >> Richard Biener writes:
> > >> > The foll
On Fri, 14 Jun 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > On Fri, 14 Jun 2024, Richard Sandiford wrote:
> >
> >> Richard Biener writes:
> >> > The following retires vcond{,u,eq} optabs by no longer using them
> >> > from the