The following fixes a code generation difference seen when using
a typedef for the scalar type. The issue is a pointer equality
test on an INTEGER_CST, which fails when the types are
different variants.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/
When we move a store out of an inner loop and remove a clobber in
the process, analysis of the inner loop can run into the clobber
via the meta-data and crash when accessing its basic-block. The
following avoids this by clearing the VDEF which is how it identifies
already processed stores.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
The following addresses a behavioral difference in vector type
analysis for typedef vs. non-typedef. It doesn't fix the issue
at hand but avoids a spurious difference in the dumps.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/116081
* tree-vec
There's a FIXME comment in the PTA constraint solver that the vector
of complex constraints can get unsorted which can lead to duplicate
entries piling up during node unification. The following fixes this
with the assumption that delayed updates to constraints are uncommon
(otherwise re-sorting th
The following fixes an issue with CCP's likely_value when faced with
a vector CTOR containing undef SSA names and constants. Such a CTOR
should be classified as CONSTANT, not UNDEFINED.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
PR tree-optimization/116057
* tree-ssa
On Fri, Jul 19, 2024 at 7:19 PM Eikansh Gupta wrote:
>
> Min and max could be optimized if both operands are defined by the
> (same) variable restricted by an AND (&). For signed types, the
> optimization can be done when both constants have the same sign bit.
> The patch also adds optimization for specific cas
On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu wrote:
>
>
> On 7/23/2024 11:20 AM, Richard Sandiford wrote:
> > Edwin Lu writes:
> >> On 7/23/2024 4:56 AM, Richard Biener wrote:
> >>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu wrote:
> >>>> Hi Richard,
On Wed, Jul 24, 2024 at 9:38 AM Kewen.Lin wrote:
>
> Hi Andrew,
>
> on 2024/7/24 10:49, Andrew Pinski wrote:
> > When I was trying to add a scalar version of iorc and andc, the optab that
> > got matched was for and/ior with the mode of csi and cdi instead of iorc and
> > andc optabs for si and d
On Fri, Jul 19, 2024 at 1:10 PM wrote:
>
> From: Pan Li
>
> The direct_internal_fn_supported_p has no restrictions on the type
> modes. For example, a bitfield like the one below will be recognized as .SAT_TRUNC.
>
> struct e
> {
> unsigned pre : 12;
> unsigned a : 4;
> };
>
> __attribute__((noipa))
>
On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
wrote:
>
> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski wrote:
> >
> > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > wrote:
> > >
> > > Revised based on the comment and moved it into existing patterns as.
> > >
> > > gcc/Chan
On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu wrote:
>
> Hi Richard,
>
> On 5/31/2024 1:48 AM, Richard Biener wrote:
> > On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill
> > wrote:
> >>
> >> From: Greg McGary
> >
> > Still a NACK. If rema
r12-8060 commit on top
> + of GCC 11.5.0. See https://gcc.gnu.org/PR116029
> +for more details.
> +
> +
>
>
>
> Jakub
>
>
--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
When the constraint graph consists of N nodes with only complex
constraints and no copy edges we have to be lucky to arrive at
a constraint solving order that requires the optimal number of
iterations. What happens in the testcase is that we bottleneck
on computing the visitation order but propag
On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> > On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > > is zero, but for IMAGPART_EXP
es-nomask=0" } */
> > /* { dg-require-effective-target vect_float } */
> >
> > +/* This test requires +-Inf and NaN, so disable finite-math-only */
> > +/* { dg-additional-options "-fno-finite-math-only" } */
> > +
> > #include "tsvc.h"
> &g
g;
> +
> +static inline int
> +foo (_Complex unsigned short c)
> +{
> + __builtin_memmove (&g, 1 + (char *) &c, 2);
> + return g;
> +}
> +
> +int
> +main ()
> +{
> + if (__SIZEOF_SHORT__ == 2
> + && __CHAR_BIT__ == 8
> + && foo (
We're hashing operand 2 to the temporary hash.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
* fold-const.cc (operand_compare::hash_operand): Fix hash
of WIDEN_*_EXPR.
---
gcc/fold-const.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/fol
On Sun, 21 Jul 2024, Richard Biener wrote:
> As in other places we have to be careful to use FP modes to represent
> the underlying bit representation of an object. With x87 floating-point
> types there are no load or store instructions that preserve this and
> XFmode can have paddin
The following addresses the bad hash function of cselib which uses
integer plus for merging. This causes a huge number of collisions
for the testcase in the PR and thus very large compile-time.
The following rewrites it to use inchash, eliding duplicate mixing
of RTX code and mode in some cases a
The following constifies parts of inchash.
Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
* inchash.h (inchash::end): Make const.
(inchash::merge): Take const reference hash argument.
(inchash::add_commutative): Likewise.
---
gcc/inchash.h | 6 +++---
1 file
On Fri, Jul 19, 2024 at 4:25 AM Joern Wolfgang Rennecke
wrote:
>
> As discussed before on gcc@gcc.gnu.org, this patch reduces the iteration
> counts of the tsvc tests to avoid timeouts when using simulators.
> A few tests needed special attention because they divided "iterations"
> by some constan
The following addresses another case where x87 FP loads mangle the
bit representation and thus are not suitable for a representative
in other types. VN was value-numbering a later integer load of 'x'
as the same as a former float load of 'x'.
The following disables this when the result is not kno
As in other places we have to be careful to use FP modes to represent
the underlying bit representation of an object. With x87 floating-point
types there are no load or store instructions that preserve this and
XFmode can have padding.
When SRA faces the situation that a field is accessed with mu
> Am 20.07.2024 um 02:31 schrieb Andrew Pinski :
>
> On Fri, Jul 19, 2024 at 5:23 PM Sam James wrote:
>>
>> Originally added in r0-44646-g204250d2fcd084 and r0-44627-gfd350d241fecf6
>> whic
>> moved -fno-common from all builds to just checking builds.
>>
>> Since r10-4867-g6271dd984d7f92,
> Am 19.07.2024 um 19:44 schrieb Richard Sandiford :
>
> Previously we built vector boolean constants using 1 for true
> elements and 0 for false elements. This matches the predicates
> produced by SVE's PTRUE instruction, but leads to a miscompilation
> on AVX512, where all bits of a boolean
> Am 18.07.2024 um 17:37 schrieb Alexander Monakov :
>
>
> On Thu, 18 Jul 2024, Richard Biener wrote:
>
>>> If both b and c are scalars and the type of true?b:c has the same size
>>> as the element type of a, then b and c are converted to a vector ty
> Am 18.07.2024 um 16:22 schrieb Alexander Monakov :
>
>
>> On Thu, 18 Jul 2024, Richard Biener wrote:
>>
>> The following adds support for vector conditionals in C. The support
>> was nearly there already but c_objc_common_truthvalue_conversion
>&
On Thu, Jul 18, 2024 at 2:27 PM wrote:
>
> From: Pan Li
>
> The SAT_TRUNC form 2 has below pattern matching.
> From:
> _18 = MIN_EXPR ;
> iftmp.0_11 = (unsigned int) _18;
>
> To:
> _18 = MIN_EXPR ;
> iftmp.0_11 = .SAT_TRUNC (_18);
.SAT_TRUNC (left_8);
> But if there is another use of _1
On Wed, Jul 17, 2024 at 1:29 PM Eikansh Gupta wrote:
>
> Min and max could be optimized if both operands are defined by the
> (same) variable restricted by an AND (&). For signed types, the
> optimization can be done when both constants have the same sign bit.
> The patch also adds optimization for specific cas
The following adds support for vector conditionals in C. The support
was nearly there already but c_objc_common_truthvalue_conversion
rejecting vector types. Instead of letting them pass there unchanged
I chose to instead skip it when parsing conditionals instead as a
variant with less possible f
fold_truth_andor_1 via make_bit_field_ref builds an address of
a CALL_EXPR which isn't valid GENERIC and later causes an ICE.
The following simply avoids the folding for f ().a != 1 || f ().b != 2
as it is a premature optimization anyway. The alternative would
have been to build a TARGET_EXPR arou
m_exp_index_transform_applied = true;
> +}
> +
> /* Checks whether the range given by individual case statements of the switch
> switch statement isn't too big and whether the number of branches actually
> satisfies the size of the new array. */
> @@ -973,8 +1274,9 @@ switch_conversion::gen_inbound_check ()
> bbf->count = e1f->count () + e2f->count ();
>
>/* Tidy blocks that have become unreachable. */
> - prune_bbs (bbd, m_final_bb,
> - m_default_case_nonstandard ? m_default_bb : NULL);
> + bool prune_default_bb = !m_default_case_nonstandard
> +&& !m_exp_index_transform_applied;
> + prune_bbs (bbd, m_final_bb, prune_default_bb ? NULL : m_default_bb);
>
>/* Fixup the PHI nodes in bbF. */
>fix_phi_nodes (e1f, e2f, bbf);
> @@ -1053,8 +1355,19 @@ switch_conversion::expand (gswitch *swtch)
>return;
> }
>
> - /* Check the case label values are within reasonable range: */
> - if (!check_range ())
> + /* Sometimes it is possible to use the "exponential index transform" to
> help
> + switch conversion convert switches which it otherwise could not convert.
> + However, we want to do this transform only when we know that switch
> + conversion will then really be able to convert the switch. So we first
> + check if the transformation is applicable and then maybe later do the
> + transformation. */
> + bool exp_transform_viable = is_exp_index_transform_viable (swtch);
> +
> + /* Check the case label values are within reasonable range.
> +
> + If we will be doing exponential index transform, the range will be
> always
> + reasonable. */
> + if (!exp_transform_viable && !check_range ())
> {
>gcc_assert (m_reason);
>return;
> @@ -1076,6 +1389,9 @@ switch_conversion::expand (gswitch *swtch)
>/* At this point all checks have passed and we can proceed with the
> transformation. */
>
> + if (exp_transform_viable)
> +exp_index_transform (swtch);
> +
>create_temp_arrays ();
>gather_default_values (m_default_case_nonstandard
>? gimple_switch_label (swtch, 1)
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 6939eec6018..1a865f85f3a 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -743,6 +743,19 @@ public:
>/* Collection information about SWTCH statement. */
>void collect (gswitch *swtch);
>
> + /* Check that the 'exponential index transform' can be applied.
> +
> + See the comment at the function definition for more details. */
> + bool is_exp_index_transform_viable (gswitch *swtch);
> +
> + /* Perform the 'exponential index transform'.
> +
> + The exponential index transform shrinks the range of case numbers which
> + helps switch conversion convert switches it otherwise could not.
> +
> + See the comment at the function definition for more details. */
> + void exp_index_transform (gswitch *swtch);
> +
>/* Checks whether the range given by individual case statements of the
> switch
> switch statement isn't too big and whether the number of branches
> actually
> satisfies the size of the new array. */
> @@ -900,6 +913,11 @@ public:
>
>/* True if CFG has been changed. */
>bool m_cfg_altered;
> +
> + /* True if exponential index transform has been applied. See the comment
> at
> + the definition of exp_index_transform for details about the
> + transformation. */
> + bool m_exp_index_transform_applied;
> };
>
> void
>
On Wed, 17 Jul 2024, Tamar Christina wrote:
> > -Original Message-
> > From: Richard Biener
> > Sent: Tuesday, July 16, 2024 4:08 PM
> > To: Tamar Christina
> > Cc: GCC Patches ; Richard Sandiford
> >
> > Subject: Re: [RFC][middle-end] S
st,
> please see this link:
>
> https://godbolt.org/z/5Tfqs9zqj
>
>
> 在 2024/07/18 15:05, Richard Biener 写道:
> > On Thu, 18 Jul 2024, Jiawei wrote:
> >
> >> This patch improves SLP reduction handling by ensuring proper processing
> >> even for a singl
On Wed, Jul 17, 2024 at 9:07 PM Rubin Gerritsen wrote:
>
> Sorry for the inconvenience, here the patch is attached as an attachment.
Pushed as r15-2134-gcee56fe0ba757c
> Rubin
> ____
> From: Richard Biener
> Sent: 17 July 2024 1:01 PM
> To:
if (scalar_stmts.length() > 1) {
> + vec roots = vNULL;
> + vec remain = vNULL;
> + if (!vect_build_slp_instance(loop_vinfo, slp_inst_kind_reduc_group,
> scalar_stmts, roots, remain, max_tree_size, &limit, bst_map, NULL)) {
> + scalar_stmt
On Thu, Jul 18, 2024 at 7:35 AM Andrew Pinski wrote:
>
> On Wed, Jul 17, 2024 at 9:20 PM wrote:
> >
> > From: Pan Li
> >
> > This patch would like to add the doc for the Standard-Names
> > ustrunc and sstrunc, include both the scalar and vector integer
> > modes.
>
> Thanks for doing this and t
On Thu, Jul 18, 2024 at 4:09 AM Sam James wrote:
>
> All of these are for wrong-code bugs. Confirmed to be used before but
> with no execution.
>
> Tested on x86_64-pc-linux-gnu and checked test logs before/after.
OK for both.
> 2024-07-18 Sam James
>
> PR c++/53288
> PR c++/5
On Thu, Jul 18, 2024 at 12:54 AM Roger Sayle wrote:
>
> I've been investigating some (float)i == CST optimizations for match.pd,
> and noticed there's already a TODO comment in match.pd that's relatively
> easy to implement. When CST is a NaN, we only need to worry about
> exceptions with flag_tr
> Am 17.07.2024 um 23:13 schrieb Richard Sandiford :
>
> Andrew Pinski writes:
>>> On Wed, Jul 17, 2024 at 1:03 PM Tamar Christina
>>> wrote:
>>>
>>>> -Original Message-
>>>> From: Richard Sandiford
>>>>
> Am 17.07.2024 um 16:45 schrieb Jakub Jelinek :
>
> On Wed, Jul 17, 2024 at 04:15:16PM +0200, Richard Biener wrote:
>> Ok. Is there a more general repeat byte op available?
>
> I think
> .skip bytes, fill
> but not sure what assemblers do support that, not
> Am 17.07.2024 um 16:01 schrieb Jakub Jelinek :
>
> Hi!
>
> The following testcase ICEs on x86_64-linux, because we try to
> gsi_insert_on_edge_immediate a statement on an edge which already has
> statements queued with gsi_insert_on_edge, and the deferral has been
> intentional so that we d
> Am 17.07.2024 um 15:55 schrieb Jakub Jelinek :
>
> Hi!
>
> When not using .base64 directive, we emit for long sequences of zeros
>.string"foobarbaz"
>.string ""
>.string ""
>.string ""
>.string ""
>.string ""
>.string ""
>.string ""
>.string ""
>.
On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford
wrote:
>
> Richard Biener writes:
> > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote:
> >>
> >> On 7/17/24 4:36 PM, Richard Biener wrote:
> >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod
>
On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta wrote:
>
> This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
> In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
> `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
> TRUE, the pattern can be op
On Mon, Jul 8, 2024 at 7:00 PM Andi Kleen wrote:
>
> When musttail is set, make tree-tailcall give error messages
> when it cannot handle a call. This avoids vague "other reasons"
> error messages later at expand time when it sees a musttail
> function not marked tail call.
>
> In various cases th
On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote:
>
> On 7/17/24 4:36 PM, Richard Biener wrote:
> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod
> > wrote:
> >>
> >> On 7/15/24 6:05 PM, Richard Biener wrote:
> >>> On Mon, Jul
The following addresses an old regression when end-of-object/storage
clobbers were introduced. In particular when there's an end-of-object
clobber in a loop but no corresponding begin-of-object we can still
perform store motion of may-aliased refs when we re-issue the
end-of-object/storage on the
The following fixes how we gather conditional compares for condition
reductions during reduction epilogue generation, following the
reduction chain via STMT_VINFO_REDUC_IDX. The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legac
On Tue, 16 Jul 2024, Filip Kastl wrote:
> On Wed 2024-07-10 11:34:44, Richard Biener wrote:
> > On Mon, 8 Jul 2024, Filip Kastl wrote:
> >
> > > Hi,
> > >
> > > I'm replying to Richard and keeping Andrew in cc since your suggestions
> > >
On Wed, Jul 17, 2024 at 11:48 AM wrote:
>
> From: Pan Li
>
> The .SAT_TRUNC matching doesn't check that the type has mode precision.
> Thus a bitfield like the one below will be recognized as .SAT_TRUNC.
>
> struct e
> {
> unsigned pre : 12;
> unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e *
On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod wrote:
>
> On 7/15/24 6:05 PM, Richard Biener wrote:
> > On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote:
> >>
> >> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> >>> On 7/12/24 6:40 PM, Richard Biener wro
On Wed, Jul 17, 2024 at 12:47 PM Richard Biener
wrote:
>
> On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen wrote:
> >
> > Changes since v1:
> > * Added DCO signoff
> > * Removed tabs from commit message
> >
> > --
> > Previously only simplific
On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen wrote:
>
> Changes since v1:
> * Added DCO signoff
> * Removed tabs from commit message
>
> --
> Previously only simplifications of the `__st[xrp]cpy_chk`
> were dumped. Now all call replacement simplifications are
> dumped.
>
> Examples of stateme
On Wed, 17 Jul 2024, Jakub Jelinek wrote:
> On Wed, Jul 17, 2024 at 11:10:34AM +0200, Richard Biener wrote:
> > OK. It's a bit late for the 11 branch without some soaking on trunk -
> > when do we use __builtin_clear_padding? IIRC for C++ atomics?
>
> Apparently in G
((t - s) > 1 || cnt <= 2))
> {
> @@ -8584,7 +8584,7 @@ default_elf_asm_output_ascii (FILE *f, c
> break;
> }
> }
> - if (cnt > (t - s + 2) / 3 * 4 && (t - s) >= 3)
> + if (cnt > ((unsigned) (t - s
builtin-convertvector-1.c
> 2024-07-16 18:54:55.907042232 +0200
> @@ -1,3 +1,4 @@
> +/* { dg-do run } */
> /* { dg-skip-if "double support is incomplete" { "avr-*-*" } } */
>
> extern
>
> Jakub
>
>
long b; char c; struct S d[3]; long long e; char f;
> } t1, t2;
> --- gcc/testsuite/c-c++-common/torture/builtin-clear-padding-6.c.jj
> 2024-07-16 16:55:10.331460214 +0200
> +++ gcc/testsuite/c-c++-common/torture/builtin-clear-padding-6.c
> 2024-07-16 17:06:56.940508833
On Tue, Jul 16, 2024 at 4:30 PM Richard Sandiford
wrote:
>
> In this PR, canonicalize_move_range walked off the end of a list
> and triggered a null dereference. There are multiple ways of fixing
> that, but I think the approach taken in the patch should be
> relatively efficient.
>
> Tested on a
bug_tree (_q40)
> type public unsigned DI
> size
> unit-size
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x76a437e0 precision:64 min max
>
> pointer_to_this >
> visited
> def_stmt _4 = *
oment, one of the big downsides of re-using the existing cbranch is
> that
> in the target we can't tell whether the result of the cbranch is actually
> needed
> or not.
>
> i.e. for SVE we can't tell if the predicate is needed. For the cases where we
> don't stay inside the vector loop we can generate more efficient code if we
> know
> that the loop only cares about any or all bits set and doesn't need to know
> which one.
>
> For this reason I propose adding new optabs cbranch_any_ and branch_all_ and
> have emit_cmp_and_jump_insns lower to these when appropriate.
Hmm, but isn't this then more a vec_cmpeq_any that produces a scalar
rather than a vector and then a regular scalar compare-and-jump? That is,
does SVE have such a compare instruction? Can you show the current
code-gen and what the improved one would look like?
> Are the general idea and steps OK?
See above. Thanks for the write-up.
Richard.
> If so I'll start implementation now.
>
> Thanks,
> Tamar
>
> [1] Yishen Chen, Charith Mendis, and Saman Amarasinghe. 2022. All
> You Need Is Superword-Level Parallelism: Systematic Control-Flow
> Vectorization with SLP. In Proceedings of the 43rd ACM SIGPLAN
> International Conference on Programming Language Design and
> Implementation (PLDI '22), June 13-17, 2022, San Diego, CA, USA.
> ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3519939.
> 3523701
>
> [2] Predicated Static Single Assignment
>
> Lori Carter Beth Simon Brad Calder Larry Carter Jeanne Ferrante
> Department of Computer Science and Engineering
> University of California, San Diego
> flcarter,esimon,calder,carter,ferran...@cs.ucsd.edu
>
On Tue, 16 Jul 2024, Richard Biener wrote:
> On Mon, 15 Jul 2024, Jan Hubicka wrote:
>
> > > Currently unaligned YMM and ZMM load and store costs are cheaper than
> > > aligned which causes the vectorizer to purposely mis-align accesses
> > > by adding an align
asetype = sizetype;
> - record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> + record_common_cand (data, build_int_cst (basetype, 0),
> + fold_convert (basetype, iv->step), use);
But this looks redundant? iv->step should already be sizetyp
On Tue, 16 Jul 2024, Jakub Jelinek wrote:
> On Tue, Jul 16, 2024 at 01:04:50PM +0200, Richard Biener wrote:
> > Do you think this needs a new RC?
>
> Guess that depends on if somebody would actually perform testing with that
> option...
... on a Zen4 machine.
I think that
On Tue, 16 Jul 2024, Richard Biener wrote:
> On Tue, 16 Jul 2024, Jakub Jelinek wrote:
>
> > On Tue, Jul 16, 2024 at 10:55:30AM +0200, Richard Biener wrote:
> > > On Tue, 16 Jul 2024, Jakub Jelinek wrote:
> > >
> > > > On Tue, Jul 16, 2024
On Tue, 16 Jul 2024, Jakub Jelinek wrote:
> On Tue, Jul 16, 2024 at 10:55:30AM +0200, Richard Biener wrote:
> > On Tue, 16 Jul 2024, Jakub Jelinek wrote:
> >
> > > On Tue, Jul 16, 2024 at 10:43:27AM +0200, Richard Biener wrote:
> > > > I've pushed it to t
When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_e
On Tue, 16 Jul 2024, Jakub Jelinek wrote:
> On Tue, Jul 16, 2024 at 10:43:27AM +0200, Richard Biener wrote:
> > I've pushed it to trunk now and am running local CPU 2017 to check for
> > obvious fallout on Zen4 so we can make 14.2 RC early next week. There's
> >
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply copied from the bogus znver4 costs. The following makes
the unaligned costs equa
/* cost of unaligned stores. */
> > + {6, 6, 10, 10, 12}, /* cost of unaligned loads. */
> > + {8, 8, 8, 12, 12}, /* cost of unaligned stores. */
> >2, 2, 2, /* cost of moving XMM,YMM,ZMM
> >register. */
> >6, /* cost of moving SSE register
> > to integer. */
> > --
> > 2.35.3
>
> Am 15.07.2024 um 19:08 schrieb Richard Sandiford :
>
> Richard Biener writes:
>> The following adds a new --param for debugging the vectorizer's alignment
>> peeling by increasing the cost of aligned stores.
>>
>> Bootstrap & regtest running on x86_64
On Sat, Jul 13, 2024 at 5:49 PM Feng Xue OS wrote:
>
> When transforming multiple lane-reducing operations in a loop reduction chain,
> originally, corresponding vectorized statements are generated into def-use
> cycles starting from 0. The def-use cycle with the smaller index would contain
> more st
On Sat, Jul 13, 2024 at 5:48 PM Feng Xue OS wrote:
>
> For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the current
> vectorizer could only handle the pattern if the reduction chain does not
> contain any other operation, no matter whether that operation is normal or lane-reducing.
>
> This patch
On Sat, Jul 13, 2024 at 5:47 PM Feng Xue OS wrote:
>
> Vector stmts number of an operation is calculated based on output vectype.
> This is over-estimated for lane-reducing operation, which would cause vector
> def/use mismatched when we want to support loop reduction mixed with lane-
> reducing a
On Sat, Jul 13, 2024 at 5:46 PM Feng Xue OS wrote:
>
> Extend original vect_get_num_copies (pure loop-based) to calculate number of
> vector stmts for slp node regarding a generic vect region.
>
> Thanks,
> Feng
> ---
> gcc/
> * tree-vectorizer.h (vect_get_num_copies): New overload functio
ard.
>
> 2024-07-14 Roger Sayle
> Richard Biener
>
> gcc/ChangeLog
> PR tree-optimization/114661
> * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
> type conversions around multiplications, such as those inserted
> by this t
On Sun, Jul 14, 2024 at 10:15 AM rubin.gerritsen wrote:
>
> Previously only simplifications of the `__st[xrp]cpy_chk`
> were dumped. Now all call replacement simplifications are
> dumped.
>
> Examples of statements with corresponding dumpfile entries:
>
> `printf("mystr\n");`:
> optimized: s
> /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> /* { dg-require-effective-target vect_float } */
>
> +/* This test requires +-Inf and NaN, so disable finite-math-only */
> +/* { dg-additional-options "-fno-finite-math-only" } */
> +
The following adds a new --param for debugging the vectorizer's alignment
peeling by increasing the cost of aligned stores.
Bootstrap & regtest running on x86_64-unknown-linux-gnu.
This makes the PR115843 testcase fail again on trunk (but not on the
branch), seemingly uncovering another backend is
On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote:
>
> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> > On 7/12/24 6:40 PM, Richard Biener wrote:
> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote:
> >>>
> >>> On Fri, Jul 12, 2024 at 02:56:53PM +
On Mon, 15 Jul 2024, Maciej W. Rozycki wrote:
> On Sun, 30 Jun 2024, Maciej W. Rozycki wrote:
>
> > > The patch is OK for trunk, thanks. I agree that it's a regression
> > > from 08a692679fb8. Since it's fixing such a hard-to-diagnose wrong
> > > code bug, and since it seems very safe, I think
When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases. The following fixes this by properly applying
the bias for the component to the shift amount.
Bootstrap and regtest running on x86_64-un
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking alig
On Mon, 15 Jul 2024, Jakub Jelinek wrote:
> On Mon, Jul 15, 2024 at 09:52:18AM +0200, Richard Biener wrote:
> > > .string "k"
> > > .string ""
> > > .string ""
> > > .string "\37
On Mon, 15 Jul 2024, Jakub Jelinek wrote:
> On Mon, Jul 15, 2024 at 09:16:29AM +0200, Richard Biener wrote:
> > > Nick has implemented a new .base64 directive in gas (to be shipped in
> > > the upcoming binutils 2.43; big thanks for that).
> > > See https://sourcewar
> + prev_base64 = false;
> + }
> +#else
> + (void) last_base64;
> + (void) prev_base64;
> +#endif
> +
>if (p < limit && (p - s) <= (long) ELF_STRING_LIMIT)
> {
> if (bytes_in_chunk > 0)
> --- gcc/configure.jj
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote:
>
> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> > Padding is only an issue for very small vectors - the obvious choice is
> > to disallow vector types that would require any padding. I can hardly
>
On Fri, Jul 12, 2024 at 12:44 PM Tejas Belagod wrote:
>
> On 7/12/24 11:46 AM, Richard Biener wrote:
> > On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote:
> >>
> >> On 7/10/24 4:37 PM, Richard Biener wrote:
> >>> On Wed, Jul 10, 2024
On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal wrote:
>
> Hello Richard:
>
> On 11/07/24 2:21 pm, Richard Biener wrote:
> > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal
> > wrote:
> >>
> >> Hello All:
> >>
> >> Unroll factor is determi
On Thu, 11 Jul 2024, Tamar Christina wrote:
> -Original Message-
> > From: Richard Biener
> > Sent: Thursday, July 11, 2024 12:39 PM
> > To: Tamar Christina
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject: RE: [PATCH
set instead of SLP_TREE_VECTYPE? As said having wrong
> > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire.
>
> Then the alternative is to limit special handling related to the vec_num only
> inside vect_transform_reduction. Is that ok? Or any other suggestion?
I think that's kind
On Fri, Jul 12, 2024 at 7:24 AM liuhongt wrote:
>
> >- _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0);
> >- # DEBUG old => (long int) _5
> >+ _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0,
> >__atomic_fetch_or_8);
> >+ # DEBUG old => NULL
> > # DEBUG BEGIN_STMT
> >- # D
statement (GSI_SAME_STMT behavior). */
>
> static tree
> gen_pow2p (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
> {
> tree result;
>
> /* Use either .POPCOUNT (op) == 1 or op & -op == op. */
> tree type = TREE_TYPE (op);
> gimple *s
The following adjusts mask recording which didn't take into account
that we can merge call arguments from two vectors like
_50 = {vect_d_1.253_41, vect_d_1.254_43};
_51 = VIEW_CONVERT_EXPR(mask__19.257_49);
_52 = (unsigned int) _51;
_53 = _Z3bazd.simdclone.7 (_50, _52);
_54 = BIT_FIELD_R
On Thu, Jul 11, 2024 at 2:18 PM Andrew Carlotti wrote:
>
> This class provides a constant-size bitmap that can be used as almost a
> drop-in replacement for bitmaps stored in integer types. The
> implementation is entirely within the header file and uses recursive
> templated operations to suppor
On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote:
>
> On 7/10/24 4:37 PM, Richard Biener wrote:
> > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
> > wrote:
> >>
> >> Tejas Belagod writes:
> >>> On 7/10/24 2:38 PM, Richard Biener wrot
On Wed, 10 Jul 2024, Richard Sandiford wrote:
> Richard Biener writes:
> > The following is a prototype for how to represent load/store-lanes
> > within SLP. I've for now settled with having a single load node
> > with multiple permute nodes acting as selection, one fo
On Thu, Jul 11, 2024 at 2:45 PM YunQiang Su wrote:
>
> Richard Biener 于2024年7月11日周四 20:21写道:
> >
> > On Thu, Jul 11, 2024 at 2:13 PM YunQiang Su wrote:
> > >
> > > From: YunQiang Su
> > >
> > > PR target/115840.
> > >
> >
On Thu, Jul 11, 2024 at 2:14 PM YunQiang Su wrote:
>
> From: YunQiang Su
>
> Uninitialized internal temp variable may be useful in some cases,
> such as for COND_LEN_MUL etc. on RISC-V with the V extension: If a
> const or pre-existing VAR is used, we have to use the "undisturbed"
> policy; if an uninitiali