[PATCH] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener
The following fixes the code generation difference when using a typedef for the scalar type. The issue is using a pointer equality test for an INTEGER_CST which fails when the types are different variants. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/
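To illustrate the kind of failure described above, here is a minimal C sketch. It is not GCC code: the `type_variant` and `int_cst` structs and both comparison helpers are hypothetical stand-ins for a typedef'd type variant and an INTEGER_CST node. It only shows why a pointer equality test can fail for two constants that carry the same value in different variants of one type, while a value comparison succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node.  A typedef yields a distinct
   variant that points back at the same main variant.  */
struct type_variant {
  const char *name;
  const struct type_variant *main_variant;
};

/* Hypothetical stand-in for an integer constant node.  The same value
   in two type variants is represented by two different nodes.  */
struct int_cst {
  const struct type_variant *type;
  long value;
};

/* Fragile test: true only when both operands are the very same node.  */
static bool same_cst_by_pointer (const struct int_cst *a,
                                 const struct int_cst *b)
{
  return a == b;
}

/* Robust test: compare main variants and values instead of node
   addresses.  */
static bool same_cst_by_value (const struct int_cst *a,
                               const struct int_cst *b)
{
  return a->type->main_variant == b->type->main_variant
         && a->value == b->value;
}

/* "int" and a typedef of it, plus the constant 1 in each variant.  */
static const struct type_variant int_type = { "int", &int_type };
static const struct type_variant myint_type = { "myint", &int_type };
static const struct int_cst one_int = { &int_type, 1 };
static const struct int_cst one_myint = { &myint_type, 1 };
```

With these definitions, `same_cst_by_pointer (&one_int, &one_myint)` is false although both nodes denote the value 1, whereas `same_cst_by_value` treats them as equal.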

[PATCH] tree-optimization/116079 - store motion and clobbers

2024-07-25 Thread Richard Biener
When we move a store out of an inner loop and remove a clobber in the process, analysis of the inner loop can run into the clobber via the meta-data and crash when accessing its basic-block. The following avoids this by clearing the VDEF which is how it identifies already processed stores. Bootst

[PATCH] tree-optimization/116081 - typedef vs. non-typedef in vectorization

2024-07-25 Thread Richard Biener
The following addresses a behavioral difference in vector type analysis for typedef vs. non-typedef. It doesn't fix the issue at hand but avoids a spurious difference in the dumps. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/116081 * tree-vec

[PATCH] Maintain complex constraint vector order during PTA solving

2024-07-24 Thread Richard Biener
There's a FIXME comment in the PTA constraint solver that the vector of complex constraints can get unsorted which can lead to duplicate entries piling up during node unification. The following fixes this with the assumption that delayed updates to constraints are uncommon (otherwise re-sorting th
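As a rough illustration of why an unsorted vector lets duplicates pile up, here is a small C sketch. It is not the PTA solver's code: `sorted_vec`, `insert_unique` and `resort` are hypothetical names, and plain integers stand in for complex constraints. Duplicate detection relies on a binary search, so once an element is updated in place and the order is broken, the search can miss an existing entry and a second copy gets inserted. Re-sorting restores the invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CAP 16

/* Hypothetical container: a small vector of keys kept in ascending
   order.  */
struct sorted_vec {
  int items[CAP];
  size_t len;
};

/* Binary search for the first index whose item is >= key.  The result
   is only meaningful while the vector is sorted.  */
static size_t lower_bound (const struct sorted_vec *v, int key)
{
  size_t lo = 0, hi = v->len;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (v->items[mid] < key)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo;
}

/* Insert key unless the search finds it already present.  Returns true
   if the key was inserted, false if it was found or the vector is
   full.  */
static bool insert_unique (struct sorted_vec *v, int key)
{
  size_t pos = lower_bound (v, key);
  if (pos < v->len && v->items[pos] == key)
    return false;
  if (v->len == CAP)
    return false;
  for (size_t i = v->len; i > pos; i--)
    v->items[i] = v->items[i - 1];
  v->items[pos] = key;
  v->len++;
  return true;
}

static int cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Restore the ordering after items were modified in place.  */
static void resort (struct sorted_vec *v)
{
  qsort (v->items, v->len, sizeof v->items[0], cmp_int);
}
```

For example, starting from {1, 3, 5, 7} and overwriting the first element with 6 leaves {6, 3, 5, 7}. Calling `insert_unique` with 6 then adds a second 6, while calling `resort` first makes the same call report the key as already present.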

[PATCH] tree-optimization/116057 - wrong code with CCP and vector CTORs

2024-07-24 Thread Richard Biener
The following fixes an issue with CCP's likely_value when faced with a vector CTOR containing undef SSA names and constants. This should be classified as CONSTANT and not UNDEFINED. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/116057 * tree-ssa

Re: [PATCH v2] MATCH: Add simplification for MAX and MIN to match.pd [PR109878]

2024-07-24 Thread Richard Biener
On Fri, Jul 19, 2024 at 7:19 PM Eikansh Gupta wrote: > > Min and max could be optimized if both operands are defined by > (same) variable restricted by an and(&). For signed types, > optimization can be done when both constant have same sign bit. > The patch also adds optimization for specific cas

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-24 Thread Richard Biener
On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu wrote: > > > On 7/23/2024 11:20 AM, Richard Sandiford wrote: > > Edwin Lu writes: > >> On 7/23/2024 4:56 AM, Richard Biener wrote: > >>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu wrote: > >>>> Hi Richard,

Re: [PATCH] optabs/rs6000: Rename iorc and andc to iorn and andn

2024-07-24 Thread Richard Biener
On Wed, Jul 24, 2024 at 9:38 AM Kewen.Lin wrote: > > Hi Andrew, > > on 2024/7/24 10:49, Andrew Pinski wrote: > > When I was trying to add an scalar version of iorc and andc, the optab that > > got matched was for and/ior with the mode of csi and cdi instead of iorc and > > andc optabs for si and d

Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Richard Biener
On Fri, Jul 19, 2024 at 1:10 PM wrote: > > From: Pan Li > > The direct_internal_fn_supported_p has no restrictions for the type > modes. For example, a bitfield like the one below will be recognized as .SAT_TRUNC. > > struct e > { > unsigned pre : 12; > unsigned a : 4; > }; > > __attribute__((noipa)) >

Re: [PATCH] MATCH: add abs support for half float

2024-07-23 Thread Richard Biener
On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah wrote: > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski wrote: > > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah > > wrote: > > > > > > Revised based on the comment and moved it into existing patterns as. > > > > > > gcc/Chan

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-23 Thread Richard Biener
On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu wrote: > > Hi Richard, > > On 5/31/2024 1:48 AM, Richard Biener wrote: > > On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill > > wrote: > >> > >> From: Greg McGary > > > > Still a NACK. If rema

Re: [wwwdocs] Add aarch64 11.5.0 caveat

2024-07-23 Thread Richard Biener
r12-8060 commit on top > + of GCC 11.5.0. See https://gcc.gnu.org/PR116029 (PR116029) > + for more details. > + > + > > > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] tree-optimization/116002 - PTA solving slow with degenerate graph

2024-07-23 Thread Richard Biener
When the constraint graph consists of N nodes with only complex constraints and no copy edges we have to be lucky to arrive at a constraint solving order that requires the optimal number of iterations. What happens in the testcase is that we bottle-neck on computing the visitation order but propag

Re: [PATCH] ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

2024-07-23 Thread Richard Biener
On Tue, 23 Jul 2024, Jakub Jelinek wrote: > On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote: > > On Tue, 23 Jul 2024, Jakub Jelinek wrote: > > > The folding into REALPART_EXPR is correct, used only when the mem_offset > > > is zero, but for IMAGPART_EXP

Re: [PING] [PATCH] testsuite: Disable finite math only for test [PR115826]

2024-07-23 Thread Richard Biener
es-nomask=0" } */ > > /* { dg-require-effective-target vect_float } */ > > > > +/* This test requires +-Inf and NaN, so disable finite-math-only */ > > +/* { dg-additional-options "-fno-finite-math-only" } */ > > + > > #include "tsvc.h" > &g

Re: [PATCH] ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

2024-07-22 Thread Richard Biener
g; > + > +static inline int > +foo (_Complex unsigned short c) > +{ > + __builtin_memmove (&g, 1 + (char *) &c, 2); > + return g; > +} > + > +int > +main () > +{ > + if (__SIZEOF_SHORT__ == 2 > + && __CHAR_BIT__ == 8 > + && foo (

[PATCH] Fix hash of WIDEN_*_EXPR

2024-07-22 Thread Richard Biener
We're hashing operand 2 to the temporary hash. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * fold-const.cc (operand_compare::hash_operand): Fix hash of WIDEN_*_EXPR. --- gcc/fold-const.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/fol

Re: [PATCH] tree-optimization/58416 - SRA wrt FP type replacements

2024-07-22 Thread Richard Biener
On Sun, 21 Jul 2024, Richard Biener wrote: > As in other places we have to be careful to use FP modes to represent > the underlying bit representation of an object. With x87 floating-point > types there are no load or store instructions that preserve this and > XFmode can have paddin

[PATCH] [v2] rtl-optimization/116002 - cselib hash is bad

2024-07-22 Thread Richard Biener
The following addresses the bad hash function of cselib which uses integer plus for merging. This causes a huge number of collisions for the testcase in the PR and thus very large compile-time. The following rewrites it to use inchash, eliding duplicate mixing of RTX code and mode in some cases a
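A small C sketch of why merging sub-hashes with integer plus collides so easily. This is an illustration only: `merge_mix` is an FNV-1a style step chosen for brevity and is an assumption of this sketch, not the mixing that inchash performs, and the helper names are hypothetical. With plus, any two operand pairs with the same sum hash identically and operand order is lost. A multiplicative mix keeps them apart.

```c
#include <assert.h>
#include <stdint.h>

/* Merge two sub-hashes with integer plus, the style the patch moves
   away from.  */
static uint32_t merge_plus (uint32_t h, uint32_t v)
{
  return h + v;
}

/* Illustrative mixing merge (one FNV-1a style step on a 32-bit word).
   An assumption for this sketch, not what inchash does.  */
static uint32_t merge_mix (uint32_t h, uint32_t v)
{
  return (h ^ v) * UINT32_C (16777619);
}

/* Hash an operand pair by adding the operands into the running hash.  */
static uint32_t hash_pair_plus (uint32_t a, uint32_t b)
{
  return merge_plus (merge_plus (0, a), b);
}

/* Hash an operand pair by mixing the operands into the running hash.  */
static uint32_t hash_pair_mix (uint32_t a, uint32_t b)
{
  return merge_mix (merge_mix (UINT32_C (2166136261), a), b);
}
```

Here `hash_pair_plus (1, 4)`, `hash_pair_plus (2, 3)` and `hash_pair_plus (4, 1)` all yield 5, while the corresponding `hash_pair_mix` results differ from one another.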

[PATCH] constify inchash

2024-07-22 Thread Richard Biener
The following constifies parts of inchash. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * inchash.h (inchash::end): Make const. (inchash::merge): Take const reference hash argument. (inchash::add_commutative): Likewise. --- gcc/inchash.h | 6 +++--- 1 file

Re: [PATCH] Reduce iteration counts of tsvc tests

2024-07-21 Thread Richard Biener
On Fri, Jul 19, 2024 at 4:25 AM Joern Wolfgang Rennecke wrote: > > As discussed before on gcc@gcc.gnu.org, this patch reduces the iteration > counts of the tsvc tests to avoid timeouts when using simulators. > A few tests needed special attention because they divided "iterations" > by some constan

[PATCH][RFC] tree-optimization/114659 - VN and FP to int punning

2024-07-21 Thread Richard Biener
The following addresses another case where x87 FP loads mangle the bit representation and thus are not suitable for a representative in other types. VN was value-numbering a later integer load of 'x' as the same as a former float load of 'x'. The following disables this when the result is not kno

[PATCH] tree-optimization/58416 - SRA wrt FP type replacements

2024-07-21 Thread Richard Biener
As in other places we have to be careful to use FP modes to represent the underlying bit representation of an object. With x87 floating-point types there are no load or store instructions that preserve this and XFmode can have padding. When SRA faces the situation that a field is accessed with mu

Re: [PATCH] gcc: stop adding -fno-common for checking builds

2024-07-19 Thread Richard Biener
> On 20.07.2024 at 02:31, Andrew Pinski wrote: > > On Fri, Jul 19, 2024 at 5:23 PM Sam James wrote: >> >> Originally added in r0-44646-g204250d2fcd084 and r0-44627-gfd350d241fecf6 >> which >> moved -fno-common from all builds to just checking builds. >> >> Since r10-4867-g6271dd984d7f92,

Re: [PATCH] Treat boolean vector elements as 0/-1 [PR115406]

2024-07-19 Thread Richard Biener
> On 19.07.2024 at 19:44, Richard Sandiford wrote: > > Previously we built vector boolean constants using 1 for true > elements and 0 for false elements. This matches the predicates > produced by SVE's PTRUE instruction, but leads to a miscompilation > on AVX512, where all bits of a boolean

[PATCH] rtl-optimization/116002 - cselib hash is bad

2024-07-19 Thread Richard Biener
The following addresses the bad hash function of cselib which uses integer plus for merging. This causes a huge number of collisions for the testcase in the PR and thus very large compile-time. The following rewrites it to use inchash, eliding duplicate mixing of RTX code and mode in some cases a

Re: [PATCH][RFC] c/106800 - support vector condition operation in C

2024-07-18 Thread Richard Biener
> On 18.07.2024 at 17:37, Alexander Monakov wrote: > > > On Thu, 18 Jul 2024, Richard Biener wrote: > >>> If both b and c are scalars and the type of true?b:c has the same size >>> as the element type of a, then b and c are converted to a vector ty

Re: [PATCH][RFC] c/106800 - support vector condition operation in C

2024-07-18 Thread Richard Biener
> On 18.07.2024 at 16:22, Alexander Monakov wrote: > > >> On Thu, 18 Jul 2024, Richard Biener wrote: >> >> The following adds support for vector conditionals in C. The support >> was nearly there already but c_objc_common_truthvalue_conversion >&

Re: [PATCH v1] Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

2024-07-18 Thread Richard Biener
On Thu, Jul 18, 2024 at 2:27 PM wrote: > > From: Pan Li > > The SAT_TRUNC form 2 has below pattern matching. > From: > _18 = MIN_EXPR ; > iftmp.0_11 = (unsigned int) _18; > > To: > _18 = MIN_EXPR ; > iftmp.0_11 = .SAT_TRUNC (_18); .SAT_TRUNC (left_8); > But if there is another use of _1

Re: [PATCH] MATCH: Add simplification for MAX and MIN to match.pd [PR109878]

2024-07-18 Thread Richard Biener
On Wed, Jul 17, 2024 at 1:29 PM Eikansh Gupta wrote: > > Min and max could be optimized if both operands are defined by > (same) variable restricted by an and(&). For signed types, > optimization can be done when both constant have same sign bit. > The patch also adds optimization for specific cas

[PATCH][RFC] c/106800 - support vector condition operation in C

2024-07-18 Thread Richard Biener
The following adds support for vector conditionals in C. The support was nearly there already, except that c_objc_common_truthvalue_conversion rejects vector types. Rather than letting them pass there unchanged, I chose to skip it when parsing conditionals, as a variant with less possible f

[PATCH] middle-end/115641 - invalid address construction

2024-07-18 Thread Richard Biener
fold_truth_andor_1 via make_bit_field_ref builds an address of a CALL_EXPR which isn't valid GENERIC and later causes an ICE. The following simply avoids the folding for f ().a != 1 || f ().b != 2 as it is a premature optimization anyway. The alternative would have been to build a TARGET_EXPR arou

Re: [PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-18 Thread Richard Biener
m_exp_index_transform_applied = true;
> +}
> +
> /* Checks whether the range given by individual case statements of the switch
> switch statement isn't too big and whether the number of branches actually
> satisfies the size of the new array. */
> @@ -973,8 +1274,9 @@ switch_conversion::gen_inbound_check ()
> bbf->count = e1f->count () + e2f->count ();
>
> /* Tidy blocks that have become unreachable. */
> - prune_bbs (bbd, m_final_bb,
> - m_default_case_nonstandard ? m_default_bb : NULL);
> + bool prune_default_bb = !m_default_case_nonstandard
> + && !m_exp_index_transform_applied;
> + prune_bbs (bbd, m_final_bb, prune_default_bb ? NULL : m_default_bb);
>
> /* Fixup the PHI nodes in bbF. */
> fix_phi_nodes (e1f, e2f, bbf);
> @@ -1053,8 +1355,19 @@ switch_conversion::expand (gswitch *swtch)
> return;
> }
>
> - /* Check the case label values are within reasonable range: */
> - if (!check_range ())
> + /* Sometimes it is possible to use the "exponential index transform" to help
> + switch conversion convert switches which it otherwise could not convert.
> + However, we want to do this transform only when we know that switch
> + conversion will then really be able to convert the switch. So we first
> + check if the transformation is applicable and then maybe later do the
> + transformation. */
> + bool exp_transform_viable = is_exp_index_transform_viable (swtch);
> +
> + /* Check the case label values are within reasonable range.
> +
> + If we will be doing exponential index transform, the range will be always
> + reasonable. */
> + if (!exp_transform_viable && !check_range ())
> {
> gcc_assert (m_reason);
> return;
> @@ -1076,6 +1389,9 @@ switch_conversion::expand (gswitch *swtch)
> /* At this point all checks have passed and we can proceed with the
> transformation. */
>
> + if (exp_transform_viable)
> + exp_index_transform (swtch);
> +
> create_temp_arrays ();
> gather_default_values (m_default_case_nonstandard
> ? gimple_switch_label (swtch, 1)
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 6939eec6018..1a865f85f3a 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -743,6 +743,19 @@ public:
> /* Collection information about SWTCH statement. */
> void collect (gswitch *swtch);
>
> + /* Check that the 'exponential index transform' can be applied.
> +
> + See the comment at the function definition for more details. */
> + bool is_exp_index_transform_viable (gswitch *swtch);
> +
> + /* Perform the 'exponential index transform'.
> +
> + The exponential index transform shrinks the range of case numbers which
> + helps switch conversion convert switches it otherwise could not.
> +
> + See the comment at the function definition for more details. */
> + void exp_index_transform (gswitch *swtch);
> +
> /* Checks whether the range given by individual case statements of the switch
> switch statement isn't too big and whether the number of branches actually
> satisfies the size of the new array. */
> @@ -900,6 +913,11 @@ public:
>
> /* True if CFG has been changed. */
> bool m_cfg_altered;
> +
> + /* True if exponential index transform has been applied. See the comment at
> + the definition of exp_index_transform for details about the
> + transformation. */
> + bool m_exp_index_transform_applied;
> };
>
> void
>

RE: [RFC][middle-end] SLP Early break and control flow support in GCC

2024-07-18 Thread Richard Biener
On Wed, 17 Jul 2024, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Tuesday, July 16, 2024 4:08 PM > > To: Tamar Christina > > Cc: GCC Patches ; Richard Sandiford > > > > Subject: Re: [RFC][middle-end] S

Re: [PATCH] Update SLP reductions process.

2024-07-18 Thread Richard Biener
st, > please see this link: > > https://godbolt.org/z/5Tfqs9zqj > > > On 2024/07/18 15:05, Richard Biener wrote: > > On Thu, 18 Jul 2024, Jiawei wrote: > > > >> This patch improves SLP reduction handling by ensuring proper processing > >> even for a singl

Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-18 Thread Richard Biener
On Wed, Jul 17, 2024 at 9:07 PM Rubin Gerritsen wrote: > > Sorry for the inconvenience, here the patch is attached as an attachment. Pushed as r15-2134-gcee56fe0ba757c > Rubin > ____ > From: Richard Biener > Sent: 17 July 2024 1:01 PM > To:

Re: [PATCH] Update SLP reductions process.

2024-07-18 Thread Richard Biener
if (scalar_stmts.length() > 1) { > + vec roots = vNULL; > + vec remain = vNULL; > + if (!vect_build_slp_instance(loop_vinfo, slp_inst_kind_reduc_group, > scalar_stmts, roots, remain, max_tree_size, &limit, bst_map, NULL)) { > + scalar_stmt

Re: [PATCH v1] Doc: Add Standard-Names ustrunc and sstrunc for integer modes

2024-07-18 Thread Richard Biener
On Thu, Jul 18, 2024 at 7:35 AM Andrew Pinski wrote: > > On Wed, Jul 17, 2024 at 9:20 PM wrote: > > > > From: Pan Li > > > > This patch would like to add the doc for the Standard-Names > > ustrunc and sstrunc, include both the scalar and vector integer > > modes. > > Thanks for doing this and t

Re: [PATCH v3] testsuite: Add dg-do run to more tests

2024-07-17 Thread Richard Biener
On Thu, Jul 18, 2024 at 4:09 AM Sam James wrote: > > All of these are for wrong-code bugs. Confirmed to be used before but > with no execution. > > Tested on x86_64-pc-linux-gnu and checked test logs before/after. OK for both. > 2024-07-18 Sam James > > PR c++/53288 > PR c++/5

Re: [PATCH] Implement a -ftrapping-math/-fsignaling-nans TODO in match.pd.

2024-07-17 Thread Richard Biener
On Thu, Jul 18, 2024 at 12:54 AM Roger Sayle wrote: > > I've been investigating some (float)i == CST optimizations for match.pd, > and noticed there's already a TODO comment in match.pd that's relatively > easy to implement. When CST is a NaN, we only need to worry about > exceptions with flag_tr

Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Richard Biener
> On 17.07.2024 at 23:13, Richard Sandiford wrote: > > Andrew Pinski writes: >>> On Wed, Jul 17, 2024 at 1:03 PM Tamar Christina >>> wrote: >>> >>>> -Original Message- >>>> From: Richard Sandiford >>>>

Re: [PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Richard Biener
> On 17.07.2024 at 16:45, Jakub Jelinek wrote: > > On Wed, Jul 17, 2024 at 04:15:16PM +0200, Richard Biener wrote: >> Ok. Is there a more general repeat byte op available? > > I think > .skip bytes, fill > but not sure what assemblers do support that, not

Re: [PATCH] bitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]

2024-07-17 Thread Richard Biener
> On 17.07.2024 at 16:01, Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs on x86_64-linux, because we try to > gsi_insert_on_edge_immediate a statement on an edge which already has > statements queued with gsi_insert_on_edge, and the deferral has been > intentional so that we d

Re: [PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Richard Biener
> On 17.07.2024 at 15:55, Jakub Jelinek wrote: > > Hi! > > When not using .base64 directive, we emit for long sequences of zeros > .string "foobarbaz" > .string "" > .string "" > .string "" > .string "" > .string "" > .string "" > .string "" > .string "" >.

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote: > >> > >> On 7/17/24 4:36 PM, Richard Biener wrote: > >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod >

Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

2024-07-17 Thread Richard Biener
On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta wrote: > > This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`. > In forwprop1 pass, depending on the type of `a` and `b`, GCC produces > `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is > TRUE, the pattern can be op

Re: [PATCH v9 07/10] Give better error messages for musttail

2024-07-17 Thread Richard Biener
On Mon, Jul 8, 2024 at 7:00 PM Andi Kleen wrote: > > When musttail is set, make tree-tailcall give error messages > when it cannot handle a call. This avoids vague "other reasons" > error messages later at expand time when it sees a musttail > function not marked tail call. > > In various cases th

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod wrote: > > On 7/17/24 4:36 PM, Richard Biener wrote: > > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod > > wrote: > >> > >> On 7/15/24 6:05 PM, Richard Biener wrote: > >>> On Mon, Jul

[PATCH] tree-optimization/104515 - store motion and clobbers

2024-07-17 Thread Richard Biener
The following addresses an old regression when end-of-object/storage clobbers were introduced. In particular when there's an end-of-object clobber in a loop but no corresponding begin-of-object we can still perform store motion of may-aliased refs when we re-issue the end-of-object/storage on the

[PATCH] tree-optimization/115959 - ICE with SLP condition reduction

2024-07-17 Thread Richard Biener
The following fixes how during reduction epilogue generation we gather conditional compares for condition reductions, thereby following the reduction chain via STMT_VINFO_REDUC_IDX. The issue is that SLP nodes for COND_EXPRs can have either three or four children dependent on whether we have legac

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-17 Thread Richard Biener
On Tue, 16 Jul 2024, Filip Kastl wrote: > On Wed 2024-07-10 11:34:44, Richard Biener wrote: > > On Mon, 8 Jul 2024, Filip Kastl wrote: > > > > > Hi, > > > > > > I'm replying to Richard and keeping Andrew in cc since your suggestions > > >

Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 11:48 AM wrote: > > From: Pan Li > > The .SAT_TRUNC matching doesn't check that the type has mode precision. Thus > a bitfield like the one below will be recognized as .SAT_TRUNC. > > struct e > { > unsigned pre : 12; > unsigned a : 4; > }; > > __attribute__((noipa)) > void bug (e *
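For readers unfamiliar with the distinction, a minimal C sketch follows. It is not the testcase from the PR: the two helper functions are hypothetical and only reuse the `struct e` layout quoted above. A 4-bit unsigned bit-field has a precision of 4 bits, which no integer machine mode has, and a plain store to it wraps modulo 16. A saturating truncation to the same width would clamp at 15 instead.

```c
#include <assert.h>

struct e
{
  unsigned pre : 12;
  unsigned a : 4;
};

/* A plain assignment to a 4-bit unsigned bit-field keeps the value
   modulo 16.  */
static unsigned store_wrapping (unsigned x)
{
  struct e s = { 0, 0 };
  s.a = x;
  return s.a;
}

/* Saturating truncation to 4 bits, written out by hand for
   comparison.  */
static unsigned store_saturating (unsigned x)
{
  struct e s = { 0, 0 };
  s.a = x > 15u ? 15u : x;
  return s.a;
}
```

For an input of 20 the wrapping store yields 4 and the saturating one yields 15. For inputs that fit in 4 bits the two agree.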

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod wrote: > > On 7/15/24 6:05 PM, Richard Biener wrote: > > On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: > >> > >> On 7/15/24 12:16 PM, Tejas Belagod wrote: > >>> On 7/12/24 6:40 PM, Richard Biener wro

Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 12:47 PM Richard Biener wrote: > > On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen wrote: > > > > Changes since v1: > > * Added DCO signoff > > * Removed tabs from commit message > > > > -- > > Previously only simplific

Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-17 Thread Richard Biener
On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen wrote: > > Changes since v1: > * Added DCO signoff > * Removed tabs from commit message > > -- > Previously only simplifications of the `__st[xrp]cpy_chk` > were dumped. Now all call replacement simplifications are > dumped. > > Examples of stateme

Re: [PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Richard Biener
On Wed, 17 Jul 2024, Jakub Jelinek wrote: > On Wed, Jul 17, 2024 at 11:10:34AM +0200, Richard Biener wrote: > > OK. It's a bit late for the 11 branch without some soaking on trunk - > > when do we use __builtin_clear_padding? IIRC for C++ atomics? > > Apparently in G

Re: [PATCH] varasm: Fix bootstrap after the .base64 changes [PR115958]

2024-07-17 Thread Richard Biener
((t - s) > 1 || cnt <= 2)) > { > @@ -8584,7 +8584,7 @@ default_elf_asm_output_ascii (FILE *f, c > break; > } > } > - if (cnt > (t - s + 2) / 3 * 4 && (t - s) >= 3) > + if (cnt > ((unsigned) (t - s

Re: [PATCH] testsuite: Add dg-do run to another test

2024-07-17 Thread Richard Biener
builtin-convertvector-1.c > 2024-07-16 18:54:55.907042232 +0200 > @@ -1,3 +1,4 @@ > +/* { dg-do run } */ > /* { dg-skip-if "double support is incomplete" { "avr-*-*" } } */ > > extern > > Jakub > > -- Richard Biener SUSE Software

Re: [PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Richard Biener
long b; char c; struct S d[3]; long long e; char f; > } t1, t2; > --- gcc/testsuite/c-c++-common/torture/builtin-clear-padding-6.c.jj > 2024-07-16 16:55:10.331460214 +0200 > +++ gcc/testsuite/c-c++-common/torture/builtin-clear-padding-6.c > 2024-07-16 17:06:56.940508833

Re: [PATCH] rtl-ssa: Fix move range canonicalisation [PR115929]

2024-07-16 Thread Richard Biener
On Tue, Jul 16, 2024 at 4:30 PM Richard Sandiford wrote: > > In this PR, canonicalize_move_range walked off the end of a list > and triggered a null dereference. There are multiple ways of fixing > that, but I think the approach taken in the patch should be > relatively efficient. > > Tested on a

Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-07-16 Thread Richard Biener
bug_tree (_q40) > type public unsigned DI > size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type > 0x76a437e0 precision:64 min max > > pointer_to_this > > visited > def_stmt _4 = *

Re: [RFC][middle-end] SLP Early break and control flow support in GCC

2024-07-16 Thread Richard Biener
oment, one of the big downsides of re-using the existing cbranch is that
> in the target we can't tell whether the result of the cbranch is actually needed
> or not.
>
> i.e. for SVE we can't tell if the predicate is needed. For the cases where we
> don't stay inside the vector loop we can generate more efficient code if we know
> that the loop only cares about any or all bits set and doesn't need to know
> which one.
>
> For this reason I propose adding new optabs cbranch_any_ and branch_all_ and
> have emit_cmp_and_jump_insns lower to these when appropriate.
Hmm, but isn't this then more a vec_cmpeq_any that produces a scalar
rather than a vector and then a regular scalar compare-and-jump? That is,
does SVE have such a compare instruction? Can you show the current
code-gen and how the improved one would look like?
> Are the general idea and steps OK?
See above. Thanks for the write-up.
Richard.
> If so I'll start implementation now.
>
> Thanks,
> Tamar
>
> [1] Yishen Chen, Charith Mendis, and Saman Amarasinghe. 2022. All
> You Need Is Superword-Level Parallelism: Systematic Control-Flow
> Vectorization with SLP. In Proceedings of the 43rd ACM SIGPLAN
> International Conference on Programming Language Design and
> Implementation (PLDI '22), June 13–17, 2022, San Diego, CA, USA.
> ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3519939.3523701
>
> [2] Predicated Static Single Assignment
>
> Lori Carter Beth Simon Brad Calder Larry Carter Jeanne Ferrante
> Department of Computer Science and Engineering
> University of California, San Diego
> flcarter,esimon,calder,carter,ferran...@cs.ucsd.edu
>

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
On Tue, 16 Jul 2024, Richard Biener wrote: > On Mon, 15 Jul 2024, Jan Hubicka wrote: > > > > Currently unaligned YMM and ZMM load and store costs are cheaper than > > > aligned which causes the vectorizer to purposely mis-align accesses > > > by adding an align

Re: [PATCH]middle-end: fix 0 offset creation and folding [PR115936]

2024-07-16 Thread Richard Biener
asetype = sizetype; > - record_common_cand (data, build_int_cst (basetype, 0), iv->step, use); > + record_common_cand (data, build_int_cst (basetype, 0), > + fold_convert (basetype, iv->step), use); But this looks redundant? iv->step should already be sizetyp

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
On Tue, 16 Jul 2024, Jakub Jelinek wrote: > On Tue, Jul 16, 2024 at 01:04:50PM +0200, Richard Biener wrote: > > Do you think this needs a new RC? > > Guess that depends on if somebody would actually perform testing with that > option... ... on a Zen4 machine. I think that

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
On Tue, 16 Jul 2024, Richard Biener wrote: > On Tue, 16 Jul 2024, Jakub Jelinek wrote: > > > On Tue, Jul 16, 2024 at 10:55:30AM +0200, Richard Biener wrote: > > > On Tue, 16 Jul 2024, Jakub Jelinek wrote: > > > > > > > On Tue, Jul 16, 2024

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
On Tue, 16 Jul 2024, Jakub Jelinek wrote: > On Tue, Jul 16, 2024 at 10:55:30AM +0200, Richard Biener wrote: > > On Tue, 16 Jul 2024, Jakub Jelinek wrote: > > > > > On Tue, Jul 16, 2024 at 10:43:27AM +0200, Richard Biener wrote: > > > > I've pushed it to t

[PATCH] tree-optimization/115841 - reduction epilogue placement issue

2024-07-16 Thread Richard Biener
When emitting the compensation to the vectorized main loop for a vector reduction value to be re-used in the vectorized epilogue we fail to place it in the correct block when the main loop is known to be entered (no loop_vinfo->main_loop_edge) but the epilogue is not (a loop_vinfo->skip_this_loop_e

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
On Tue, 16 Jul 2024, Jakub Jelinek wrote: > On Tue, Jul 16, 2024 at 10:43:27AM +0200, Richard Biener wrote: > > I've pushed it to trunk now and am running local CPU 2017 to check for > > obvious fallout on Zen4 so we can make 14.2 RC early next week. There's > >

[PATCH] Fixup unaligned load/store cost for znver5

2024-07-16 Thread Richard Biener
Currently unaligned YMM and ZMM load and store costs are cheaper than aligned which causes the vectorizer to purposely mis-align accesses by adding an alignment prologue. It looks like the unaligned costs were simply copied from the bogus znver4 costs. The following makes the unaligned costs equa

Re: [PATCH] Fixup unaligned load/store cost for znver4

2024-07-16 Thread Richard Biener
/* cost of unaligned stores. */
> > + {6, 6, 10, 10, 12}, /* cost of unaligned loads. */
> > + {8, 8, 8, 12, 12}, /* cost of unaligned stores. */
> > 2, 2, 2, /* cost of moving XMM,YMM,ZMM
> > register. */
> > 6, /* cost of moving SSE register
> > to integer. */
> > --
> > 2.35.3
>

Re: [PATCH] Add --param vect-aligned-ldst-cost-bias to allow alignment peeling testing

2024-07-15 Thread Richard Biener
> On 15.07.2024 at 19:08, Richard Sandiford wrote: > > Richard Biener writes: >> The following adds a new --param for debugging the vectorizer's alignment >> peeling by increasing the cost of aligned stores. >> >> Bootstrap & regtest running on x86_64

Re: [PATCH 4/4] vect: Optimize order of lane-reducing statements in loop def-use cycles

2024-07-15 Thread Richard Biener
On Sat, Jul 13, 2024 at 5:49 PM Feng Xue OS wrote: > > When transforming multiple lane-reducing operations in a loop reduction chain, > originally, corresponding vectorized statements are generated into def-use > cycles starting from 0. The def-use cycle with smaller index, would contain > more st

Re: [PATCH 3/4] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-15 Thread Richard Biener
On Sat, Jul 13, 2024 at 5:48 PM Feng Xue OS wrote: > > For lane-reducing operation(dot-prod/widen-sum/sad) in loop reduction, current > vectorizer could only handle the pattern if the reduction chain does not > contain other operation, no matter the other is normal or lane-reducing. > > This patch

Re: [PATCH 2/4] vect: Refit lane-reducing to be normal operation

2024-07-15 Thread Richard Biener
On Sat, Jul 13, 2024 at 5:47 PM Feng Xue OS wrote: > > Vector stmts number of an operation is calculated based on output vectype. > This is over-estimated for lane-reducing operation, which would cause vector > def/use mismatched when we want to support loop reduction mixed with lane- > reducing a

Re: [PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-15 Thread Richard Biener
On Sat, Jul 13, 2024 at 5:46 PM Feng Xue OS wrote: > > Extend original vect_get_num_copies (pure loop-based) to calculate number of > vector stmts for slp node regarding a generic vect region. > > Thanks, > Feng > --- > gcc/ > * tree-vectorizer.h (vect_get_num_copies): New overload functio

Re: [match.pd PATCH] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2)

2024-07-15 Thread Richard Biener
ard. > > 2024-07-14 Roger Sayle > Richard Biener > > gcc/ChangeLog > PR tree-optimization/114661 > * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless > type conversions around multiplicaitions, such as those inserted > by this t

Re: [PATCH] gimple-fold: consistent dump of builtin call simplifications

2024-07-15 Thread Richard Biener
On Sun, Jul 14, 2024 at 10:15 AM rubin.gerritsen wrote: > > Previously only simplifications of the `__st[xrp]cpy_chk` > were dumped. Now all call replacement simplifications are > dumped. > > Examples of statements with corresponding dumpfile entries: > > `printf("mystr\n");`: > optimized: s

Re: [PATCH] testsuite: Disable finate math only for test [PR115826]

2024-07-15 Thread Richard Biener
> /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > /* { dg-require-effective-target vect_float } */ > > +/* This test requires +-Inf and NaN, so disable finite-math-only */ > +/* { dg-additional-options "-fno-finite-math-only" } */ > +

[PATCH] Add --param vect-aligned-ldst-cost-bias to allow alignment peeling testing

2024-07-15 Thread Richard Biener
The following adds a new --param for debugging the vectorizers alignment peeling by increasing the cost of aligned stores. Bootstrap & regtest running on x86_64-unknown-linux-gnu. This makes the PR115843 testcase fail again on trunk (but not on the branch), seemingly uncovering another backend is

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-15 Thread Richard Biener
On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod wrote: > > On 7/15/24 12:16 PM, Tejas Belagod wrote: > > On 7/12/24 6:40 PM, Richard Biener wrote: > >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > >>> > >>> On Fri, Jul 12, 2024 at 02:56:53PM +

Re: [PATCH][PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-07-15 Thread Richard Biener
On Mon, 15 Jul 2024, Maciej W. Rozycki wrote: > On Sun, 30 Jun 2024, Maciej W. Rozycki wrote: > > > > The patch is OK for trunk, thanks. I agree that it's a regression > > > from 08a692679fb8. Since it's fixing such a hard-to-diagnose wrong > > > code bug, and since it seems very safe, I think

[PATCH] tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

2024-07-15 Thread Richard Biener
When AVX512 uses a fully masked loop and peeling we fail to create the correct initial loop mask when the mask is composed of multiple components in some cases. The following fixes this by properly applying the bias for the component to the shift amount. Bootstrap and regtest running on x86_64-un

[PATCH] Fixup unaligned load/store cost for znver4

2024-07-15 Thread Richard Biener
Currently unaligned YMM and ZMM load and store costs are cheaper than aligned which causes the vectorizer to purposely mis-align accesses by adding an alignment prologue. It looks like the unaligned costs were simply left untouched from znver3 where they equate the aligned costs when tweaking alig

Re: [PATCH] varasm: Add support for emitting binary data with the new gas .base64 directive

2024-07-15 Thread Richard Biener
On Mon, 15 Jul 2024, Jakub Jelinek wrote: > On Mon, Jul 15, 2024 at 09:52:18AM +0200, Richard Biener wrote: > > > .string "k" > > > .string "" > > > .string "" > > > .string "\37

Re: [PATCH] varasm: Add support for emitting binary data with the new gas .base64 directive

2024-07-15 Thread Richard Biener
On Mon, 15 Jul 2024, Jakub Jelinek wrote: > On Mon, Jul 15, 2024 at 09:16:29AM +0200, Richard Biener wrote: > > > Nick has implemented a new .base64 directive in gas (to be shipped in > > > the upcoming binutils 2.43; big thanks for that). > > > See https://sourcewar

Re: [PATCH] varasm: Add support for emitting binary data with the new gas .base64 directive

2024-07-15 Thread Richard Biener
> + prev_base64 = false; > + } > +#else > + (void) last_base64; > + (void) prev_base64; > +#endif > + >if (p < limit && (p - s) <= (long) ELF_STRING_LIMIT) > { > if (bytes_in_chunk > 0) > --- gcc/configure.jj

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek wrote: > > On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote: > > Padding is only an issue for very small vectors - the obvious choice is > > to disallow vector types that would require any padding. I can hardly >

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 12:44 PM Tejas Belagod wrote: > > On 7/12/24 11:46 AM, Richard Biener wrote: > > On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote: > >> > >> On 7/10/24 4:37 PM, Richard Biener wrote: > >>> On Wed, Jul 10, 2024

Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal wrote: > > Hello Richard: > > On 11/07/24 2:21 pm, Richard Biener wrote: > > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal > > wrote: > >> > >> Hello All: > >> > >> Unroll factor is determi

RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-07-12 Thread Richard Biener
On Thu, 11 Jul 2024, Tamar Christina wrote: > -Original Message- > > From: Richard Biener > > Sent: Thursday, July 11, 2024 12:39 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com > > Subject: RE: [PATCH

Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-12 Thread Richard Biener
set instead of SLP_TREE_VECTYPE? As said having wrong > > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire. > > Then the alternative is to limit special handling related to the vec_num only > inside vect_transform_reduction. Is that ok? Or any other suggestion? I think that's kind

Re: [PATCH] Fix SSA_NAME leak due to def_stmt is removed before use_stmt.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 7:24 AM liuhongt wrote: > > >- _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0); > >- # DEBUG old => (long int) _5 > >+ _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0, > >__atomic_fetch_or_8); > >+ # DEBUG old => NULL > > # DEBUG BEGIN_STMT > >- # D

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-12 Thread Richard Biener
statement (GSI_SAME_STMT behavior). */ > > static tree > gen_pow2p (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op) > { > tree result; > > /* Use either .POPCOUNT (op) == 1 or op & -op == op. */ > tree type = TREE_TYPE (op); > gimple *s
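
The two power-of-two tests named in the quoted comment can be sketched in plain C as below. This is an illustration only, not the GIMPLE that gen_pow2p builds: the function names are hypothetical, and __builtin_popcountll is the GCC/Clang builtin used here in place of the internal .POPCOUNT function. The second variant excludes zero explicitly, since 0 & -0 == 0 would otherwise pass; how the patch itself treats zero is not visible in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variant 1: a power of two has exactly one bit set.  */
static bool is_pow2_popcount(uint64_t op)
{
  return __builtin_popcountll(op) == 1;
}

/* Variant 2: op & -op isolates the lowest set bit, so it equals op
   only when op has a single bit set.  Zero is rejected explicitly
   (an assumption of this sketch).  */
static bool is_pow2_lowbit(uint64_t op)
{
  return op != 0 && (op & -op) == op;
}
```

The second form is the natural fallback on targets without a cheap population-count instruction.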

[PATCH][v2] tree-optimization/115868 - ICE with .MASK_CALL in simdclone

2024-07-11 Thread Richard Biener
The following adjusts mask recording which didn't take into account that we can merge call arguments from two vectors like _50 = {vect_d_1.253_41, vect_d_1.254_43}; _51 = VIEW_CONVERT_EXPR(mask__19.257_49); _52 = (unsigned int) _51; _53 = _Z3bazd.simdclone.7 (_50, _52); _54 = BIT_FIELD_R

Re: [PATCH v2 10/11] Add new bbitmap class

2024-07-11 Thread Richard Biener
On Thu, Jul 11, 2024 at 2:18 PM Andrew Carlotti wrote: > > This class provides a constant-size bitmap that can be used as almost a > drop-in replacement for bitmaps stored in integer types. The > implementation is entirely within the header file and uses recursive > templated operations to suppor

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-11 Thread Richard Biener
On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod wrote: > > On 7/10/24 4:37 PM, Richard Biener wrote: > > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford > > wrote: > >> > >> Tejas Belagod writes: > >>> On 7/10/24 2:38 PM, Richard Biener wrot

Re: [PATCH 3/3] RISC-V: load and store-lanes with SLP

2024-07-11 Thread Richard Biener
On Wed, 10 Jul 2024, Richard Sandiford wrote: > Richard Biener writes: > > The following is a prototype for how to represent load/store-lanes > > within SLP. I've for now settled with having a single load node > > with multiple permute nodes acting as selection, one fo

Re: [PATCH 2/2] RISC-V: Allow uninitialized preferred_else_value for RVV

2024-07-11 Thread Richard Biener
On Thu, Jul 11, 2024 at 2:45 PM YunQiang Su wrote: > > Richard Biener 于2024年7月11日周四 20:21写道: > > > > On Thu, Jul 11, 2024 at 2:13 PM YunQiang Su wrote: > > > > > > From: YunQiang Su > > > > > > PR target/115840. > > > > >

Re: [PATCH 1/2] Add allow_uninitialized to tree_base.u.bits for VAR_DECL

2024-07-11 Thread Richard Biener
On Thu, Jul 11, 2024 at 2:14 PM YunQiang Su wrote: > > From: YunQiang Su > > Uninitialized internal temp variable may be useful in some case, > such as for COND_LEN_MUL etc on RISC-V with V extension: If an > const or pre-exists VAR is used, we have to use "undisturbed" > policy; if an uninitiali
