Re: [Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread Jan Hubicka via Gcc-bugs
> > Do you mean we should fix modeling of divisions there as well? I don't have > latency/throughput measurements for those CPUs, nor access so I can run > experiments myself, unfortunately. > > I guess you mean just making a patch to model division units separately, > leaving latency/throughput

Fix resolution streaming with incremental linking

2022-11-25 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes streaming of resolution info when flag_incremental_link == INCREMENTAL_LINK_NOLTO. Here we want to stream the info from WPA to ltrans as usual. Bootstrapped/regtested x86_64-linux, tested with kernel LTO builds. Plan to commit it later today. Honza *

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2022-11-25 Thread Jan Hubicka via Gcc-patches
> > > > On 25.11.2022 at 11:05, Jan Hubicka via Gcc-patches wrote > > : > > > > > >> > >> IPA profile instrumentation tries to clear the pure and const > >> flags of functions but that's quite hopeless in particular for > >>

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2022-11-25 Thread Jan Hubicka via Gcc-patches
> IPA profile instrumentation tries to clear the pure and const > flags of functions but that's quite hopeless in particular for > const since that attribute prevails on the type and thus on each > call to the function leading to inconsistencies in the IL and > eventual checking ICEs. There's no

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2022-11-25 Thread Jan Hubicka via Gcc-patches
> On Fri, 25 Nov 2022, Jan Hubicka wrote: > > > > > > > > > > > On 25.11.2022 at 11:05, Jan Hubicka via Gcc-patches wrote > > > > : > > > > > > > > > > > >> > > > >> IPA profile i

Re: [PATCH 05/12] ipa-sra: Dump edge summaries also for non-candidates

2022-11-16 Thread Jan Hubicka via Gcc-patches
> Hi, > > this should have been part of r12-578-g717d278af93a4a. Call edge > summaries provide information required for IPA-SRA transformations in > the callees but are generated when analyzing callers and thus also > callers which are not IPA-SRA candidates themselves. Therefore we > analyze

Re: [PATCH 01/12] ipa: IPA-SRA split detection simplification

2022-11-16 Thread Jan Hubicka via Gcc-patches
> Hi, > > I have noticed that the flag m_split_modifications_p of > ipa_param_body_adjustments is not really necessary as it has to > correspond to whether m_replacements is non-empty so this patch > removes it. This also simplifies a bit some patches I work on. > > Bootstrapped and tested on

Re: [PATCH 02/12] ipa-cp: Do not consider useless aggregate constants

2022-11-16 Thread Jan Hubicka via Gcc-patches
> Hi, > > When building vectors of known aggregate values, there is no point in > including those for parameters which are not used in any way > whatsoever. > > Bootstrapped and tested on x86_64-linux. OK for master? OK, thanks! Honza > > Thanks, > > Martin > > > gcc/ChangeLog: > >

Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-22 Thread Jan Hubicka via Gcc-patches
> On Mon, 21 Nov 2022 20:02:49 +0100 > Jan Hubicka wrote: > > > > Hi Honza, Ping. > > > Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++ > > > Ok? > > > I'd need this for attribute target_clones for the Fortran FE. > > Sorry for delay here. > > > > void > > > > @@ -303,6 +301,10

Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-21 Thread Jan Hubicka via Gcc-patches
> Hi Honza, Ping. > Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++ > Ok? > I'd need this for attribute target_clones for the Fortran FE. Sorry for delay here. > > void > > @@ -303,6 +301,10 @@ symbol_table::change_decl_assembler_name (tree decl, > > tree name) > > warning (0,

Fix wrong code issues with ipa-sra

2023-01-16 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes wrong code issues in ipa-sra where we are trying to prove that on every execution of a given function a call to another function will happen. The code uses post dominators and makes a wrong query (which passes only for the first BB in the function). However, post-dominators are only valid

Re: [PATCH] IPA: do not release body if still needed

2023-01-14 Thread Jan Hubicka via Gcc-patches
> Hi. > > Noticed during building of libbackend.a with the LTO partial linking. > > The function release_body is called even if clone_of is a clone > of another function and thus it shares tree declaration. We should > preserve it in that situation. > > Patch can bootstrap on x86_64-linux-gnu

Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Jan Hubicka via Gcc-patches
> No unwind tables are generated, as if -funwind-tables is ignored. If > LTO is disabled, everything works as expected. I think it is because dwarf2out_do_eh_frame is called out of function context at the end of compilation. At that time cfun is NULL and the flag is read from global settings that

Re: [PATCH] IPA: do not release body if still needed

2023-01-18 Thread Jan Hubicka via Gcc-patches
> The code removing function bodies when the last call graph clone of a > node is removed is too aggressive when there are nodes up the > clone_of chain which still need them. Fixed by expanding the check. > > gcc/ChangeLog: > > 2023-01-18 Martin Jambor > > PR ipa/107944 > *

Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Jan Hubicka via Gcc-patches
> On Jan 18 2023, Michael Matz wrote: > > > The purest solution is to emit unwind tables for all functions that > > request it into .eh_frame and for those that don't request it put > > into .debug_frame (if also -g is on). > > The assembler does not allow switching back to .eh_frame once a >

Re: [PATCH] middle-end/106075 - non-call EH and DSE

2023-01-18 Thread Jan Hubicka via Gcc-patches
> On Tue, 17 Jan 2023, Jan Hubicka wrote: > > > > > We don't use same argumentation about other control flow statements. > > > > The following: > > > > > > > > fn() > > > > { > > > > try { > > > > i_read_no_global_memory (); > > > > } catch (...) > > > > { > > > > return 1; > > > >

Re: [PATCH] tree-optimization/108449 - keep maybe_special_function_p behavior

2023-01-21 Thread Jan Hubicka via Gcc-patches
> When we have a static declaration without definition we diagnose > that and turn it into an extern declaration. That can alter > the outcome of maybe_special_function_p here and there's really > no point in doing that, so don't. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? > >

Re: Fix wrong code issues with ipa-sra

2023-01-30 Thread Jan Hubicka via Gcc-patches
> Hello, > > Coverity flagged a real issue in this patch: > > On Mon, 16 Jan 2023, Jan Hubicka via Gcc-patches wrote: > > --- a/gcc/ipa-utils.cc > > +++ b/gcc/ipa-utils.cc > [...] > > +bitmap > > +find_always_executed_bbs (function *fun, bool assume_

Re: [PATCH] middle-end/106075 - non-call EH and DSE

2023-01-17 Thread Jan Hubicka via Gcc-patches
> On Tue, 17 Jan 2023, Jan Hubicka wrote: > > > > The following fixes a long-standing bug with DSE removing stores as > > > dead even though they are live across non-call exceptional flow. > > > This affects both GIMPLE and RTL DSE and the fix is similar in > > > making externally throwing

Re: [PATCH] middle-end/106075 - non-call EH and DSE

2023-01-17 Thread Jan Hubicka via Gcc-patches
> The following fixes a long-standing bug with DSE removing stores as > dead even though they are live across non-call exceptional flow. > This affects both GIMPLE and RTL DSE and the fix is similar in > making externally throwing statements uses of non-local stores. > Note this doesn't fix the

Re: [PATCH] middle-end/106075 - non-call EH and DSE

2023-01-17 Thread Jan Hubicka via Gcc-patches
> > We don't use same argumentation about other control flow statements. > > The following: > > > > fn() > > { > > try { > > i_read_no_global_memory (); > > } catch (...) > > { > > return 1; > > } > > return 0; > > } > > > > should be detected as const. Marking throw pure

Re: [PATCH] cgraph_node: Remove redundant section clearing

2022-11-04 Thread Jan Hubicka via Gcc-patches
> Ok for trunk if testing passes? > > gcc/ChangeLog: > > * cgraph.cc (cgraph_node::make_local): Remove redundant set_section. > * multiple_target.cc (create_dispatcher_calls): Likewise. OK (not sure how this slipped in) The code in create_dispatcher_calls is clearly cut of

Re: HELP: Questions on multiple PROGRAM_SUMMARY sections in a profiling data file

2023-03-08 Thread Jan Hubicka via Gcc-patches
> Hi, Jan, > > I am studying one profiling feedback ICE bug with GCC8 recently. > It’s an assertion failure inside the routine “compute_working_sets” of > gcov-io.c: > > gcov_nonruntime_assert (ws_ix == NUM_GCOV_WORKING_SETS); > > After some debugging and study, I found that the corresponding

Re: [PATCH RFC] c++: lambda mangling alias issues [PR107897]

2023-03-08 Thread Jan Hubicka via Gcc-patches
> Tested x86_64-pc-linux-gnu. Does this look good, or do we want to factor the > flag clearing into a symtab_node counterpart to cgraph_node::reset? > > -- 8< -- > > In 107897, by the time we are looking at the mangling clash, the > alias has already been removed from the symbol table by

Re: [PATCH 1/2] ipa-cp: Fix various issues in update_specialized_profile (PR 107925)

2023-03-10 Thread Jan Hubicka via Gcc-patches
> Hi, > > the patch below fixes various issues in function > update_specialized_profile. The main is removal of the assert which > is bogus in the case of recursive cloning. The division of > unexplained counts is guesswork, which then leads to updates of counts > of recursive edges, which then

Re: [PATCH 2/2] ipa-cp: Improve updating behavior when profile counts have gone bad

2023-03-10 Thread Jan Hubicka via Gcc-patches
> Hi, > > Looking into the behavior of profile count updating in PR 107925, I > noticed that an option not considered possible was actually happening, > and - with the guesswork in place to distribute unexplained counts - > it simply can happen. Currently it is handled by dropping the counts >

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jan Hubicka via Gcc-patches
> > The following is what I profile-bootstrapped and tested on > x86_64-unknown-linux-gnu. > > Richard. > > From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001 > From: Richard Biener > Date: Thu, 16 Mar 2023 13:51:19 +0100 > Subject: [PATCH] tree-optimization/106912 - clear

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jan Hubicka via Gcc-patches
> > I have in the meantime briefly tested following. > > But if you want to go the above way, then at least the testcase could be > useful. Though, not sure if the above is all that is needed. Shouldn't > set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust > TREE_TYPE on the

Fix ICE in profile_count::to_sreal_frequency

2023-03-14 Thread Jan Hubicka via Gcc-patches
As discussed in the PR log, profile_count::to_cgraph_frequency was originally intended to work across function boundary and has some extra logic and sanity check for that. It is used only within single function and with current API it can not really work well globally, so this patch synchronizes

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> On Fri, 17 Mar 2023, Jakub Jelinek wrote: > > > On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote: > > > > + /* Drop the const attribute from the call type (the pure > > > > + attribute is not available on types). */ > > > > + tree fntype

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001 > From: Richard Biener > Date: Thu, 16 Mar 2023 13:51:19 +0100 > Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype > To: gcc-patches@gcc.gnu.org > > The following makes sure that after clearing

Re: [PATCH 2/2] [i386] Adjust costing of emulated vectorized gather/scatter

2023-03-24 Thread Jan Hubicka via Gcc-patches
> Emulated gather/scatter behave similarly to strided elementwise > accesses in that they need to decompose the offset vector > and construct or decompose the data vector so handle them > the same way, pessimizing the cases with many elements. > > For pr88531-2c.c instead of > > .L4: > leaq

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> > > > Actually on second thought, I think I can break this either by making > > the wrapping function a thunk or alias or by moving it to a different > > compilation unit. > > Also with LTO we will get body later. > > > > So I think we need to drop this optimization. > > It's the same

Re: [PATCH] predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]

2023-03-26 Thread Jan Hubicka via Gcc-patches
> Hi! > > In the following testcase, we predict baz to have cold > entry regardless of the user supplied attribute (as it unconditionally calls > a cold function), but still issue > a -Wsuggest-attribute=cold warning despite it having that attribute > already. > > The following patch avoids that.

Enable scatter for generic

2023-03-06 Thread Jan Hubicka via Gcc-patches
Hi, while adding tunes to disable scatters on znver4 I mistakenly also disabled them on generic. This patch fixes it. Bootstrapped/regtested x86_64, committed. Honza gcc/ChangeLog: 2023-03-06 Jan Hubicka * config/i386/x86-tune.def (X86_TUNE_USE_SCATTER_2PARTS): Enable for

Re: [PATCH] cgraphclones: Don't share DECL_ARGUMENTS between thunk and its artificial thunk [PR108854]

2023-02-24 Thread Jan Hubicka via Gcc-patches
> Hi! > > The following testcase ICEs on x86_64-linux with -m32. The problem is > we create an artificial thunk and because of -fPIC, ia32 and thunk > destination which doesn't bind locally can't use a mi thunk. > The ICE is because during expansion to RTL we see SSA_NAME for a PARM_DECL, > but

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-14 Thread Jan Hubicka via Gcc-patches
> On Tue, 4 Apr 2023, Jan Hubicka wrote: > > > > On Tue, 28 Mar 2023, Richard Biener wrote: > > > > > > > When adjusting calls to reflect instrumentation we failed to handle > > > > calls to aliases since they appear to have no body. Instead resort > > > > to symtab node availability. The

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-18 Thread Jan Hubicka via Gcc-patches
> > > > I do not think LTO is of any help here. You can always call a non-LTO > > const function from the outer world and that function can end up > > calling back to an instrumented const function in your unit which > > effectively makes the external const function non-const. > > Hmm, true. > >

Re: Fix loop-ch

2023-04-21 Thread Jan Hubicka via Gcc-patches
> Hi, > Ondrej Kubanek implemented profiling of loop histograms which should be useful > to improve > e.g. the quality of loop peeling or vectorization. However it turns out that > most of the histograms > are lost on the way from profiling to the loop peeling pass (about 90%). One > common case is the >

Re: Unloop no longer looping loops in loop-ch

2023-04-25 Thread Jan Hubicka via Gcc-patches
> On 25 April 2023 17:12:50 CEST, Jan Hubicka via Gcc-patches > wrote: > > + fprintf (stderr, "Bingo\n"); > > You forgot to remove that.. > Do we prune Bingo in the testsuite? ;-) Ah, thanks :) I was curious how much I win with unloo

Re: [PATCH] tree-optimization/109609 - correctly interpret arg size in fnspec

2023-04-25 Thread Jan Hubicka via Gcc-patches
> By majority vote and a hint from the API name which is > arg_max_access_size_given_by_arg_p this interprets a memory access > size specified as given as other argument such as for strncpy > in the testcase which has "1cO313" as specifying the _maximum_ > size read/written rather than the exact

Unloop no longer looping loops in loop-ch

2023-04-25 Thread Jan Hubicka via Gcc-patches
Hi, I noticed this after adding a sanity check that the upper bound on the number of iterations never drops to -1. It seems to be a relatively common case (happening a few hundred times in the testsuite and also during bootstrap) that loop-ch duplicates enough so the loop itself no longer loops. This is later

Re: Unloop no longer looping loops in loop-ch

2023-04-26 Thread Jan Hubicka via Gcc-patches
> > - if (precise) > > + if (precise > > + && get_max_loop_iterations_int (loop) == 1) > > + { > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, "Loop %d no longer loops.\n", loop->num); > > but max loop iterations is 1 ...? I first check for

Re: [PATCH] rtl-optimization/109585 - alias analysis typo

2023-04-25 Thread Jan Hubicka via Gcc-patches
> When r10-514-gc6b84edb6110dd2b4fb improved access path analysis > it introduced a typo that triggers when there's an access to a > trailing array in the first access path leading to false > disambiguation. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > Honza, does this look OK?

Re: [PATCH] gcov: add info about "calls" to JSON output format

2023-04-14 Thread Jan Hubicka via Gcc-patches
> On 4/11/23 11:23, Richard Biener wrote: > > On Thu, Apr 6, 2023 at 3:58 PM Martin Liška wrote: > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed after stage1 opens? > > > > Did we release a compiler with version 1? If not we might

Disable X86_TUNE_AVX256_MOVE_BY_PIECES and STORE_BY_PIECES for znver1-3

2023-04-14 Thread Jan Hubicka via Gcc-patches
Hi, I have enabled SSE moves for znver1-3 since they are a performance win on this machine too (we avoid using loops or string operations which are more costly). However as discussed in the PR log, this triggers a bug in IRA and it was decided it is better to not backport the fix.

Remove dead handling of label_decl in tree merging

2023-04-21 Thread Jan Hubicka via Gcc-patches
Hi, while working on incremental WHOPR with Michal Jires, we noticed that there is code hashing LABEL_DECL_UID in lto-streamer-out which would break the hash table, since label decls are not streamed and get re-initialized later. The whole conditional is dead since LABEL_DECLs are not merged

Stabilize temporary variable names

2023-04-21 Thread Jan Hubicka via Gcc-patches
Hi, Michal Jires implemented a quite well working prototype of a cache for WPA which makes it re-use partitions from an earlier build when a package is rebuilt with smaller changes. It should be useful to improve edit/compile/debug cycles when one is forced to debug with LTO enabled but

Stabilize inliner Fibonacci heap

2023-04-21 Thread Jan Hubicka via Gcc-patches
Hi, This fixes another problem Michal noticed while working on incremental WHOPR. The Fibonacci heap can change its behaviour quite significantly for no good reason when multiple edges with the same key occur. This is quite common for small functions. This patch stabilizes the order by adding

Fix loop-ch

2023-04-21 Thread Jan Hubicka via Gcc-patches
Hi, Ondrej Kubanek implemented profiling of loop histograms which should be useful to improve e.g. the quality of loop peeling or vectorization. However it turns out that most of the histograms are lost on the way from profiling to the loop peeling pass (about 90%). One common case is the following

Re: [PATCH RFC] c++: lambda mangling alias issues [PR107897]

2023-03-30 Thread Jan Hubicka via Gcc-patches
> > How about moving it to symtab_node and using dyn_cast for the cgraph bits, > like this: > From 1d869ceb04573727e59be6518903133c8654069a Mon Sep 17 00:00:00 2001 > From: Jason Merrill > Date: Mon, 6 Mar 2023 15:33:45 -0500 > Subject: [PATCH] c++: lambda mangling alias issues [PR107897] > To:

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-03 Thread Jan Hubicka via Gcc-patches
> On Tue, 28 Mar 2023, Richard Biener wrote: > > > When adjusting calls to reflect instrumentation we failed to handle > > calls to aliases since they appear to have no body. Instead resort > > to symtab node availability. The patch also avoids touching > > internal function calls in a more

Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases

2023-04-04 Thread Jan Hubicka via Gcc-patches
> On Tue, Apr 04, 2023 at 01:21:40AM +0200, Jan Hubicka via Gcc-patches wrote: > > It is however really a side case and I am worried about dropping > > pure/const from builtin declarations... > > Yeah, that can certainly break stuff. See e.g. the recently fixed > ICE

Re: Patch ping Re: [PATCH] ipa: Avoid another ICE when dealing with type-incompatibilities (PR 108959)

2023-04-04 Thread Jan Hubicka via Gcc-patches
> On Thu, Mar 23, 2023 at 11:09:19AM +0100, Martin Jambor wrote: > > Hi, > > > > PR 108959 shows one more example where undefined code with type > > incompatible accesses to stuff passed in parameters can cause an ICE > > because we try to create a VIEW_CONVERT_EXPR of mismatching sizes: > > > >

Re: [PATCH] ipa: silent -Wodr notes with -w

2023-02-06 Thread Jan Hubicka via Gcc-patches
> On 2/1/23 15:26, Martin Jambor wrote: > > Hi, > > > > On Fri, Dec 02 2022, Martin Liška wrote: > > > If -w is used, warn_odr properly sets *warned = false and > > > so it should be preserved when calling warn_types_mismatch. > > > > > > Noticed that during a LTO reduction where I used -w. > >

Re: [PATCH] tree-optimization/106722 - fix CD-DCE edge marking

2023-02-10 Thread Jan Hubicka via Gcc-patches
> The following fixes a latent issue when we mark control edges but > end up with marking a block with no stmts necessary. In this case > we fail to mark dependent control edges of that block. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > Does this look OK? > > Thanks, >

Re: [PATCH] ipa: silent -Wodr notes with -w

2023-02-08 Thread Jan Hubicka via Gcc-patches
> > On 2/1/23 15:26, Martin Jambor wrote: > > > Hi, > > > > > > On Fri, Dec 02 2022, Martin Liška wrote: > > > > If -w is used, warn_odr properly sets *warned = false and > > > > so it should be preserved when calling warn_types_mismatch. > > > > > > > > Noticed that during a LTO reduction where

Enable AVX512 512bit vectors by default on Zen4

2023-02-04 Thread Jan Hubicka via Gcc-patches
Hi, this patch enables AVX512 by default on Zen4. While internally 512-bit registers are split into two 256-bit halves, 512-bit vectors reduce the number of instructions to retire and have a chance to improve parallelism. There are a few tsvc benchmarks that improve significantly: runtime benchmark

Re: [PATCH] ipa-split: Don't split returns_twice functions [PR106923]

2023-02-08 Thread Jan Hubicka via Gcc-patches
> Hi! > > As discussed in the PR, returns_twice functions are rare/special beasts > that need special treatment in the cfg, and inside of their bodies > we don't know which part actually works the weird returns twice way > (either in the fork/vfork sense, or in the setjmp) and aren't updating >

Re: [PATCH] cgraph: Handle simd clones in cgraph_node::set_{const,pure}_flag [PR106433]

2023-02-08 Thread Jan Hubicka via Gcc-patches
> Hi! > > The following testcase ICEs, because we determine only in late pure const > pass that bar is const (the content of the function loses a store to a > global var during dse3 and read from it during cddce2) and local-pure-const2 > makes it const. The cgraph ordering is that post IPA (in

Re: [PATCH] cgraph: Handle simd clones in cgraph_node::set_{const,pure}_flag [PR106433]

2023-02-08 Thread Jan Hubicka via Gcc-patches
> On Wed, Feb 08, 2023 at 06:10:08PM +0100, Jan Hubicka wrote: > > My understanding of simd clones is a bit limited, but I think you are > > right that they should have the same semantics as their caller. > > > > I think const may be one that makes the compiler ICE, but > > there are many other

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi, original scale_loop_profile was implemented to only handle very simple loops produced by vectorizer at that time (basically loops with only one exit and no subloops). It also has not been updated to new profile-count API very carefully. Since I want to use it from loop peeling and unlooping, I

update_bb_profile_for_threading TLC

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi, this patch applies some TLC to update_bb_profile_for_threading. The function rescales probabilities by: FOR_EACH_EDGE (c, ei, bb->succs) c->probability /= prob; which is correct but in case prob is 0 (took all execution counts to the newly constructed path), this leads to
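
The rescaling step quoted above can be modelled in a few lines. This is only a sketch, assuming probabilities are plain fractions rather than GCC's profile_probability type; the name rescale_successors and the choice to return None for the degenerate case are hypothetical, since the preview is cut off before it says what the patch does when prob is 0.

```python
from fractions import Fraction


def rescale_successors(probabilities, prob):
    """Divide every successor-edge probability by `prob`.

    Models `FOR_EACH_EDGE (c, ei, bb->succs) c->probability /= prob;`.
    Returns None when prob is 0, because the division is undefined there
    (illustrative choice only, not taken from the patch).
    """
    if prob == 0:
        return None
    return [p / prob for p in probabilities]


# Hypothetical block: half of the executions were redirected to the
# threaded path, and the two remaining successor edges carry 1/4 each.
remaining = Fraction(1, 2)
edges = [Fraction(1, 4), Fraction(1, 4)]
rescaled = rescale_successors(edges, remaining)

# Degenerate case from the message: all executions went to the new path.
degenerate = rescale_successors(edges, Fraction(0))
```

After rescaling, the remaining edges again sum to 1; the degenerate call shows why a plain division cannot be applied when prob is 0.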

Fix profile update after loop-ch and cunroll

2023-07-06 Thread Jan Hubicka via Gcc-patches
Hi, this patch makes loop-ch and loop unrolling fix the profile in case the loop is known to not iterate at all (or iterate a few times) while the profile claims it iterates more. While this is kind of a symptomatic fix, it is the best we can do in case the profile was originally estimated incorrectly. In the

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-07 Thread Jan Hubicka via Gcc-patches
> > Looks good, but I wonder what we can do to at least make the > multiple exit case behave reasonably? The vectorizer keeps track > of a "canonical" exit, would it be possible to pass in the main > exit edge and use that instead of single_exit (), would other > exits then behave somewhat

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-07 Thread Jan Hubicka via Gcc-patches
> Hi Both, > > Thanks for all the reviews/patches so far  > > > > > > > Looks good, but I wonder what we can do to at least make the multiple > > > exit case behave reasonably? The vectorizer keeps track > > > > > of a "canonical" exit, would it be possible to pass in the main exit > > > edge

Fix some profile consistency testcases

2023-07-07 Thread Jan Hubicka via Gcc-patches
Hi, Information about profile mismatches is printed only with -details-blocks for some time. I think it should be printed even with default to make it easier to spot when someone introduces new transform that breaks the profile, but I will send separate RFC for that. This patch enables details

Cleanup force_edge_cold

2023-07-07 Thread Jan Hubicka via Gcc-patches
Hi, we can use the new set_edge_probability_and_rescale_others here. Bootstrapped/regtested x86_64-linux, committed. Honza gcc/ChangeLog: * predict.cc (force_edge_cold): Use set_edge_probability_and_rescale_others; improve dumps. diff --git a/gcc/predict.cc b/gcc/predict.cc

Add missing dump_file check

2023-07-08 Thread Jan Hubicka via Gcc-patches
Hi, I forgot to check dump_file being non-NULL before writing to it. It is somewhat odd that this does not trigger more often - I will take a deeper look tomorrow, but I am checking this in as obvious to avoid ICE. Honza gcc/ChangeLog: PR tree-optimization/110600 *

Fix profile update in tree-ssa/update-cunroll.c

2023-07-08 Thread Jan Hubicka via Gcc-patches
Fix tree-ssa/update-cunroll.c In this testcase the profile is misupdated because the loop has two exits. The first exit is the one eliminated by complete unrolling while the second exit remains. We remove the first exit but forget about the fact that the source BB of the other exit will then have higher frequency making

Loop-ch improvements, part 2

2023-07-12 Thread Jan Hubicka via Gcc-patches
Hi, as discussed this patch moves profile updating to tree-ssa-loop-ch.cc since it is now quite ch specific. There are no functional changes. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * tree-cfg.cc (gimple_duplicate_sese_region): Rename to ...

Loop-ch improvements, part 1

2023-07-11 Thread Jan Hubicka via Gcc-patches
Hi, this patch improves profile update in loop-ch to handle the situation where the duplicated header has a loop invariant test. In this case we know that all count of the exit edge belongs to the duplicated loop header edge and can update probabilities accordingly. Since we also do all the work to track
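
The reasoning about loop-invariant header tests can be illustrated with a small numeric sketch. It assumes counts are plain integers, that the header is duplicated onto the loop entry edge, and that the exit test is truly invariant; the function name and the sample counts are hypothetical and not taken from the patch.

```python
from fractions import Fraction


def split_exit_probability(entry_count, exit_count):
    """Attribute the whole exit count to the duplicated header copy.

    With a loop-invariant exit test the outcome is decided on loop entry,
    so the copy placed on the entry edge sees every exit and the header
    that stays inside the loop never takes this exit.
    Returns (exit probability in the copy, exit probability in the loop),
    or None when the entry count is 0 and no ratio can be formed.
    """
    if entry_count == 0:
        return None
    copy_exit = Fraction(exit_count, entry_count)
    in_loop_exit = Fraction(0)
    return copy_exit, in_loop_exit


# Hypothetical profile: the loop is entered 100 times and the invariant
# test leaves the loop 30 times.
copy_exit, in_loop_exit = split_exit_probability(100, 30)
```

In this model the copy exits with probability 3/10 and the in-loop header never takes the exit, which is one reading of "all count of the exit edge belongs to the duplicated loop header edge".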

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > What I saw most wrecking the profile is when passes turn > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup > which then simply deletes one of the outgoing edges without doing > anything to the (guessed) profile. Yep, I agree that this is disturbing. At the cfg cleanup time

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > By now we did CCP and FRE so we likely optimized out most of constant > > conditionals exposed by inline. > > So maybe we should simply delay re-propagation of the profile? I > think cunrolli doesn't so much care about the profile - cunrolli > is (was) about abstraction removal. Jump

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > FWIW, this particular patch was regstrapped on x86-64-linux > > > with trunk from a week ago (and sniff-tested on current trunk). > > > > This looks really cool. > > The biggest benefit might be from IPA with LTO where we'd carefully place > those > attributes at WPA time (at that time

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > When a function doesn't contain calls to > > > unknown functions we can be a bit more lenient: we can make it so that > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is > > > necessary. One may also take into account that first 8 registers are cheaper to encode than the

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-10 Thread Jan Hubicka via Gcc-patches
Hi, over the weekend I found that the vectorizer is missing scale_loop_profile for epilogues. It already adjusts loop_info to set max iterations, so adding it was easy. However it now predicts the first loop to iterate at most once (which is too much, I suppose it forgets to divide by epilogue unrolling

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-10 Thread Jan Hubicka via Gcc-patches
> On Fri, 7 Jul 2023, Jan Hubicka wrote: > > > > > > > Looks good, but I wonder what we can do to at least make the > > > multiple exit case behave reasonably? The vectorizer keeps track > > > > > of a "canonical" exit, would it be possible to pass in the main > > > exit edge and use that

Improve dumping of profile_count

2023-07-09 Thread Jan Hubicka via Gcc-patches
Hi, dumps of profile_counts are quite hard to interpret since they are 64bit fixed point values. In many cases one looks at a single function and it is better to think of basic block frequency, that is how many times it is executed each invocation. This patch makes CFG dumps to also print this
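
As a rough illustration of the frequency described above, the sketch below divides a block's count by the function's entry count. It treats counts as plain integers and ignores everything else GCC's profile_count carries; bb_frequency and the sample numbers are hypothetical.

```python
def bb_frequency(bb_count, entry_count):
    """Return how many times a block runs per invocation of its function.

    Both arguments are raw execution counts. Returns None when the
    function was never entered, since no per-invocation ratio exists.
    """
    if entry_count == 0:
        return None
    return bb_count / entry_count


# Hypothetical function entered 1000 times whose loop body ran 10000 times.
loop_body_frequency = bb_frequency(10000, 1000)
# A cold error path taken 5 times over the same 1000 invocations.
cold_path_frequency = bb_frequency(5, 1000)
```

A frequency above 1 points at code inside a loop, while a value well below 1 marks a rarely executed path, which is easier to read than the raw counts.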

Loop-ch improvements, part 3

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi, loop-ch currently does analysis using ranger for all loops to identify candidates and then follows by phase where headers are duplicated (which breaks SSA and ranger). The second stage does more analysis (to see how many BBs we want to duplicate) but can't use ranger and thus misses

Turn TODO_rebuild_frequencies to a pass

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi, currently we rebuild profile_counts from profile_probability after inlining, because there is a chance that producing large loop nests may get unrealistically large profile_count values. This is much less of a concern since we switched to the new profile_count representation a while back. This

Fix profile update in scale_profile_for_vect_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, when vectorizing 4 times, we sometimes do for <4x vectorized body> for <2x vectorized body> for <1x vectorized body> Here the second two fors handling the epilogue never iterate. Currently the vectorizer thinks that the middle for iterates twice. This turns out to be
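
The claim that the two epilogue loops never iterate follows from simple arithmetic, sketched here. The sketch assumes "never iterates" means the body runs at most once (the back edge is never taken) and that the loops consume 4, 2 and 1 scalar iterations per pass; epilogue_trip_counts is a hypothetical helper, not a GCC function.

```python
def epilogue_trip_counts(n):
    """Split n scalar iterations across the 4x, 2x and 1x loops.

    Returns (passes of the 4x body, passes of the 2x body,
    passes of the 1x body).
    """
    main_passes = n // 4          # full groups of four
    leftover = n % 4              # 0..3 iterations remain for the epilogues
    two_passes = leftover // 2    # at most one group of two fits in 0..3
    one_passes = leftover % 2     # at most one single iteration remains
    return main_passes, two_passes, one_passes


# Check every small trip count: each epilogue body runs at most once and
# the three loops together cover exactly n scalar iterations.
results = [epilogue_trip_counts(n) for n in range(64)]
```

Because at most three iterations are left after the 4x loop, the 2x body can run at most once, so a profile claiming that the middle loop iterates twice is inconsistent.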

Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, While looking into a sphinx3 regression I noticed that the vectorizer produces BBs with overall probability count 120%. This patch fixes it. Richi, I don't know how to create a testcase, but having one would be nice. Bootstrapped/regtested x86_64-linux, committed last night (sorry for late email)
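
One way to read the 120% figure is as the sum of probabilities on a block's outgoing edges, which should come to 100%. The sketch below checks that invariant under this assumption; `is_consistent` and the sample probabilities are hypothetical and are not taken from the patch or from GCC's profile checker.

```python
import math


def outgoing_probability_sum(edge_probs):
    """Sum the probabilities attached to a block's outgoing edges."""
    return math.fsum(edge_probs)


def is_consistent(edge_probs, tol=1e-9):
    """Return True when the outgoing edge probabilities sum to 1 (100%)."""
    return math.isclose(outgoing_probability_sum(edge_probs), 1.0, abs_tol=tol)


# Illustrative values: the second block has the kind of excess described above.
for probs in ([0.7, 0.3], [0.7, 0.5]):
    total = outgoing_probability_sum(probs)
    status = "ok" if is_consistent(probs) else "inconsistent"
    print(f"{probs}: sum {total:.0%} -> {status}")
```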

Avoid double profile udpate in try_peel_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, try_peel_loop uses gimple_duplicate_loop_body_to_header_edge which subtracts the profile from the original loop. However, it then tries to scale the profile in a wrong way (it forces header count to be entry count). This eliminates two profile misupdates in the internal loop of sphinx3.

Re: Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > While looking into sphinx3 regression I noticed that vectorizer produces > > BBs with overall probability count 120%. This patch fixes it. > > Richi, I don't know how

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Jan Hubicka via Gcc-patches
> Tamar Christina writes: > > Hi All, > > > > The resulting predicate register of a whilelo is not > > restricted to the lower half of the predicate register file. > > > > As such these tests started failing after recent changes > > because the whilelo outside the loop is getting assigned p15. >

loop-ch improvements, part 3

2023-07-20 Thread Jan Hubicka via Gcc-patches
Hi, this patch makes tree-ssa-loop-ch understand if-combined conditionals (which are quite common) and removes the IV-derived heuristics. That heuristic is quite dubious because every variable of integral or pointer type with a PHI in the header is seen as an IV, so in the first basic block we match

Cleanup code determining number of iterations from cfg profile

2023-07-20 Thread Jan Hubicka via Gcc-patches
Hi, this patch cleans up the API for determining expected loop iterations from the profile. We started with having expected_loop_iterations, and the only source was the integer-represented BB counts. It did some work on guessing the number of iterations if the profile was absent or bogus. Later we introduced loop_info

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-04 Thread Jan Hubicka via Gcc-patches
> On Wed, 28 Jun 2023, Tamar Christina wrote: > > > Hi All, > > > > There's an existing bug in loop frequency scaling where the if statement > > checks > > to see if there's a single exit, and records an dump file note but then > > continues. > > > > It then tries to access the null pointer,

Re: [PATCH] value-prof.cc: Correct edge prob calculation.

2023-07-04 Thread Jan Hubicka via Gcc-patches
> The mod-subtract optimization with ncounts==1 produced incorrect edge > probabilities due to incorrect conditional probability calculation. This > patch fixes the calculation. > > gcc/ChangeLog: > > * value-prof.cc (gimple_mod_subtract_transform): Correct edge > prob calculation.

Re: [PATCH 1/2] ipa-cp: Avoid long linear searches through DECL_ARGUMENTS

2023-05-30 Thread Jan Hubicka via Gcc-patches
> On Mon, May 29, 2023 at 6:20 PM Martin Jambor wrote: > > > > Hi, > > > > there have been concerns that linear searches through DECL_ARGUMENTS > > that are often necessary to compute the index of a particular > > PARM_DECL which is the key to results of IPA-CP can happen often > > enough to be a

Re: [PATCH] inline: improve internal function costs

2023-06-04 Thread Jan Hubicka via Gcc-patches
> On Thu, 1 Jun 2023, Andre Vieira (lists) wrote: > > > Hi, > > > > This is a follow-up of the internal function patch to add widening and > > narrowing patterns. This patch improves the inliner cost estimation for > > internal functions. > > I have no idea why calls are special in IPA

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-01 Thread Jan Hubicka via Gcc-bugs
> > If I comment it out as above patch, then O3/PGO can get 16% and 12% > > performance > > improvement compared to O2 on x86. > > > > O2 O3 PGO > > cycles 2,497,674,824 2,104,993,224 2,199,753,593 > > instructions

Fix profile upate after vectorizer peeling

2023-08-01 Thread Jan Hubicka via Gcc-patches
Hi, This patch fixes the profile update after constant peeling in the prologue. We have now reached 0 profile update bugs on tramp3d vectorization and also on quite a few testcases, so I am enabling the testsuite checks so we do not regress again. Bootstrapped/regtested x86_64, committed. Honza gcc/ChangeLog:

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > > > Note most of the profile consistency checks FAIL when testing with -m32 on > > x86_64-unknown-linux-gnu ... > > > > For example vect-11.c has > > > > ;; basic block 4, loop depth 0, count 719407024 (estimated locally, > > freq 0.6700), maybe hot > > ;; Invalid sum of incoming counts

Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-03 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor wrote: > > > > Hi, > > > > when IPA-SRA detects whether a parameter passed by reference is > > written to, it does not special case CLOBBERs which means it often > > bails out unnecessarily, especially when dealing with C++ destructors. > > Fixed by

Re: [PATCH] Swap loop splitting and final value replacement

2023-08-03 Thread Jan Hubicka via Gcc-patches
> The following swaps the loop splitting pass and the final value > replacement pass to avoid keeping the IV of the earlier loop > live when not necessary. The existing gcc.target/i386/pr87007-5.c > testcase shows that we otherwise fail to elide an empty loop > later. I don't see any good reason

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > Note most of the profile consistency checks FAIL when testing with -m32 on > x86_64-unknown-linux-gnu ... > > For example vect-11.c has > > ;; basic block 4, loop depth 0, count 719407024 (estimated locally, > freq 0.6700), maybe hot > ;; Invalid sum of incoming counts 708669602

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > Jeff, an help would be appreciated here :) > > > > I will try to debug this. One option would be to disable branch > > prediciton on vect_check for time being - it is not inlined anyway > Not a lot of insight. The backwards threader uses a totally different API > for the CFG/SSA updates and

finite_loop_p tweak

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, we have a finite_p flag in the loop structure. finite_loop_p already knows to use it, but we may also set the flag when we prove the loop to be finite by SCEV analysis, to avoid duplicated work. Bootstrapped/regtested x86_64-linux, OK? gcc/ChangeLog: * tree-ssa-loop-niter.cc (finite_loop_p):
