Re: [Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option

2024-06-25 Thread Jan Hubicka via Gcc-bugs
> different issue from the one that is raised in the PR. (Unless we think that > -O2 and -O3 should always have the same inlining heuristics henceforward, but > that seems unlikely.) Yes, I think the point of -O3 is to let the compiler be more aggressive than what seems desirable for your average

Re: [Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-04 Thread Jan Hubicka via Gcc-bugs
> Is the option supposed to be only about the standard global scope operator > new/delete (_Znam etc.) or also user operator new/delete class methods? If > the > former, then I agree it is a global property (or at least a per shared > library/binary property, one can arrange stuff with symbol

[gcc r15-581] Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF

2024-05-16 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:96d53252aefcbc2fe419c4c3b4bcd3fc03d4d187 commit r15-581-g96d53252aefcbc2fe419c4c3b4bcd3fc03d4d187 Author: Jan Hubicka Date: Thu May 16 15:33:55 2024 +0200 Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF TARGET_MEM_REF can be used to offset

[gcc r15-512] Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA

2024-05-15 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:9b7cad5884f21cc5783075be0043777448db3fab commit r15-512-g9b7cad5884f21cc5783075be0043777448db3fab Author: Jan Hubicka Date: Wed May 15 14:14:27 2024 +0200 Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA while building more testcases for ipa-icf I noticed

Re: [Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-05-14 Thread Jan Hubicka via Gcc-bugs
This patch attempts to add __builtin_operator_new/delete. So far they are not optimized, which will need to be done by an extra flag on the BUILT_IN_ code. Also, the decl.cc code can be refactored to be less of cut and I guess the has_builtin hack to return the proper value needs to be moved to the C++ FE. However

[gcc r15-482] Reduce recursive inlining of always_inline functions

2024-05-14 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:1ec49897253e093e1ef6261eb104ac0c111bac83 commit r15-482-g1ec49897253e093e1ef6261eb104ac0c111bac83 Author: Jan Hubicka Date: Tue May 14 12:58:56 2024 +0200 Reduce recursive inlining of always_inline functions this patch tames down inliner on (multiply)

GNU Tools Cauldron 2024

2024-05-09 Thread Jan Hubicka via Gcc
Hello, we are pleased to invite you all to the next GNU Tools Cauldron, taking place in Prague, Czech Republic, on September 14-16, 2024. As for the previous instances, we have set up a wiki page for details: https://gcc.gnu.org/wiki/cauldron2024 Like last year, we are having to charge

gcc-wwwdocs branch master updated. 3aee4b0adcb86280ecdaec41447e7ff4f8d8c0a7

2024-05-07 Thread Jan Hubicka via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "gcc-wwwdocs". The branch, master has been updated via 3aee4b0adcb86280ecdaec41447e7ff4f8d8c0a7 (commit) from

Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jan Hubicka via Gcc
> > The problem is testing. If gcc would re-number the basic blocks then > > tests comparing hard-coded test paths would break, even though the path > > coverage itself would be just fine (and presumably the change to the > > basic block indices), which would add an unreasonable maintenance > >

[gcc r14-10093] Remove repeated information in -ftree-loop-distribute-patterns doc

2024-04-23 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:6f0a646dd2fc59e9c9cde63718b36085f84a19ba commit r14-10093-g6f0a646dd2fc59e9c9cde63718b36085f84a19ba Author: Jan Hubicka Date: Tue Apr 23 15:51:42 2024 +0200 Remove repeated information in -ftree-loop-distribute-patterns doc We have:

Re: [Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-09 Thread Jan Hubicka via Gcc-bugs
There is still a problem with loop bounds. I am testing a patch for that, and then we should (finally) be safe.

[gcc r14-9705] Hash operands of PHI in ipa-icf

2024-03-28 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:0923fe2d4808c16b72c1d1bfe28220dd326d8b76 commit r14-9705-g0923fe2d4808c16b72c1d1bfe28220dd326d8b76 Author: Jan Hubicka Date: Thu Mar 28 13:24:54 2024 +0100 Hash operands of PHI in ipa-icf This patch fixes a cache collision on functions whose bodies differ only by

[gcc r14-9516] Add missing config/i386/zn4zn5.md file

2024-03-18 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:dfc9d1cc8353bdd7fbc37bc10bb3fd40f49fa4af commit r14-9516-gdfc9d1cc8353bdd7fbc37bc10bb3fd40f49fa4af Author: Jan Hubicka Date: Mon Mar 18 14:24:10 2024 +0100 Add missing config/i386/zn4zn5.md file gcc/ChangeLog: * config/i386/zn4zn5.md: Add

[gcc r14-9515] Add AMD znver5 processor enablement with scheduler model

2024-03-18 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:d0aa0af9a9b7dd709a8c7ff6604ed6b7da0fc23a commit r14-9515-gd0aa0af9a9b7dd709a8c7ff6604ed6b7da0fc23a Author: Jan Hubicka Date: Mon Mar 18 10:22:44 2024 +0100 Add AMD znver5 processor enablement with scheduler model 2024-02-14 Jan Hubicka

Re: [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function

2024-03-07 Thread Jan Hubicka via Gcc-bugs
> Note GCC has not retuned its -Os heurstics for a long time because it has been > decent enough for most folks and corner cases like this is almost never come > up. There were quite a few changes to the -Os heuristics :) One of the bigger challenges is that we see more and more C++ code built with -Os

Re: [Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread Jan Hubicka via Gcc-bugs
Looking at the prototype patch, why do the splitters also need to change? My original goal was to use splitters to expand to faster code sequences while keeping the patterns needed for both variants. This makes it possible to use optimize_insn_for_size/speed and make decisions using the BB profile, since

Re: [Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-14 Thread Jan Hubicka via Gcc-bugs
> > I guess PTA gets around by tracking points-to set also for non-pointer > > types and consequently it also gives up on any such addition. > > It does. But note it does _not_ for POINTER_PLUS where it treats > the offset operand as non-pointer. > > > I think it is

Re: [Bug target/113233] LoongArch: target options from LTO objects not respected during linking

2024-01-04 Thread Jan Hubicka via Gcc-bugs
> Confirm. But option save/restore has been always implemented: > > .section.gnu.lto_.opts,"",@progbits > .ascii "'-fno-openmp' '-fno-openacc' '-fno-pie' '-fcf-protection" > .ascii "=none' '-mabi=lp64d' '-march=loongarch64' '-mfpu=64' '-m" > .ascii "simd=lasx'

Re: libgcov, fork, and mingw (and other targets without the full POSIX set)

2023-12-01 Thread Jan Hubicka via Gcc
> On Dec 01 2023, Richard Biener via Gcc wrote: > > > Hmm, so why's it then referenced and not "GCed"? > > This has nothing to do with garbage collection. It's just the way > libgcc avoids having too many source files. It would be exactly the > same if every function were in its own file. The

Re: Help needed in output relocations

2023-10-18 Thread Jan Hubicka via Gcc
> Hello, Hi, > I have almost completed the output of relocation entries. The only thing > that remains is to output the corresponding symbols in .symtab. In my > current design, I store the info about relocation entry and the symbol > name. However, the problem I am facing with this approach is

Re: Clarification regarding various classes DIE's attribute value class

2023-10-11 Thread Jan Hubicka via Gcc
> Hello, > I am working on a project to produce the LTO object file from the compiler > directly. So far, we have > correctly outputted .symtab along with various .debug sections. The only > thing remaining is to > correctly output attribute values and their corresponding values in the >

Re: Incremental LTO Project

2023-09-10 Thread Jan Hubicka via Gcc
> Hi! > > On 2023-09-07T19:00:49-0400, James Hu via Gcc wrote: > > I noticed that adding incremental LTO was a GSoC project that was not > > claimed this cycle ( > > https://summerofcode.withgoogle.com/programs/2023/organizations/gnu-compiler-collection-gcc). > > I was curious about working on

Re: Check that passes do not forget to define profile

2023-08-24 Thread Jan Hubicka via Gcc-patches
> On Thu, Aug 24, 2023 at 3:15 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > this patch extends verifier to check that all probabilities and counts are > > initialized if profile is supposed to be present. This is a bit complicated > &

Check that passes do not forget to define profile

2023-08-24 Thread Jan Hubicka via Gcc-patches
Hi, this patch extends the verifier to check that all probabilities and counts are initialized if a profile is supposed to be present. This is a bit complicated by the possibility that we inline a !flag_guess_branch_probability function into a function with the profile defined, and in this case we need to stop

Fix profile update in tree-ssa-reassoc

2023-08-23 Thread Jan Hubicka via Gcc-patches
Hi, this patch adds the missing profile update to maybe_optimize_range_tests. Jakub, I hope I got the code right: I think it basically analyzes the chain of conditionals, finds some basic blocks involved in the range testing and then puts all the tests into the first BB. The patch fixes

Re: Loop-ch improvements, part 3

2023-08-23 Thread Jan Hubicka via Gcc-patches
> We seem to peel one iteration for no good reason. The loop is > a do-while loop already. The key is we see the first iteration > exit condition is known not taken and then: Hi, this is a patch fixing the wrong return value in should_duplicate_loop_header_p. Doing so uncovered suboptimal decisions on

Re: Loop-ch improvements, part 3

2023-08-22 Thread Jan Hubicka via Gcc-patches
> > We seem to peel one iteration for no good reason. The loop is > a do-while loop already. The key is we see the first iteration > exit condition is known not taken and then: > > Registering value_relation (path_oracle) (iter.24_6 > iter.24_5) (root: > bb2) > Stmt is static (constant

Re: [Bug middle-end/111088] useless 'xor eax,eax' inserted when a value is not returned and icf

2023-08-21 Thread Jan Hubicka via Gcc-bugs
> But adds a return with a value. And then the inliner inlines foo into foo2 but > we still have the return with a value around ... I guess ICF can special-case an unused return value, but why is this not taken care of by ipa-sra?

Re: [PATCH] tree-optimization/110991 - unroll size estimate after vectorization

2023-08-14 Thread Jan Hubicka via Gcc-patches
> The following testcase shows that we are bad at identifying inductions > that will be optimized away after vectorizing them because SCEV doesn't > handle vectorized defs. The following rolls a simpler identification > of SSA cycles covering a PHI and an assignment with a binary operator > with

Avoid division by zero in fold_loop_internal_call

2023-08-14 Thread Jan Hubicka via Gcc-patches
Hi, my patch to fix the profile after folding an internal call is missing a check for the case where the profile was already zero before if-conversion. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: PR gcov-profile/110988 * tree-cfg.cc (fold_loop_internal_call): Avoid division by

Fix division by zero in tree-ssa-loop-split

2023-08-10 Thread Jan Hubicka via Gcc-patches
Hi, the profile update I added to tree-ssa-loop-split can divide by zero when the conditional is predicted with 0 probability, which is triggered by the jump threading update in the testcase. gcc/ChangeLog: PR middle-end/110923 * tree-ssa-loop-split.cc (split_loop): Watch for

Fix profile update in duplicat_loop_body_to_header_edge for loops with 0 count_in

2023-08-10 Thread Jan Hubicka via Gcc-patches
Hi, this patch makes duplicate_loop_body_to_header_edge not drop profile counts to uninitialized when count_in is 0. This happens because a profile_probability relative to a 0 count is undefined. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfgloopmanip.cc

Fix profile updating bug in tree-ssa-threadupdate

2023-08-10 Thread Jan Hubicka via Gcc-patches
Hi, ssa_fix_duplicate_block_edges later calls update_profile to correct the profile after threading. In the testcase this does not work since we lose track of the duplicated edge. This happens because redirect_edge_and_branch returns NULL if the edge already has the correct destination, which is the

Fix undefined behaviour in profile_count::differs_from_p

2023-08-10 Thread Jan Hubicka via Gcc-patches
Hi, this patch avoids overflow in profile_count::differs_from_p and also makes it return false if one of the values is undefined while the other is defined. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * profile-count.cc (profile_count::differs_from_p): Fix overflow and
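A minimal sketch of the two properties described above, assuming a hypothetical toy_count type (a uint64_t value plus an initialized flag) and a 1% tolerance; it is not GCC's profile_count::differs_from_p. Overflow is avoided by never scaling the difference up: the larger value is divided by 100 instead. Returning false when both values are undefined is an extra assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a profile count (not GCC's profile_count).  */
struct toy_count
{
  uint64_t val;
  bool initialized;
};

/* Return true if A and B differ by more than roughly 1% of the larger one.
   A defined count compared with an undefined one is not reported as
   differing; this sketch also returns false when both are undefined.  */
static bool
toy_differs_from_p (struct toy_count a, struct toy_count b)
{
  if (!a.initialized || !b.initialized)
    return false;

  uint64_t hi = a.val > b.val ? a.val : b.val;
  uint64_t lo = a.val > b.val ? b.val : a.val;

  /* hi - lo cannot wrap because hi >= lo, and comparing against hi / 100
     (instead of multiplying the difference by 100) keeps every
     intermediate value within 64 bits.  */
  return hi - lo > hi / 100;
}
```

For example, 1000 and 1005 are treated as equal (difference 5, threshold 10), while counts near UINT64_MAX can be compared without wrapping.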

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Jan Hubicka via Gcc-patches
> On Thu, Aug 10, 2023 at 9:42 AM Uros Bizjak wrote: > > > > On Thu, Aug 10, 2023 at 9:40 AM Richard Biener > > wrote: > > > > > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > > > > > Currently we have 3 different independent tunes for gather > > > >

Fix profile update after versioning ifconverted loop

2023-08-06 Thread Jan Hubicka via Gcc-patches
Hi, if a loop is if-converted and later versioned by the vectorizer, the vectorizer will reuse the scalar loop produced by ifconvert. Curiously enough, it does not seem to do so for versions produced by loop distribution, while for loop distribution this matters (since both ldist versions survive to

Fix profile update after peeled epilogues

2023-08-06 Thread Jan Hubicka via Gcc-patches
Hi, epilogue peeling expects the scalar loop to have the same number of executions as the vector loop, which is true at the beginning of vectorization. However, if the epilogues are vectorized, this is no longer the case. In this situation the loop preheader is replaced by new guard code with correct

Re: Disable loop distribution for loops with estimated iterations 0

2023-08-04 Thread Jan Hubicka via Gcc-patches
> On Fri, Aug 4, 2023 at 9:16 AM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > this prevents useless loop distribiton produced in hmmer. With FDO we now > > correctly work out that the loop created for last iteraiton is not going to > > iterate howe

Re: Fix profile upate after vectorizer peeling

2023-08-04 Thread Jan Hubicka via Gcc-patches
Hi, so I found the problem. We duplicate multiple paths and end up with: ;; basic block 6, loop depth 0, count 365072224 (estimated locally, freq 0.3400) ;; prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED) ;; pred: 4 [never (guessed)] count:0 (estimated locally, freq

Disable loop distribution for loops with estimated iterations 0

2023-08-04 Thread Jan Hubicka via Gcc-patches
Hi, this prevents useless loop distribution produced in hmmer. With FDO we now correctly work out that the loop created for the last iteration is not going to iterate; however, loop distribution still produces a versioned loop that has no chance to survive the loop vectorizer, since we only keep distributed

Re: Fix profile upate after vectorizer peeling

2023-08-04 Thread Jan Hubicka via Gcc-patches
> > > > A couple cycles ago I separated most of code to distinguish between the > > back and forward threaders. There is class jt_path_registry that is > > common to both, and {fwd,back}_jt_path_registry for the forward and > > backward threaders respectively. It's not perfect, but it's a start.

Update estimated iteraitons counts after splitting

2023-08-03 Thread Jan Hubicka via Gcc-patches
Hi, Hmmer's internal function has 4 loops. The following is the profile at start: loop 1: estimate 472 iterations by profile: 473.497707 (reliable) count in:84821 (precise, freq 0.9979) loop 2: estimate 99 iterations by profile: 100.00 (reliable) count in:39848881

Fix profiledbootstrap

2023-08-03 Thread Jan Hubicka via Gcc-patches
Hi, profiledbootstrap fails with an ICE in update_loop_exit_probability_scale_dom_bbs called from loop unrolling. The reason is that in relatively rare situations we may run into a case where a loop has multiple exits and all are considered likely, but then we scale down the profile and one of the

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > Jeff, an help would be appreciated here :) > > > > I will try to debug this. One option would be to disable branch > > prediciton on vect_check for time being - it is not inlined anyway > Not a lot of insight. The backwards threader uses a totally different API > for the CFG/SSA updates and

Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-03 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor wrote: > > > > Hi, > > > > when IPA-SRA detects whether a parameter passed by reference is > > written to, it does not special case CLOBBERs which means it often > > bails out unnecessarily, especially when dealing with C++ destructors. > > Fixed by

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > > > Note most of the profile consistency checks FAIL when testing with -m32 on > > x86_64-unknown-linux-gnu ... > > > > For example vect-11.c has > > > > ;; basic block 4, loop depth 0, count 719407024 (estimated locally, > > freq 0.6700), maybe hot > > ;; Invalid sum of incoming counts

Re: Fix profile upate after vectorizer peeling

2023-08-03 Thread Jan Hubicka via Gcc-patches
> > Note most of the profile consistency checks FAIL when testing with -m32 on > x86_64-unknown-linux-gnu ... > > For example vect-11.c has > > ;; basic block 4, loop depth 0, count 719407024 (estimated locally, > freq 0.6700), maybe hot > ;; Invalid sum of incoming counts 708669602

Re: [PATCH] Swap loop splitting and final value replacement

2023-08-03 Thread Jan Hubicka via Gcc-patches
> The following swaps the loop splitting pass and the final value > replacement pass to avoid keeping the IV of the earlier loop > live when not necessary. The existing gcc.target/i386/pr87007-5.c > testcase shows that we otherwise fail to elide an empty loop > later. I don't see any good reason

Fix profile update after cancelled loop distribution

2023-08-02 Thread Jan Hubicka via Gcc-patches
Hi, loop distribution and ifcvt introduce versions of loops which may be removed later if vectorization fails. Ifcvt does this by temporarily breaking the profile and producing a conditional that has two arms with 100% probability, because we know one of the versions will be removed. Loop distribution

Fix profile upate after vectorizer peeling

2023-08-01 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes the profile update after constant peeling in the prologue. We have now reached 0 profile update bugs on tramp3d vectorization and also on quite a few testcases, so I am enabling the testsuite checks so we do not regress again. Bootstrapped/regtested x86_64, committed. Honza gcc/ChangeLog:

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-01 Thread Jan Hubicka via Gcc
> > If I comment it out as above patch, then O3/PGO can get 16% and 12% > > performance > > improvement compared to O2 on x86. > > > > O2 O3 PGO > > cycles 2,497,674,824 2,104,993,224 2,199,753,593 > > instructions

Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-01 Thread Jan Hubicka via Gcc-bugs
> > If I comment it out as above patch, then O3/PGO can get 16% and 12% > > performance > > improvement compared to O2 on x86. > > > > O2 O3 PGO > > cycles 2,497,674,824 2,104,993,224 2,199,753,593 > > instructions

Fix profile update after loop versioning in vectorizer

2023-07-29 Thread Jan Hubicka via Gcc-patches
Hi, while versioning a loop, the vectorizer produces a versioned loop guarded by two conditionals of the form
  if (cond1) goto scalar_loop else goto next_bb
  next_bb:
  if (cond2) goto scalar_loop else goto vector_loop
It wants the combined test to be prob (which is set to likely)
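The excerpt stops before saying how prob is distributed between cond1 and cond2. One way to keep the combined probability of reaching scalar_loop equal to prob is to choose the first test's probability p1 and solve p1 + (1 - p1) * p2 = prob for the second one. The sketch below does this with doubles; the function names are hypothetical, and how p1 is chosen is left to the caller.

```c
#include <assert.h>

/* Probability of reaching scalar_loop through either conditional:
   cond1 jumps with probability p1; otherwise (probability 1 - p1) we reach
   next_bb, where cond2 jumps with probability p2.  */
static double
combined_probability (double p1, double p2)
{
  return p1 + (1.0 - p1) * p2;
}

/* Hypothetical helper: given the desired combined probability P_TOTAL and
   the probability P1 chosen for cond1, return the probability cond2 needs.
   Requires 0 <= p1 <= p_total <= 1 and p1 < 1, so that next_bb is
   reachable and the division is well defined.  */
static double
second_test_probability (double p_total, double p1)
{
  assert (0.0 <= p1 && p1 <= p_total && p_total <= 1.0 && p1 < 1.0);
  return (p_total - p1) / (1.0 - p1);
}
```

With prob = 0.8 and p1 = 0.5 this gives p2 = 0.6, since 0.5 + 0.5 * 0.6 = 0.8.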

Re: Loop-split improvements, part 3

2023-07-28 Thread Jan Hubicka via Gcc-patches
> On Fri, Jul 28, 2023 at 2:57 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > This patch extends tree-ssa-loop-split to understand test of the form > > if (i==0) > > and > > if (i!=0) > > which triggers only during the first iteration.

Loop-split improvements, part 3

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi, this patch extends tree-ssa-loop-split to understand tests of the form if (i==0) and if (i!=0), which trigger only during the first iteration. Naturally, we should also be able to handle tests that trigger in the last iteration, or split into 3 cases if the test can indeed fire in the middle of the loop. Last
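Because an i == 0 test is true only on the first iteration, splitting the loop at that point amounts to peeling the first iteration and running the rest without the test. The sketch below shows this at the source level; the function names and the trace array are hypothetical instrumentation so the two versions can be compared, not code from the pass.

```c
#include <assert.h>

enum { FIRST_ITER = 1, OTHER_ITER = 2 };

/* Original form: the test is evaluated on every iteration.  */
static void
first_iter_original (int n, int *trace)
{
  for (int i = 0; i < n; i++)
    trace[i] = (i == 0) ? FIRST_ITER : OTHER_ITER;
}

/* Split form: the first iteration (where i == 0 is known true) is peeled,
   and the remaining loop runs with i == 0 known false.  */
static void
first_iter_split (int n, int *trace)
{
  int i = 0;
  if (i < n)
    {
      trace[i] = FIRST_ITER;
      i++;
    }
  for (; i < n; i++)
    trace[i] = OTHER_ITER;
}
```

An i != 0 test is handled the same way with the two arms swapped.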

Re: Loop-split improvements, part 2

2023-07-28 Thread Jan Hubicka via Gcc-patches
> On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > this patch fixes profile update in the first case of loop splitting. > > The pass still gives up on very basic testcases: > > > > __attribute__ ((noinline,noipa)) >

Re: [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022

2023-07-28 Thread Jan Hubicka via Gcc-bugs
> This heuristic wants to catch > > > if (foo) abort (); > > > and avoid sinking "too far" across a path with "similar enough" > execution count (I think the original motivation was to fix some > spilling / register pressure issue). The loop depth test > should be !(bb_loop_depth

Loop-split improvements, part 2

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes the profile update in the first case of loop splitting. The pass still gives up on very basic testcases:
  __attribute__ ((noinline,noipa))
  void test1 (int n)
  {
    if (n <= 0 || n > 10)
      return;
    for (int i = 0; i <= n; i++)
      {
        if (i < n)
          do_something ();

loop-split improvements, part 1

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi, while looking at a profile misupdate on hmmer I noticed that the loop splitting pass is not able to handle the loop its own comment gives as an example of what it should apply to: One transformation of loops like:
  for (i = 0; i < 100; i++)
    {
      if (i < 50)
        A;
      else
        B;
    }
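A sketch of the transformation that comment describes, generalized to hypothetical bounds n and m: the test i < m stays true until i reaches min(n, m) and is false afterwards, so the loop can be split at that point into two loops that need no test. A and B are replaced by writes to a log array so the two versions can be compared; this is a source-level illustration, not the pass's implementation.

```c
#include <assert.h>

/* Original loop: the i < m test is evaluated on every iteration.  */
static void
before_split (int n, int m, char *log)
{
  for (int i = 0; i < n; i++)
    {
      if (i < m)
        log[i] = 'A';
      else
        log[i] = 'B';
    }
}

/* Split loops: the first runs while the test is known true, the second
   continues from the same i while the test is known false.  */
static void
after_split (int n, int m, char *log)
{
  int bound = m < n ? m : n;
  int i = 0;
  for (; i < bound; i++)
    log[i] = 'A';
  for (; i < n; i++)
    log[i] = 'B';
}
```

With n = 100 and m = 50 this is exactly the loop from the comment: iterations 0..49 run A and iterations 50..99 run B.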

Make store likely in optimize_mask_stores

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi, as discussed with Richard, we want the store to be likely in optimize_mask_stores. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * tree-vect-loop.cc (optimize_mask_stores): Make store likely. diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index

Fix profile update after RTL unrolling

2023-07-27 Thread Jan Hubicka via Gcc-patches
This patch fixes the profile update after RTL unrolling, which is now done the same way as in the tree unroller. We still produce a (slightly) corrupted profile for multiple-exit loops, which I can try to fix incrementally. I also updated the testcases to look for profile mismatches so they do not creep back in.

Fix profile update in tree_transform_and_unroll_loop

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes the profile update in tree_transform_and_unroll_loop, which is used by predictive commoning. I started with an attempt to fix gcc.dg/tree-ssa/update-unroll-1.c, which I xfailed last week, but it turned out to be a harder job. Unrolling was never fixed for changes in duplicate_loop_body_to_header_edge

Fix profile update in tree-ssa-loop-im.cc

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi, this fixes two bugs in tree-ssa-loop-im.cc. The first is that the cap probability is not reliable, but it is constructed with adjusted quality. The second is that sometimes the conditional has the wrong joiner BB count. This is visible in testsuite/gcc.dg/pr102385.c; however, the testcase triggers another

Fix profile_count::apply_probability

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi, profile_count::apply_probability is missing a check for an uninitialized probability, which leads to completely random results when applying an uninitialized probability to an initialized scale. This can make a difference when, e.g., inlining a -fno-guess-branch-probability function into a -fguess-branch-probability
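A minimal sketch of the missing check, using hypothetical toy types rather than GCC's profile_count and profile_probability: each value carries an initialized flag, and a probability is stored as a fraction of an assumed base of 10000. The early return propagates "uninitialized" instead of computing a number from whatever bits the probability happens to hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-point base for toy probabilities.  */
#define TOY_PROB_BASE 10000u

struct toy_count
{
  uint64_t val;
  bool initialized;
};

struct toy_prob
{
  uint32_t val;          /* 0 .. TOY_PROB_BASE */
  bool initialized;
};

/* Scale COUNT by PROB; an uninitialized input gives an uninitialized
   result.  */
static struct toy_count
toy_apply_probability (struct toy_count count, struct toy_prob prob)
{
  struct toy_count result = { 0, false };
  if (!count.initialized || !prob.initialized)
    return result;
  assert (prob.val <= TOY_PROB_BASE);
  /* floor (val * p / BASE) without 64-bit overflow:
     val = q * BASE + r, so val * p / BASE = q * p + r * p / BASE.  */
  uint64_t q = count.val / TOY_PROB_BASE;
  uint64_t r = count.val % TOY_PROB_BASE;
  result.val = q * prob.val + r * prob.val / TOY_PROB_BASE;
  result.initialized = true;
  return result;
}
```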

Fix profile_count::to_sreal_scale

2023-07-26 Thread Jan Hubicka via Gcc-patches
Hi, this patch makes profile_count::to_sreal_scale consider the scale unknown when its argument in is 0. This fixes the case where a loop has 0 executions in the profile feedback, so we can't determine its trip count. Bootstrapped/regtested x86_64-linux, committed. Honza gcc/ChangeLog: *
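A minimal sketch of the idea, assuming a hypothetical helper that estimates how many times the loop header runs per entry into the loop from two profile counts. When the loop was never entered the ratio is 0/0, so the sketch reports the estimate as unknown instead of dividing; it mirrors the described to_sreal_scale change only in spirit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ratio of the header count to the loop entry
   (preheader) count.  Returns false, leaving *RATIO untouched, when the
   estimate is unknown.  */
static bool
header_runs_per_entry (uint64_t header_count, uint64_t entry_count,
                       double *ratio)
{
  /* A loop never entered in the training run gives 0/0: no trip count
     can be derived, so report "unknown" rather than dividing by zero.  */
  if (entry_count == 0)
    return false;
  *ratio = (double) header_count / (double) entry_count;
  return true;
}
```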

Re: Fix optimize_mask_stores profile update

2023-07-21 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > While looking into sphinx3 regression I noticed that vectorizer produces > > BBs with overall probability count 120%. This patch fixes it. > > Richi, I don't know how

Re: [Bug target/110758] [14 Regression] 8% hmmer regression on zen1/3 with -Ofast -march=native -flto between g:8377cf1bf41a0a9d (2023-07-05 01:46) and g:3a61ca1b9256535e (2023-07-06 16:56); g:d76d19c

2023-07-21 Thread Jan Hubicka via Gcc-bugs
> I suspect this is most likely the profile updates changes ... Quite possibly. The goal of this exercise is to figure out whether there are some bugs in the profile estimate, whether passes somehow prefer a broken profile, or whether it is just bad luck. Looking at sphinx and fatigue it seems that LRA

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-21 Thread Jan Hubicka via Gcc-patches
Avoid scaling flat loop profiles of vectorized loops
As discussed, when vectorizing a loop with a static profile, it is not always a good idea to divide the header frequency by the vectorization factor, because the profile may not realistically represent the expected number of iterations. Since in such

Fix gcc.dg/tree-ssa/copy-headers-9.c and gcc.dg/tree-ssa/dce-1.c failures

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, this patch fixes the template in the two testcases so it matches the output correctly. I did not re-test after the last changes in the previous patch, sorry for that. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/copy-headers-9.c: Fix template for tree-ssa-loop-ch.cc changes. *

Implement flat loop profile detection

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, this patch adds maybe_flat_loop_profile, which can be used in loop profile updates to detect situations where the profile may be unrealistically flat and should not be downscaled after vectorizing, unrolling and other transforms that assume that the loop has a high iteration count even if the CFG

Fix sreal::to_int and implement sreal::to_nearest_int

2023-07-21 Thread Jan Hubicka via Gcc-patches
Fix sreal::to_int and implement sreal::to_nearest_int
While exploring new loop estimate dumps, I noticed that a loop iterating 1.8 times according to the profile is estimated by nb_estimate as iterating once instead of twice. While nb_estimate should really be an sreal, and I will convert it incrementally, I found
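The difference between a truncating and a rounding conversion can be shown with a toy software real whose value is sig * 2^exp. This struct and its functions are hypothetical and are not GCC's sreal class; the sketch assumes -62 < exp < 62, no overflow when exp >= 0, and that halves round away from zero (GCC's to_nearest_int may make a different choice). 1.8 is approximated as 1843 * 2^-10.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy software real: value = sig * 2^exp.  */
struct toy_sreal
{
  int64_t sig;
  int exp;
};

/* |sig| as an unsigned value; well defined even for INT64_MIN.  */
static uint64_t
toy_magnitude (int64_t sig)
{
  return sig < 0 ? (uint64_t) 0 - (uint64_t) sig : (uint64_t) sig;
}

/* Truncating conversion: fractional bits are shifted out, so 1.8 -> 1.  */
static int64_t
toy_to_int (struct toy_sreal r)
{
  assert (r.exp > -62 && r.exp < 62);
  if (r.exp >= 0)
    return r.sig * ((int64_t) 1 << r.exp);   /* overflow not checked */
  uint64_t mag = toy_magnitude (r.sig) >> -r.exp;
  return r.sig < 0 ? -(int64_t) mag : (int64_t) mag;
}

/* Rounding conversion: add one half before shifting, so 1.8 -> 2.  */
static int64_t
toy_to_nearest_int (struct toy_sreal r)
{
  assert (r.exp > -62 && r.exp < 62);
  if (r.exp >= 0)
    return r.sig * ((int64_t) 1 << r.exp);   /* overflow not checked */
  int shift = -r.exp;
  uint64_t half = (uint64_t) 1 << (shift - 1);
  uint64_t mag = (toy_magnitude (r.sig) + half) >> shift;
  return r.sig < 0 ? -(int64_t) mag : (int64_t) mag;
}
```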

Re: loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
> > The patch requires bit of testsuite changes > > - I disabled ch in loop-unswitch-17.c since it tests unswitching of > >loop invariant conditional. > > - pr103079.c needs ch disabled to trigger vrp situation it tests for > >(otherwise we optimize stuff earlier and better) > > -

loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, currently loop-ch skips all do-while loops. But when a loop is not do-while, then in addition to the original goal of turning it into a do-while loop, loop-ch can do additional things:
1) move out loop-invariant computations,
2) duplicate loop-invariant conditionals and eliminate them in the loop body,
3) prove that
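Loop header copying, the transformation by which loop-ch turns a loop into a do-while loop, can be pictured at the source level as below. This is a hand-written illustration with hypothetical function names, not the pass itself: the exit test that a while loop evaluates before its first iteration is copied in front of the loop, and the remaining tests move to the bottom.

```c
#include <assert.h>

/* Original form: the exit test runs before every iteration.  */
static int
sum_while (const int *a, int n)
{
  int s = 0, i = 0;
  while (i < n)
    {
      s += a[i];
      i++;
    }
  return s;
}

/* After header copying: a guarded do-while loop.  */
static int
sum_do_while (const int *a, int n)
{
  int s = 0, i = 0;
  if (i < n)            /* copied header: the first exit test */
    do
      {
        s += a[i];
        i++;
      }
    while (i < n);      /* remaining exit tests at the bottom */
  return s;
}
```

The guard is also where the items in the list above come into play: a loop-invariant test in the copied header is evaluated once in front of the loop rather than on every iteration.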

finite_loop_p tweak

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, we have a finite_p flag in the loop structure. finite_loop_p already knows to use it, but we may also set the flag when we prove the loop to be finite by SCEV analysis, to avoid duplicated work. Bootstrapped/regtested x86_64-linux, OK? gcc/ChangeLog: * tree-ssa-loop-niter.cc (finite_loop_p):

Improve loop dumping

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi, we have flow_loop_dump and print_loop. While print_loop was extended to dump the data from the loop structure that we added over the years (loop info), flow_loop_dump was not. -fdump-tree-all files contain flow_loop_dump output, which makes it hard to see what metadata we have attached to a loop. This patch unifies

Cleanup code determining number of iterations from cfg profile

2023-07-20 Thread Jan Hubicka via Gcc-patches
Hi, this patch cleans up the API for determining expected loop iterations from the profile. We started with expected_loop_iterations, and the only source was the integer-represented BB counts. It did some work on guessing the number of iterations if the profile was absent or bogus. Later we introduced loop_info

Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-20 Thread Jan Hubicka via Gcc-patches
> Tamar Christina writes: > > Hi All, > > > > The resulting predicate register of a whilelo is not > > restricted to the lower half of the predicate register file. > > > > As such these tests started failing after recent changes > > because the whilelo outside the loop is getting assigned p15. >

loop-ch improvements, part 3

2023-07-20 Thread Jan Hubicka via Gcc-patches
Hi, this patch makes tree-ssa-loop-ch understand if-combined conditionals (which are quite common) and removes the IV-derived heuristics. That heuristic is quite dubious because every variable of integral or pointer type with a PHI in the header is seen as an IV, so in the first basic block we match

Re: Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > While looking into sphinx3 regression I noticed that vectorizer produces > > BBs with overall probability count 120%. This patch fixes it. > > Richi, I don't know how

Avoid double profile udpate in try_peel_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which subtracts the profile from the original loop. However, it then tries to scale the profile in the wrong way (it forces the header count to be the entry count). This eliminates profile misupdates in the internal loop of sphinx3.

Fix profile update in scale_profile_for_vect_loop

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, when vectorizing 4 times, we sometimes do
  for <4x vectorized body>
  for <2x vectorized body>
  for <1x vectorized body>
Here the last two fors, which handle the epilogue, never iterate. Currently the vectorizer thinks that the middle for iterates twice. This turns out to be
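To see why the two epilogue loops never iterate, here is a hypothetical model in which the three loops process 4, 2 and 1 elements per iteration (the real epilogue structure depends on the target and on the vectorization factors chosen). After the 4x loop at most 3 elements remain, so the 2x loop body runs at most once and leaves at most one element for the 1x loop.

```c
#include <assert.h>

/* Hypothetical model: how many times each loop body runs for N elements.  */
struct vect_trips
{
  int main4, epi2, epi1;
};

static struct vect_trips
count_trips (int n)
{
  struct vect_trips t = { 0, 0, 0 };
  int i = 0;
  for (; i + 4 <= n; i += 4)   /* 4x vectorized body */
    t.main4++;
  for (; i + 2 <= n; i += 2)   /* 2x epilogue: at most 3 elements left */
    t.epi2++;
  for (; i + 1 <= n; i += 1)   /* 1x epilogue: at most 1 element left */
    t.epi1++;
  return t;
}
```

For n = 7, each of the three bodies runs exactly once.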

Fix optimize_mask_stores profile update

2023-07-17 Thread Jan Hubicka via Gcc-patches
Hi, while looking into a sphinx3 regression I noticed that the vectorizer produces BBs with an overall probability count of 120%. This patch fixes it. Richi, I don't know how to create a testcase, but having one would be nice. Bootstrapped/regtested x86_64-linux, committed last night (sorry for the late email)

Turn TODO_rebuild_frequencies to a pass

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi, currently we rebuild profile_counts from profile_probability after inlining, because there is a chance that producing large loop nests may result in unrealistically large profile_count values. This is much less of a concern since we switched to the new profile_count representation a while back. This

Loop-ch improvements, part 3

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi, loop-ch currently does its analysis using ranger for all loops to identify candidates and then follows with a phase in which headers are duplicated (which breaks SSA and ranger). The second stage does more analysis (to see how many BBs we want to duplicate) but can't use ranger and thus misses

Loop-ch improvements, part 2

2023-07-12 Thread Jan Hubicka via Gcc-patches
Hi, as discussed, this patch moves profile updating to tree-ssa-loop-ch.cc since it is now quite ch-specific. There are no functional changes. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * tree-cfg.cc (gimple_duplicate_sese_region): Rename to ...

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > When a function doesn't contain calls to > > > unknown functions we can be a bit more lenient: we can make it so that > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is > > > necessary. One may also take into account that first 8 registers are cheaper to encode than the

Loop-ch improvements, part 1

2023-07-11 Thread Jan Hubicka via Gcc-patches
Hi, this patch improves the profile update in loop-ch to handle the situation where the duplicated header has a loop-invariant test. In this case we know that the whole count of the exit edge belongs to the duplicated loop header edge and can update probabilities accordingly. Since we also do all the work to track

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > By now we did CCP and FRE so we likely optimized out most of constant > > conditionals exposed by inline. > > So maybe we should simply delay re-propagation of the profile? I > think cunrolli doesn't so much care about the profile - cunrolli > is (was) about abstraction removal. Jump

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > What I saw most wrecking the profile is when passes turn > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup > which then simply deletes one of the outgoing edges without doing > anything to the (guessed) profile. Yep, I agree that this is disturbing. At the cfg cleanup time

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > FWIW, this particular patch was regstrapped on x86-64-linux > > > with trunk from a week ago (and sniff-tested on current trunk). > > > > This looks really cool. > > The biggest benefit might be from IPA with LTO where we'd carefully place > those > attributes at WPA time (at that time

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-10 Thread Jan Hubicka via Gcc-patches
> On Fri, 7 Jul 2023, Jan Hubicka wrote: > > > > > > > Looks good, but I wonder what we can do to at least make the > > > multiple exit case behave reasonably? The vectorizer keeps track > > > > > of a "canonical" exit, would it be possible to pass in the main > > > exit edge and use that

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-10 Thread Jan Hubicka via Gcc-patches
Hi, over the weekend I found that the vectorizer is missing scale_loop_profile for epilogues. It already adjusts loop_info to set max iterations, so adding it was easy. However, it now predicts the first loop to iterate at most once (which is too much; I suppose it forgets to divide by epilogue unrolling

Improve dumping of profile_count

2023-07-09 Thread Jan Hubicka via Gcc-patches
Hi, dumps of profile_counts are quite hard to interpret since they are 64-bit fixed-point values. In many cases one looks at a single function, and it is better to think of basic block frequency, that is, how many times the block is executed per invocation. This patch makes CFG dumps also print this
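The per-invocation frequency mentioned above can be sketched as the ratio of a block's count to the function's entry block count. This standalone helper (`bb_frequency` is a hypothetical name) uses plain 64-bit integers instead of GCC's profile_count class and treats a zero entry count as frequency 0.

```c
#include <stdint.h>

/* Hypothetical: average number of times a block executes per invocation of
   its function, given the block's count and the entry block's count.  */
static double
bb_frequency (uint64_t bb_count, uint64_t entry_count)
{
  if (entry_count == 0)
    return 0.0; /* Function never entered: no meaningful frequency.  */
  return (double) bb_count / (double) entry_count;
}
```

A loop body with count 5000 in a function entered 1000 times thus reads as frequency 5, which is easier to interpret than the raw count.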

Add missing dump_file check

2023-07-08 Thread Jan Hubicka via Gcc-patches
Hi, I forgot to check that dump_file is non-NULL before writing to it. It is somewhat odd that this does not trigger more often; I will take a deeper look tomorrow, but I am checking this in as obvious to avoid the ICE. Honza gcc/ChangeLog: PR tree-optimization/110600 *
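The usual guard is to write to the dump stream only when it is open. In GCC, `dump_file` is a global that stays NULL when no dump was requested; here it is a local stand-in declared so the sketch compiles on its own, and `note_transform` is a hypothetical helper.

```c
#include <stdio.h>

/* Stand-in for the global dump stream: NULL unless dumping is enabled.  */
static FILE *dump_file = NULL;

/* Report a transformation.  Returns 1 if something was written, 0 if the
   dump is disabled.  Calling fprintf on a NULL stream would crash, which
   inside the compiler shows up as an ICE.  */
static int
note_transform (const char *what)
{
  if (dump_file)
    {
      fprintf (dump_file, "Applied: %s\n", what);
      return 1;
    }
  return 0;
}
```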

Fix profile update in tree-ssa/update-cunroll.c

2023-07-08 Thread Jan Hubicka via Gcc-patches
Fix tree-ssa/update-cunroll.c. In this testcase the profile is misupdated because the loop has two exits. The first exit is the one eliminated by complete unrolling, while the second exit remains. We remove the first exit but forget about the fact that the source BB of the other exit will then have a higher frequency, making

Cleanup force_edge_cold

2023-07-07 Thread Jan Hubicka via Gcc-patches
Hi, we can use the new set_edge_probability_and_rescale_others here. Bootstrapped/regtested x86_64-linux, committed. Honza gcc/ChangeLog: * predict.cc (force_edge_cold): Use set_edge_probability_and_rescale_others; improve dumps. diff --git a/gcc/predict.cc b/gcc/predict.cc
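The idea behind a helper like set_edge_probability_and_rescale_others can be sketched on a plain array: give one outgoing edge a new probability and scale the remaining edges proportionally so the block's outgoing probabilities still sum to 1. This is a hypothetical standalone version (`set_prob_and_rescale_others` is an illustrative name), not GCC's implementation.

```c
#include <stddef.h>

/* PROBS holds the outgoing edge probabilities of one block (summing to 1).
   Edge IDX gets NEW_PROB; the other edges are rescaled proportionally so
   that the total stays 1.  */
static void
set_prob_and_rescale_others (double *probs, size_t n, size_t idx,
                             double new_prob)
{
  double others = 1.0 - probs[idx];   /* old mass of the other edges */
  double remaining = 1.0 - new_prob;  /* mass they must carry now */

  for (size_t i = 0; i < n; i++)
    {
      if (i == idx)
        continue;
      if (others > 0.0)
        probs[i] *= remaining / others;
      else
        /* Other edges had no probability at all: split evenly.  */
        probs[i] = remaining / (double) (n - 1);
    }
  probs[idx] = new_prob;
}
```

Making edge 0 of a block with probabilities {0.5, 0.25, 0.25} cold by giving it probability 0 leaves {0, 0.5, 0.5}.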

Fix some profile consistency testcases

2023-07-07 Thread Jan Hubicka via Gcc-patches
Hi, for some time information about profile mismatches has been printed only with -details-blocks. I think it should be printed by default as well, to make it easier to spot when someone introduces a new transform that breaks the profile, but I will send a separate RFC for that. This patch enables details

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-07 Thread Jan Hubicka via Gcc-patches
> Hi Both, > > Thanks for all the reviews/patches so far > > > > > > > Looks good, but I wonder what we can do to at least make the multiple > > > exit case behave reasonably? The vectorizer keeps track > > > > > of a "canonical" exit, would it be possible to pass in the main exit > > > edge

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-07 Thread Jan Hubicka via Gcc-patches
> > Looks good, but I wonder what we can do to at least make the > multiple exit case behave reasonably? The vectorizer keeps track > of a "canonical" exit, would it be possible to pass in the main > exit edge and use that instead of single_exit (), would other > exits then behave somewhat

Re: Regarding bypass-asm patch and help in an error during bootstrapped build

2023-07-06 Thread Jan Hubicka via Gcc
> Hi, > > I have added the patch ( > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623379.html ) on the > devel/bypass-asm branch. > Although I am able to build using the --disable-bootstrap option but while > doing a bootstrapped build, I am getting these errors ( as warnings while > doing
