[Bug lto/116535] LTO partitioning vs. offloading compilation

2024-09-03 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116535 --- Comment #7 from Jan Hubicka --- > void > output_offload_tables (void) > { > ... > > /* In WHOPR mode during the WPA stage the joint offload tables need to be > streamed to one partition only. That's why we free offload_funcs and

[Bug ipa/116410] modref doesn't generate LTO summaries with -ffat-lto-objects (-ffat-lto-objects generates different and inefficient code compared with -fno-fat-lto-objects)

2024-09-03 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116410 --- Comment #11 from Jan Hubicka --- > We plan to adopt -ffat-lto-objects ourselves soon for at least a subset of > packages, so this was good timing. :) Note that -ffat-lto-objects has various issues, especially with library archives. The prob

[Bug tree-optimization/115679] inlining failed in call to 'foo': function not considered for inlining

2024-06-27 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115679 --- Comment #2 from Jan Hubicka --- > With -Og it's usually that the always-inline function is called indirectly - > that's an unsupported case. We can probably add CIF code for functions that were called indirectly but are no more, so this is r

[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option

2024-06-25 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531 --- Comment #18 from Jan Hubicka --- > different issue from the one that is raised in the PR. (Unless we think that > -O2 and -O3 should always have the same inlining heuristics henceforward, but > that seems unlikely.) Yes, I think point of -

[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option

2024-06-25 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531 --- Comment #14 from Jan Hubicka --- As for bit of history on this. I have introduced the split -O2 and -O3 limits in order to be able to enable -finline-small-functions at -O2 which we found to be really important for C++ codebases which no lo

[Bug ipa/114531] Feature proposal for an `-finline-functions-aggressive` compiler option

2024-06-25 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114531 --- Comment #12 from Jan Hubicka --- If this is without LTO, can you also try the LTO numbers? Inliner behaves significantly differently with and without LTO, since LTO introduces many (and often too many) inlining opportunities, which sometimes ma

[Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137 --- Comment #15 from Jan Hubicka --- > As pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035#c13 , > gcc > already assume operator new's returned pointer cannot alias any existing > pointer. So no change is needed there. Seems yo

[Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137 --- Comment #13 from Jan Hubicka --- > Is the option supposed to be only about the standard global scope operator > new/delete (_Znam etc.) or also user operator new/delete class methods? If > the > former, then I agree it is a global property

[Bug c++/110137] implement clang -fassume-sane-operator-new

2024-06-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137 --- Comment #9 from Jan Hubicka --- Doing global flag has a problem since with LTO or using optimize attribute user may mix code compiled with and without sane operator new. When function with insane operator new gets inlined to a function wit

[Bug middle-end/115277] [13/14/15 regression] ICF needs to match loop bound estimates

2024-05-30 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115277 --- Comment #3 from Jan Hubicka --- > What about gcc 13? GCC 13 also misoptimizes. Honza

[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions

2024-05-26 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914 --- Comment #5 from Jan Hubicka --- > (In reply to Jan Hubicka from comment #2) > > The reason why gcc warns is that it is unable to prove that the function is > > always finite. > > I don't see why finiteness matters. If a pure function return

[Bug ipa/96059] ICE: in remove_unreachable_nodes, at ipa.c:575 with -fdevirtualize-at-ltrans

2024-05-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96059 --- Comment #7 from Jan Hubicka --- > Actually, let me drop the PR59859 blocker, as IIRC we've had reports of this > downstream w/o graphite. I think you edited wrong PR here.

[Bug ipa/115097] Strange suboptimal codegen specifically at -O2 when copying struct type

2024-05-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115097 --- Comment #7 from Jan Hubicka --- > and then we inline them back, introducing the extra copy. Why do we use > tail-calls here instead of aliases? Why do we lack cost modeling here? Because the function is exported and we must keep addresses

[Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-05-14 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 --- Comment #21 from Jan Hubicka --- This patch attempts to add __builtin_operator_new/delete. So far they are not optimized, which will need to be done by extra flag of BUILT_IN_ code. also the decl.cc code can be refactored to be less of cut&

[Bug tree-optimization/114959] incorrect TBAA for drived types involving function types

2024-05-07 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114959 --- Comment #4 from Jan Hubicka --- > > I think function types are somewhat special in that they do not denote > objects in the classical sense. They are also most complex and probably > target-dependent to handle. > > Note there's LTO where

[Bug tree-optimization/114774] Missed DSE in simple code due to interleaving sotres

2024-04-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114774 --- Comment #5 from Jan Hubicka --- > > Looking into it, instead of having simple outer loop it needs to > > maintain worklist of defs to proceed each annotated with live bitmap, > > right? > > Yeah, I have some patch on some branch somewhere ..

[Bug tree-optimization/114774] Missed DSE in simple code due to interleaving sotres

2024-04-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114774 --- Comment #3 from Jan Hubicka --- > Yes, DSE walking doesn't "branch" but goes to some length handling some > trivial > branches only. Mainly to avoid compile-time issues. It needs larger > re-structuring to fix that, but in principle it sh

[Bug ipa/114703] Missed devirtualization in rather simple case

2024-04-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114703 --- Comment #3 from Jan Hubicka --- > Yep, 'new' memory escapes. Yep, this is blocking a lot of propagation in common C++ code. Here it may help to do speculative devirtualization during IPA stage that will let the late optimization to get rid o

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-09 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #76 from Jan Hubicka --- There is still problem with loop bounds. I am testing patch on that and then we should be (finally) safe.

[Bug gcov-profile/114115] xz-utils segfaults when built with -fprofile-generate (bad interaction between IFUNC and binding?)

2024-04-03 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115 --- Comment #15 from Jan Hubicka --- > Fixed for GCC 14 so far It is simple patch, so backporting is OK after a week in mainline.

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled on x86 since r14-5109-ga291237b628f41

2024-04-02 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #70 from Jan Hubicka --- Hello, over easter I did some analysis of the cases where ICF is now disabled due to jump function miscompare. Most common case (seen also on GCC) is the situation where function is originally static inline

[Bug tree-optimization/112303] [14 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: verify_flow_info failed since r14-3459-g0c78240fd7d519

2024-03-27 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112303 --- Comment #14 from Jan Hubicka --- > This patch fixes the ICE for me. > Seems we already did something like that in other spots (e.g. in apply_scale). In general if the overflow happens, some pass must have misbehaved and do something crazy w

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #64 from Jan Hubicka --- > Are you going to apply this patch, even if it just helps partially with some > tests and not others? I think we should fix this completely, since it is source of very surprising bugs. I discussed it with Ma

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-13 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #57 from Jan Hubicka --- > So, we can punt on differences there (that is desirable for backporting and > maybe GCC 14 too), or we could at that point populate an int vector, which > maps Yep, that is what I do. I had bug in that so

[Bug ipa/114317] Missing optimization for multiple condition statements

2024-03-12 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114317 --- Comment #2 from Jan Hubicka --- > (it would need to elide the stores of course). We do have way to elide stores, since we can optimize out write-only values. What we do not have readily available is the value written to a reference (ipa-r

[Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function

2024-03-07 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262 --- Comment #6 from Jan Hubicka --- > Note GCC has not retuned its -Os heuristics for a long time because it has been > decent enough for most folks and corner cases like this is almost never come > up. There were quite few changes to -Os heurist

[Bug lto/114241] False-positive -Wodr warning when using -flto and -fno-semantic-interposition

2024-03-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114241 --- Comment #2 from Jan Hubicka --- This indeed looks like bug caused by fact that the class is keyed into one of the two units. Outputting translation unit names is unfortunately hard, since they are object files and often coming from .a arch

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #26 from Jan Hubicka --- > I think optimize_function_for_size_p (cfun) isn't always true if > optimize_size is since it looks at the function-specific setting > of that flag, so you'd have to use opt_for_fn (cfun, optimize_size). Wh

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #21 from Jan Hubicka --- Looking at the prototype patch, why need to change also the splitters? My original goal was to use splitters to expand to faster code sequences while having patterns necessary for both variants. This makes

[Bug target/114232] [14 regression] ICE when building rr-5.7.0 with LTO on x86

2024-03-05 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114232 --- Comment #18 from Jan Hubicka --- optimize_function_for_size_p is not really affected by LTO or non-LTO. It does take into account node->count and node->frequency, which is updated during IPA, so it may change between early opts and late opt

[Bug tree-optimization/114052] [11/12/13/14 Regression] Wrong code at -O2 for well-defined infinite loop

2024-02-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114052 --- Comment #7 from Jan Hubicka --- > I see it doesn't do anything if mark_dfs_back_edges returns false, so it > will claim the function is finite even when it calls a non-finite function? > So I assume this is local analysis only and call edges

[Bug ipa/111960] [14 Regression] ICE: during GIMPLE pass: rebuild_frequencies: SIGSEGV (Invalid read of size 4) with -fdump-tree-rebuild_frequencies-all

2024-02-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111960 --- Comment #13 from Jan Hubicka --- > Should be fixed now. Thanks! I was testing with stage3 compiler, so that is the reason. Indeed dropping the buffer is a good idea.

[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #45 from Jan Hubicka --- > > "Once legacy evrp is removed, this won't be an issue, as ranges in the IL > > will tell the truth. However, this will mean that we will no longer > > remove the first __builtin_unreachable combo. But

[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #43 from Jan Hubicka --- > // See discussion here: > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html Discussion says: "Once legacy evrp is removed, this won't be an issue, as ranges in the IL will tell the truth.

[Bug middle-end/113907] [14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #37 from Jan Hubicka --- > Also remember we like to have a fix that's easily backportable, and > that's probably going to be resetting the info. We can do something > more fancy for GCC 15 Rejecting to merge function with different

[Bug middle-end/113907] [14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #36 from Jan Hubicka --- > > Having a testcase is great. I was just playing with crafting one. > > I am still concerned about value ranges in ipa-prop's jump functions. > > Maybe my imagination is too limited, but if the ipa-prop's

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-14 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787 --- Comment #19 from Jan Hubicka --- > Note I didn't check if it helps the testcase .. I will check. > > > > > > > A "nicer" solution might be to add a informational operand > > > to TARGET_MEM_REF, representing the base pointer to be used fo

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-14 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787 --- Comment #17 from Jan Hubicka --- > > I guess PTA gets around by tracking points-to set also for non-pointer > > types and consequently it also gives up on any such addition. > > It does. But note it does _not_ for POINTER_PLUS where it tre

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-13 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787 --- Comment #15 from Jan Hubicka --- > > IVOPTs does the above but it does it (or should) as > > offset = (uintptr)&base2 - (uintptr)&base1; > val = *((T *)((uintptr)base1 + i + offset)) > > which is OK for points-to as no POINTER_PLUS_EX

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-02-01 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #4 from Jan Hubicka --- > > With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a > newer > master) goes down from 66% to 54%. > > So far I did not find a way to easily train with the reference run (when I ad

[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2 since r11-4987-g602c6cfc79ce4a

2024-01-30 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665 --- Comment #8 from Jan Hubicka --- > Honza - ICF seems to fixup points-to sets when merging variables, so there > should be a way to kill off flow-sensitive info inside prevailing bodies > as well. But would that happen before inlining the bod

[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native

2024-01-29 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 --- Comment #2 from Jan Hubicka --- > Did you try with -fprofile-partial-training (is that default on? it probably > should ...). Can you please try training with the rate data instead of train It is not on by default - the problem of partial

[Bug ipa/113478] -Os does not inline single instruction function

2024-01-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113478 --- Comment #4 from Jan Hubicka --- > Possibly, at least when we know it doesn't expand to a libatomic call? OTOH > even then a function just wrapping such call should probably be inlined, > so the question is whether the problem that > is esti

[Bug ipa/113478] -Os does not inline single instruction function

2024-01-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113478 --- Comment #2 from Jan Hubicka --- Probably is_inexpensive_bulitin_p should return true here?

[Bug c++/109753] [13/14 Regression] pragma GCC target causes std::vector not to compile (always_inline on constructor)

2024-01-11 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753 --- Comment #14 from Jan Hubicka --- > I think the issue might be that whoever is creating > __static_initialization_and_destruction_0 fails to honor the active > target pragma. Which means back to my suggestion to have multiple ones > when dif

[Bug tree-optimization/110852] [14 Regression] ICE: in get_predictor_value, at predict.cc:2695 with -O -fno-tree-fre and __builtin_expect() since r14-2219-geab57b825bcc35

2024-01-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 --- Comment #14 from Jan Hubicka --- > I thought the goal was to handle what is in predict-18.c, i.e. > b * __builtin_expect (c, 0) > or similar. If it is about > __builtin_expect_with_probability (b, 42, 0.25) * > __builtin_expect_with_probabi

[Bug tree-optimization/110852] [14 Regression] ICE: in get_predictor_value, at predict.cc:2695 with -O -fno-tree-fre and __builtin_expect() since r14-2219-geab57b825bcc35

2024-01-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 --- Comment #11 from Jan Hubicka --- > > + int p1 = get_predictor_value (*predictor, *probability); > > + int p2 = get_predictor_value (predictor2, probability2); > > + /* If both predictors agrees, it does not matter fro

[Bug tree-optimization/110852] [14 Regression] ICE: in get_predictor_value, at predict.cc:2695 with -O -fno-tree-fre and __builtin_expect() since r14-2219-geab57b825bcc35

2024-01-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 --- Comment #9 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 > > --- Comment #7 from Jakub Jelinek --- > So, what about following patch (which also fixes the ICE, would of course need > to add the testcase) and doe

[Bug tree-optimization/110852] [14 Regression] ICE: in get_predictor_value, at predict.cc:2695 with -O -fno-tree-fre and __builtin_expect() since r14-2219-geab57b825bcc35

2024-01-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 --- Comment #6 from Jan Hubicka --- > which fixes the ICE by preferring PRED_BUILTIN_EXPECT* over others. > At least in this case when one operand is a constant and another one is > __builtin_expect* result that seems like the right choice to me

[Bug target/113233] LoongArch: target options from LTO objects not respected during linking

2024-01-04 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113233 --- Comment #3 from Jan Hubicka --- > Confirm. But option save/restore has been always implemented: > > .section.gnu.lto_.opts,"",@progbits > .ascii "'-fno-openmp' '-fno-openacc' '-fno-pie' '-fcf-protection" > .ascii "=none'

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-12-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 --- Comment #20 from Jan Hubicka --- > > Live patching (user-space) doesn't depend on any particular alignment of > functions, on x86-64 at least. (The plan for other architectures wouldn't > need > any specific alignment either). Note that t

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-11-29 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #32 from Jan Hubicka --- > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437: > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' > writing between 2 and 9223372036854775806 bytes into

[Bug middle-end/112653] PTA should handle correctly escape information of values returned by a function

2023-11-27 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #15 from Jan Hubicka --- Thanks a lot for working on this! I think it is quite important part of the puzzle of making libstdc++ vector working reasonably well.

[Bug tree-optimization/112706] missed simplification in FRE

2023-11-24 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112706 --- Comment #3 from Jan Hubicka --- Thanks, new pattern looks like noticeable improvement :) Base+offset is effective for alias analysis and I suppose it happens reasonably enough for compares as well. > _76 = _71 + 4; > # .MEM_154 = VDEF <.

[Bug tree-optimization/112678] [14 regression] Massive slowdown of compilation time with PGO

2023-11-23 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112678 --- Comment #2 from Jan Hubicka --- Seems we changed default to locking increments. jh@ryzen4:/tmp> cat t.C void test() { } jh@ryzen4:/tmp> ~/trunk-install/bin/g++ -O2 -fprofile-generate t.C -S ; grep lock t.s lock addl $1, __gcov
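As background for the "locking increments" remark, here is a minimal sketch of the difference between a plain counter update and an atomic one. The names `plain_counter`, `atomic_counter`, `bump_plain` and `bump_atomic` are hypothetical and are not the gcov counters; the assumption is only that an atomic read-modify-write on x86 is typically emitted as a `lock`-prefixed instruction, which is what the `grep lock` in the snippet looks for.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counters, standing in for profile counters in general.
std::int64_t plain_counter = 0;
std::atomic<std::int64_t> atomic_counter{0};

// Plain increment: a normal load/add/store, cheap but racy across threads.
void bump_plain() { ++plain_counter; }

// Atomic increment: a single indivisible read-modify-write. Relaxed ordering
// is enough for a counter, but the instruction is still more expensive.
void bump_atomic() { atomic_counter.fetch_add(1, std::memory_order_relaxed); }
```

Compiling such a file with `-O2 -S` and searching the assembly for `lock` is one way to see which form a given compiler emits for each function.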

[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle

2023-11-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #5 from Jan Hubicka --- > but the issue is that test2 escapes which makes this conflict: It is passed to memmove which is noescape and returned. Why local PTA considers returned values to escape?

[Bug tree-optimization/111498] 951% profile quality regression between g:93996cfb308ffc63 (2023-09-18 03:40) and g:95d2ce05fb32e663 (2023-09-19 03:22)

2023-09-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111498 --- Comment #2 from Jan Hubicka --- > That just might cause a tad more early threading. That is, expose latent > profile updating issues elsewhere. Looking at the graph we're also still very > good compared to July. Early threading should not

[Bug ipa/111157] [14 Regression] 416.gamess fails with a run-time abort when compiled with -O2 -flto after r14-3226-gd073e2d75d9ed4

2023-08-29 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111157 --- Comment #8 from Jan Hubicka --- > This is what I wanted to ask about. Looking at the dumps, ipa-modref > knows it is "killed." Is that enough or does it need to be also not > read to be known to be useless? The killed info means that the d

[Bug tree-optimization/110628] [14 regression] gcc.dg/tree-ssa/update-threading.c fails after r14-2383-g768f00e3e84123

2023-08-24 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110628 --- Comment #8 from Jan Hubicka --- patch posted https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628231.html

[Bug ipa/111088] useless 'xor eax,eax' inserted when a value is not returned and icf

2023-08-21 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111088 --- Comment #3 from Jan Hubicka --- > But adds a return with a value. And then the inliner inlines foo into foo2 but > we still have the return with a value around ... I guess ICF can special case unused return value, but why this is not taken c

[Bug tree-optimization/110628] [14 regression] gcc.dg/tree-ssa/update-threading.c fails after r14-2383-g768f00e3e84123

2023-08-17 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110628 --- Comment #6 from Jan Hubicka --- The mismatch happens on: void foo (unsigned int x) { if (x != 0x800 && x != 0x810) abort (); } It is bug in reassoc turning: void foo (unsigned int x) { ;; basic block 2, loop depth 0, count 107374
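The snippet is cut off before the transformed code appears, so the following is only background and not a reproduction of the dump. It sketches, as an assumption about the kind of range-test folding involved, how a pair of inequality tests against constants that differ in a single bit can be merged into one masked compare. The function names are hypothetical.

```cpp
#include <cstdint>

// Condition as written in the testcase: true when x is neither constant.
bool differs_naive(std::uint32_t x) { return x != 0x800 && x != 0x810; }

// Folded form: 0x800 and 0x810 differ only in bit 4 (0x10), so clearing that
// bit maps both constants to 0x800 and a single compare is enough.
bool differs_folded(std::uint32_t x) {
  return (x & ~std::uint32_t{0x10}) != 0x800;
}
```

The two functions agree for every input. Merging two conditional branches into one is the kind of step after which the profile counts on the remaining edges have to be updated consistently.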

[Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022

2023-07-28 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293 --- Comment #19 from Jan Hubicka --- > This heuristic wants to catch > > > if (foo) abort (); > > > and avoid sinking "too far" across a path with "similar enough" > execution count (I think the original motivation was to fix some > sp

[Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-07-27 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 --- Comment #2 from Jan Hubicka --- I tested that the profile change makes no difference.

[Bug target/110758] [14 Regression] 8% hmmer regression on zen1/3 with -Ofast -march=native -flto between g:8377cf1bf41a0a9d (2023-07-05 01:46) and g:3a61ca1b9256535e (2023-07-06 16:56); g:d76d19c9bc5

2023-07-21 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110758 --- Comment #2 from Jan Hubicka --- > I suspect this is most likely the profile updates changes ... Quite possibly. The goal of this exercise is to figure out if there are some bugs in profile estimate or whether passes somehow prefer broken p

[Bug tree-optimization/110628] [14 regression] gcc.dg/tree-ssa/update-threading.c fails after r14-2383-g768f00e3e84123

2023-07-13 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110628 --- Comment #3 from Jan Hubicka --- > -fdump-tree-all-blocks-details produced more than 100 dump files. Which > one(s) > do you want? Can you just zip them and attach all? Thank you! Honza

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-07-11 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #23 from Jan Hubicka --- But it would be nice to see why the functions are not early inlined.

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-07-11 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #22 from Jan Hubicka --- I will cook up the patch to keep multiple variants of nodes pre-inline and we will see how much that affects compile time & how hard it will be to get unit size estimates right.

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-28 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #16 from Jan Hubicka --- > > We already have plenty of GF_CALL_ flags, so adding one should be easy? > > We have 3 bits left :/ I was hoping that cgraph_edge lives long > enough? But I suppose we're not keeping them across the ear

[Bug tree-optimization/109689] [14 Regression] ICE at -O1 with "-ftree-vectorize": in check_loop_closed_ssa_def, at tree-ssa-loop-manip.cc:645 since r14-301-gf2d6beb7a4ddf1

2023-06-28 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109689 --- Comment #10 from Jan Hubicka --- > > So perhaps simply: > > rewrite_into_loop_closed_ssa (NULL, 0); > > in case we unlooped in loop closed ssa form (which is not that common). > > Would that be acceptable? > > Yes, we do that in other pla

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-28 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #14 from Jan Hubicka --- > > why disallow caller->indirect_calls? See testcase in comment #9 > > > + return false; > > + for (cgraph_edge *e2 = callee->callees; e2; e2 = e2->next_callee) > > I don't think this flys - it

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-26 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #11 from Jan Hubicka --- Hi, what about this. It should make at least quite basic inlining to happen to always_inline. I do not think many critical always_inlines have indirect calls in them. The test for lto is quite bad and I can

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-23 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #9 from Jan Hubicka --- Just so it is somewhere, here is a testcase that we can't inline leaf functions to always_inlines unless we do some tracking of what calls were formerly indirect calls. We really overloaded always_inline from

[Bug ipa/110334] [13/14 Regresssion] unused functions not eliminated before LTO streaming

2023-06-23 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #8 from Jan Hubicka --- > > I was playing with the idea of warning when at lto time when comdats have > > different command line options, but this triggers way too often in practice. > > Really? :/ Yep, for example firefox consist o

[Bug libstdc++/110287] _M_check_len is expensive

2023-06-19 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #7 from Jan Hubicka --- > > There is no guarantee that std::vector::max_size() is PTRDIFF_MAX. It > depends on the Allocator type, A. A user-defined allocator could have > max_size() == 100. If inliner we see path to the throw func

[Bug libstdc++/110287] _M_check_len is expensive

2023-06-18 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #5 from Jan Hubicka --- > Do you mean something like this? I sent my own version, but yours looks nicer. > > diff --git a/libstdc++-v3/include/bits/stl_vector.h > b/libstdc++-v3/include/bits/stl_vector.h > index 70ced3d101f..a4dbfeb

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-31 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #12 from Jan Hubicka --- > /home/sdp/jun/btl0/install/bin/ld: /tmp/ccnX75zI.ltrans0.ltrans.o: in > function `main': > :(.text.startup+0x1): undefined reference to `GMCommand' I wonder if your plugin is configured correctly. Can you

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-14 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #5 from Jan Hubicka --- > Actually why didn't we copy the loop header in the first place? Because it is considered to be a do-while loop already (thanks to the in-loop conditional, do_while_loop_p is happy).
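
For readers unfamiliar with the term, here is a minimal hand-written sketch of what copying a loop header means at the source level. It is not the testcase from the bug, and the function names `sum_while` and `sum_guarded_do_while` are hypothetical. The sketch assumes only the usual description of the transformation: the exit test is duplicated in front of the loop as a guard, so the loop itself takes do-while shape.

```c
#include <stddef.h>

/* "while" form: the exit test runs before the first iteration,
   so the body may execute zero times. */
long sum_while(const int *v, size_t n)
{
    long sum = 0;
    size_t i = 0;
    while (i < n) {
        sum += v[i];
        i++;
    }
    return sum;
}

/* The same loop with its header copied by hand: the test appears once
   as a guard and once at the bottom of the loop. Inside the guard the
   body is known to execute at least once. */
long sum_guarded_do_while(const int *v, size_t n)
{
    long sum = 0;
    size_t i = 0;
    if (i < n) {
        do {
            sum += v[i];
            i++;
        } while (i < n);
    }
    return sum;
}
```

Both functions return the same result for every input, including `n == 0`. The difference is only in shape: in the second form, code placed inside the guard runs only when the loop iterates at least once.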

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-14 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #4 from Jan Hubicka --- > Rather, because store-motion out of a loop that might iterate zero times would > create a data race. Good point. If we did copy loop headers all the way to the store the problem would go away. Also I assume

[Bug c++/106943] GCC building clang/llvm with LTO flags causes ICE in clang

2023-05-12 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943 --- Comment #19 from Jan Hubicka --- > > Is there any need to over-engineer this like that? I would hope enabling > > -fno-lifetime-dse globally would not be controversial for LLVM It would be really nice to have the ranger bug fixed. Since li

[Bug c++/106943] GCC building clang/llvm with LTO flags causes ICE in clang

2023-05-12 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943 --- Comment #15 from Jan Hubicka --- > > Indeed it is quite long time problem with clang not building with lifetime > > DSE and strict aliasing. I wonder why this is not fixed on clang side? > > Because the problems were not communicated? I kn

[Bug c++/108887] [13 Regression] ICE in process_function_and_variable_attributes since r13-3601

2023-03-09 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108887 --- Comment #8 from Jan Hubicka --- > Also, reset() is only defined in cgraph_node, and I need it to work on both > functions and variables. Aha, this is a good point. I forgot that. I will make reset() working on symbols in general. I think i

[Bug c++/101118] coroutines: unexpected ODR warning for coroutine frame type in LTO builds

2023-03-07 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118 --- Comment #15 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118 > > --- Comment #14 from Iain Sandoe --- > (In reply to Jan Hubicka from comment #13) > > > So .. for promotion of target expression temporaries to fram

[Bug c++/101118] coroutines: unexpected ODR warning for coroutine frame type in LTO builds

2023-03-07 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118 --- Comment #13 from Jan Hubicka --- > So .. for promotion of target expression temporaries to frame vars, one of: > - a) we need to find a different way to name them I think we can just count number of fields within a given frame type? Honza

[Bug c++/108887] [13 Regression] ICE in process_function_and_variable_attributes since r13-3601

2023-03-03 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108887 --- Comment #5 from Jan Hubicka --- > Perhaps, but shouldn't we also unlink_from_assembler_name_hash (node, false);? > I think the point of the current removal is that we've discovered the mangling > alias clashes with some other symbol. cgraph_

[Bug c++/101118] coroutines: unexpected ODR warning for coroutine frame type in LTO builds

2023-03-03 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118 --- Comment #8 from Jan Hubicka --- > > the synthesised functions (actor, destroy) are intended to be TU-local. > the ramp function is what remains of the user's original function after the > coroutine body is outlined - so that has the origina

[Bug ipa/107931] [12/13 Regression] -Og causes always_inline to fail since r12-6677-gc952126870c92cf2

2023-02-21 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931 --- Comment #20 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931 > > --- Comment #17 from rguenther at suse dot de --- > On Mon, 20 Feb 2023, jakub at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bu

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-28 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #39 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 > > --- Comment #35 from Vladimir Makarov --- > (In reply to Jakub Jelinek from comment #34) > > Seems right now DECL_NONALIASED is only used on these c

[Bug bootstrap/107950] partial LTO linking of libbackend.a: gcc/gcc-rich-location.cc:207: undefined reference to `range_label_for_type_mismatch::get_text(unsigned int) const'

2023-01-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107950 --- Comment #9 from Jan Hubicka --- > > Feel free to grab my initial patch in c#0 and upstream it. I tried that some > time ago in the following email thread: > https://gcc.gnu.org/pipermail/gcc/2021-May/236096.html Actually I was shooting for

[Bug tree-optimization/107467] [12/13 Regression] Miscompilation involving -Os, -flto and -fno-strict-aliasing since r12-656-ga564da506f52be66

2023-01-13 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107467 --- Comment #9 from Jan Hubicka --- > > so it's ICFed compare_pairs having modref TBAA info that makes the > stores dead. I suppose ICF needs to reset / alter the modref summaries? Well, matching that ICF does should be enough to verify that

[Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #9 from Jan Hubicka --- > > Do you mean we should fix modeling of divisions there as well? I don't have > latency/throughput measurements for those CPUs, nor access so I can run > experiments myself, unfortunately. > > I guess you m

[Bug tree-optimization/107715] TSVC s161 for double runs at zen4 30 times slower when vectorization is enabled

2022-11-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715 --- Comment #2 from Jan Hubicka --- > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715 > > --- Comment #1 from Richard Biener --- > Because store data races are allowed with -Ofast masked stores are not used so > we instead get > > vect_

[Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #7 from Jan Hubicka --- > 53730 r btver2_fp_min_issue_delay > 53760 r znver1_fp_transitions > 93960 r bdver3_fp_transitions > 106102 r lujiazui_core_check > 106102 r lujiazui_core_transitions > 196123 r lujiazui_core_min_issue_delay >

[Bug c++/107597] LTO causes static inline variables to get a non-uniqued global symbol

2022-11-11 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107597 --- Comment #2 from Jan Hubicka --- Hi, What happens is that we read the symbol as: Visibility: externally_visible semantic_interposition prevailing_def_ironly_exp public weak comdat comdat_group:_ZN12NonTemplated1xE one_only While in visibili

[Bug ipa/106991] new+delete pair not optimized by g++ at -O3 but optimized at -Os

2022-09-21 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991 --- Comment #2 from Jan Hubicka --- > Looks like inlining decisions decide to inline new but not delete but for -Os > we inline none and elide the new/delete pair. > > Maybe we can devise some inline hints to keep pairs? Inliner is mostly buil

[Bug middle-end/106408] PRE with infinite loops

2022-07-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106408 --- Comment #2 from Jan Hubicka --- > + /* If block is a loop that is possibly infinite we should not > +hoist across it. */ > + if (block->loop_father->header == block > + && !finite_loop_p (block->loop_father)) > +

[Bug ipa/102581] [12 Regression] ice in forced_merge, at ipa-modref-tree.h:352 with -fno-strict-aliasing and -O2 since r12-3202-gf5ff3a8ed4ca9173

2021-10-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102581 --- Comment #8 from Jan Hubicka --- Actually, this is a shorter patch - we already should notice that one range is contained in the other, but we give up too early. Honza diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h index 6a9ed5ce54b..

[Bug ipa/102581] [12 Regression] ice in forced_merge, at ipa-modref-tree.h:352 with -fno-strict-aliasing and -O2 since r12-3202-gf5ff3a8ed4ca9173

2021-10-06 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102581 --- Comment #7 from Jan Hubicka --- Hi, the problem is that we assume that merge is symmetric (merging a to b succeeds if and only if merging b to a succeeds). There was one symmetrical path missing in the (fancy and a bit ugly) logic on what we ca

[Bug tree-optimization/102446] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu

2021-09-22 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102446 --- Comment #4 from Jan Hubicka --- > Started with r5-6477-g3620b606822f80863488ca4883542d848d41f9f9 This only affects early inlining decisions, so it may be useful to bisect this with --param early-inlining-insns=14 Honza

[Bug tree-optimization/45178] CDDCE doesn't eliminate conditional code in infinite loop

2021-08-26 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45178 --- Comment #4 from Jan Hubicka --- > and that marks a condition that has nothing to do with loop control. I > suppose > we can elide this when the loop has no exit (we are already marking backedges > of irreducible loops). > > But I never act

[Bug tree-optimization/101909] 73% regression on tfft benchmark for -O2 -ftree-loop-vectorize compared to -O2 on zen hardware

2021-08-16 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101909 --- Comment #2 from Jan Hubicka --- > So that's znver1 (split AVX IIRC) compared to znver2? Martin will know how to decode machine names. I am never sure. It is with generic, so split AVX does not make a difference. Honza
