[Bug tree-optimization/110289] Phiprop may be good idea in early opts

2023-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug libstdc++/110287] _M_check_len is expensive

2023-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 Bug 110287 depends on bug 110289, which changed state. Bug 110289 Summary: Phiprop may be good idea in early opts https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289 What|Removed |Added

[Bug middle-end/110377] New: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable

2023-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- In the following testcase void test2(int); void test(int n) { if (n > 5) __builtin_unreacha
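
A minimal sketch of the kind of range information the report is about, assuming GCC or Clang (both provide `__builtin_unreachable`). The function name `in_range_check` and the constant 10 are hypothetical and are not taken from the truncated testcase above.

```c
/* Hypothetical illustration, not the testcase from the report.
   The unreachable hint promises the compiler that n <= 5 holds here. */
int in_range_check(int n)
{
    if (n > 5)
        __builtin_unreachable();
    /* With the range n <= 5 known, a compiler may fold this to 0. */
    return n > 10;
}
```

Whether a given pass actually derives the range is exactly what the bug tracks; the sketch only shows where the information comes from.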

[Bug tree-optimization/109689] [14 Regression] ICE at -O1 with "-ftree-vectorize": in check_loop_closed_ssa_def, at tree-ssa-loop-manip.cc:645 since r14-301-gf2d6beb7a4ddf1

2023-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109689 --- Comment #8 from Jan Hubicka --- An easy way would be to avoid unlooping if tree_ssa_loop_ch is executed in loop closed ssa (which happens from ch_vect pass). I wonder how hard would be however to get this right? I think this means to take

[Bug ipa/110334] [13/14 Regression] unused functions not eliminated before LTO streaming

2023-06-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334 --- Comment #6 from Jan Hubicka --- Comdats are really in conflict with the fact that we have command line options. I blame the C++ standard for that and I don't think there is a fully satisfactory solution to this problem. I was playing with the

[Bug ipa/86590] Codegen is poor when passing std::string by value with _GLIBCXX_EXTERN_TEMPLATE undefined

2023-06-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86590 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug tree-optimization/110302] New: libquantum regression on zen3 with LTO and PGO

2023-06-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Seen here https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=474.210.0 between g:067a8c7cb897b6a1ea5b1d26df8e89ccc7f0659c and g

[Bug tree-optimization/110301] New: x264 regression on Neoverse-N1

2023-06-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This happens between g:9e3607e19bcd34e1fc857ca44ae30a8a1a4f5e20 and g:57446d1bc9757ee1fb030600d38fa9487231f2a4 (Jun 16 2023) https://lnt.opensuse.org/db_default/v4/SPEC

[Bug libstdc++/110287] _M_check_len is expensive

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #2 from Jan Hubicka --- With patch in PR110289 to optimize the std::max into MAX_EXPR and the throw commented out I get: size_type std::vector >::_M_check_len (const struct vector * const this, size_type __n, const char * __s) {

[Bug tree-optimization/110289] Phiprop may be good idea in early opts

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289 --- Comment #2 from Jan Hubicka --- This patch fixes the problem diff --git a/gcc/passes.def b/gcc/passes.def index c9a8f19747b..faa5208b26b 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -88,6 +88,8 @@ along with GCC; see the file

[Bug tree-optimization/110289] Phiprop may be good idea in early opts

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289 --- Comment #1 from Jan Hubicka --- This is caused by the way libstdc++ defines max: constexpr inline const _Tp& max(const _Tp& __a, const _Tp& __b) { if (__a < __b) return __b; return __a; }
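
A rough C rendering of why a reference-returning max is harder to optimize than a value-returning one. This is a sketch under assumptions: the names `max_via_pointer` and `max_by_value` are hypothetical, and C pointers stand in for the C++ references in the libstdc++ definition.

```c
/* Hypothetical C rendering of a max that selects an address first,
   mirroring a function that returns a reference to one of its arguments. */
int max_via_pointer(int a, int b)
{
    const int *ptr = &a;      /* corresponds to "return __a" */
    if (a < b)
        ptr = &b;             /* corresponds to "return __b" */
    return *ptr;              /* load through the selected address */
}

/* The value-based form that is easy to recognize as a maximum. */
int max_by_value(int a, int b)
{
    return a < b ? b : a;
}
```

Both functions return the same result; the first only becomes a plain maximum once the load through the selected address is turned into a selection of the values.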

[Bug tree-optimization/110289] New: Phiprop may be good idea in early opts

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- libstdc++ in push_back operation does equivalent of the following: int max(int a, int b) { int *ptr; if (a > b) ptr = e

[Bug libstdc++/110287] _M_check_len is expensive

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #1 from Jan Hubicka --- Another problem is: D.27747 = _8; if (__n.3_2 > _8) goto ; [34.00%] else goto ; [66.00%] [local count: 364926196]: [local count: 1073312330]: # _18 = PHI <(4), &__n(5)> _3 = *_18;

[Bug libstdc++/110287] New: _M_check_len is expensive

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- I am looking into ineffective codegen for loops controlled by std::vec based stack (see testcase in PR109849). The problem is that we fail to inline enough of implementation of std

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-06-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #14 from Jan Hubicka --- One interesting situation is: void std::vector >::push_back (struct vector * const this, const struct value_type & __x) { struct __normal_iterator D.27894; struct pair * _1; struct pair * _2; struct

[Bug tree-optimization/110062] missed vectorization in graphicsmagick

2023-06-07 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062 --- Comment #5 from Jan Hubicka --- In sharpening the number of iterations depends on sharpen radius. Not sure what it is for the benchmark, but in normal situations the number of iterations is indeed not very large. However clang simply slp

[Bug middle-end/110148] New: TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8

2023-06-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Seen here: https://lnt.opensuse.org/db_default/v4/CPP

[Bug tree-optimization/110062] missed vectorization in graphicsmagick

2023-06-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062 Jan Hubicka changed: What|Removed |Added Status|WAITING |NEW --- Comment #3 from Jan Hubicka ---

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-06-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #17 from Jan Hubicka --- I was also thinking of DCE. It looks like a plausible idea. It may lead to a surprise where you store the same undefined variable to two places and later compare them for equality, but that is undefined anyway.

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 Jan Hubicka changed: What|Removed |Added CC||rguenther at suse dot de See

[Bug c/110062] New: missed vectorization in graphicsmagick

2023-05-31 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Phoronix claims 31% performance difference between gcc13 and clang on sharpen benchmark of graphicsmagick. On zen3 I reproduce only 4%, but the benchmark has only single short

[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985 --- Comment #6 from Jan Hubicka --- Created attachment 55180 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit untested patch It turns out that as modref was written for memory loads/stores only and later side effects discovery
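
For context, a minimal sketch of a prefetch hint inside a loop, assuming GCC or Clang. The function name `sum_with_prefetch` and the prefetch distance of 16 elements are hypothetical. The call neither loads nor stores program-visible data, which is presumably why an analysis built around loads and stores could overlook it.

```c
#include <stddef.h>

/* Hypothetical example: sum an array while hinting upcoming elements.
   The prefetch distance of 16 elements is an arbitrary assumption. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0, 3); /* read, high locality */
        total += data[i];
    }
    return total;
}
```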

[Bug tree-optimization/109985] __builtin_prefetch ignored by GCC 12/13

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985 Jan Hubicka changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

[Bug ipa/109914] --suggest-attribute=pure misdiagnoses static functions

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914 --- Comment #2 from Jan Hubicka --- The reason why gcc warns is that it is unable to prove that the function is always finite. This means that it can not auto-detect pure attribute since optimizing the call out may turn infinite program to
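
A small sketch of a function that only reads memory yet cannot be proven finite. The names are hypothetical, and whether GCC warns on this exact code has not been verified here.

```c
/* Hypothetical example: reads memory only and writes nothing, but the loop
   terminates only if the list is not cyclic, which a compiler cannot
   generally prove. */
struct node {
    struct node *next;
    int value;
};

static int list_length(const struct node *head)
{
    int n = 0;
    while (head) {
        n++;
        head = head->next;
    }
    return n;
}

/* Wrapper so the static function is used. */
int count_nodes(const struct node *head)
{
    return list_length(head);
}
```

If a call to such a function were removed because its result is unused, a program that loops forever on a cyclic list would start terminating, which is the concern the comment describes.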

[Bug middle-end/79704] [meta-bug] Phoronix Test Suite compiler performance issues

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #2

[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015 --- Comment #1 from Jan Hubicka --- opj_t1_enc_refpass is not inlined due to large function growth and some others due to max-inline-insns-auto. With inlining forced I get profile: 87.35% opj_t1_cblk_encode_processor 6.22%

[Bug middle-end/110015] New: openjpeg is slower when built with gcc13 compared to clang16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- I tried to reproduce openjpeg benchmarks from Phoronix https://www.phoronix.com/review/gcc13-clang16-raptorlake/5 On zen3

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #10 from Jan Hubicka --- This is a benchmarkable version of the simplified testcase: jan@localhost:/tmp> cat t.c #define N 1000 struct rgb {unsigned char r,g,b;} rgbs[N]; int *addr; struct drgb {double r,g,b; #ifdef OPACITY

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #9 from Jan Hubicka --- Oddly enough simplified version of the loop SLP vectorizes for me: struct rgb {unsigned char r,g,b;} *rgbs; int *addr; double *weights; struct drgb {double r,g,b;}; struct drgb sum() { struct drgb r;

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #8 from Jan Hubicka --- Created attachment 55178 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55178&action=edit Preprocessed source of VerticalFiller and HorisontalFiller

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 Jan Hubicka changed: What|Removed |Added Summary|GraphicsMagick resize is a |GraphicsMagick resize is a

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()

2023-05-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug middle-end/110004] New: large increase in profile mismatches on tramp3d

2023-05-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report it is between g:c5300bf3110b44e2742b36f49c2a380abd08d9c5

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #11 from Jan Hubicka --- I got -fprofile-use builds working and with profile we peel the innermost loop 8 times which actually gets it off the hottest spot. We get more selective on what to inline (do not inline cold calls) which may

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #10 from Jan Hubicka --- Thanks. I tested the patch on jpegxl and it does not help there (I guess because the redundancy there is partial). But it is cool we compile at least the simplified testcase well.

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #8 from Jan Hubicka --- We can only SRA if the address is non-escaping. Clang does not seem to need it to optimize better: jan@localhost:~> cat t.c extern void q(int *); __attribute__ ((noinline)) void test() { for (int a

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #10 from Jan Hubicka --- Actually vectorization hurts on both compilers and a bit more with clang. It seems that all important loops are hand vectorized and since register pressure is a problem, vectorizing other loops causes enough

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 Jan Hubicka changed: What|Removed |Added Blocks||109811 CC|

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #8 from Jan Hubicka --- Created attachment 55101 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55101&action=edit hottest loop jpegxl build machinery adds -fno-vectorize and -fno-slp-vectorize to clang flags. Adding

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #6 from Jan Hubicka --- hottest loop in clang's profile is: for (size_t y = 0; y < opsin.ysize(); y++) { for (size_t x = 0; x < opsin.xsize(); x++) { if (is_background_row[y * is_background_stride + x]) continue;

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #5 from Jan Hubicka --- Also forgot to mention, I used a zen3 machine. So Raptor Lake is not necessary. Note that the build system appends -O2 after any CFLAGS specified, so it really is an -O2 build: # Force build with optimizations in

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-05-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org Ever

[Bug middle-end/109849] New: suboptimal code for vector walking loop

2023-05-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jan@localhost:/tmp> cat t.C #include typedef unsigned int uint32_t; std::vector> stack; void test() { while (!stack.empty()) { std::pa

[Bug c++/106943] GCC building clang/llvm with LTO flags causes ICE in clang

2023-05-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug target/109690] bad SLP vectorization on zen

2023-05-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109690 --- Comment #7 from Jan Hubicka --- Thanks a lot! There however still seems to be a problem with vectorization. On zen4 I now get: jh@ryzen4:~/gcc/build/gcc> ./xgcc -B ./ -O2 -march=native slp.c ; perf stat ./a.out Performance counter stats

[Bug tree-optimization/109690] New: bad SLP vectorization on zen

2023-05-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- model name : AMD Ryzen 7 5800X 8-Core Processor reproduces on my znver1 laptop too. h@ryzen3:~/gcc-kub/build/gcc> cat tt.c int a[100]; [[gnu::noipa]] void l

[Bug tree-optimization/109605] New: -fno-tree-vectorize does not disable vectorizer

2023-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- -ftree-vectorize enables -ftree-slp-vectorize and -ftree-loop-vectorize however -fno-tree-vectorize does not disable them. This is quite counter-intuitive

[Bug tree-optimization/109595] New: Missed upper bound on number of iterations

2023-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- The following loop can iterate only 0 times before hitting undefined behaviour. struct foo { int a[3]; int b; } c; test(int p) { for (int i = 3; i < p
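
A sketch of the situation described, with the caveat that the snippet above is cut off: the struct comes from the report, while the function name `clear_tail` and the loop body are assumptions.

```c
struct foo { int a[3]; int b; } c;

/* Hypothetical loop body; the original testcase is truncated.
   Any iteration would index a[3] or beyond, which is out of bounds,
   so a valid execution runs the body zero times. */
void clear_tail(int p)
{
    for (int i = 3; i < p; i++)
        c.a[i] = 0;
}
```

Only calls with `p <= 3` have defined behaviour here, so an upper bound of zero iterations would be a valid conclusion for the loop.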

[Bug target/109137] [12 regression] Compiling ffmpeg with -m32 on x86_64-pc-linux-gnu hangs on libavcodec/h264_cabac.c since r12-9086-g489c81db7d4f75

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137 --- Comment #26 from Jan Hubicka --- reverted the znver1-3 change on gcc-12 branch. We still may want to fix IRA to avoid the problem on core_avx512 targets.

[Bug c++/79416] Internal compiler error for recursive template expansion

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79416 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4

[Bug ipa/109509] Huge compile time with forced inlining

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109509 --- Comment #5 from Jan Hubicka --- For a summary - PR109491 does not seem to be about integration time. most time is RTL PRE. - PR108086 has 10% spent in integration and seems to be operand scan issue - PR99785 is hard to judge given

[Bug tree-optimization/109491] [11/12 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug rtl-optimization/108086] [11 Regression] internal compiler error: in set_accesses, at rtl-ssa/internals.inl:449

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108086 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug ipa/109509] Huge compile time with forced inlining

2023-04-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109509 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug target/109137] [12 regression] Compiling ffmpeg with -m32 on x86_64-pc-linux-gnu hangs on libavcodec/h264_cabac.c since r12-9086-g489c81db7d4f75

2023-03-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137 --- Comment #21 from Jan Hubicka --- Zen 1-3 changes were intentional in the original tuning patch (it is also briefly mentioned in the commit message). By allowing 256 bit AVX moves instead of 64bit integer moves (or 128bit) we can move

[Bug ipa/109341] [12/13 Regression] ICE in merge, at ipa-modref-tree.cc:176 since r12-3142-g5c85f29537662f

2023-03-30 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109341 Jan Hubicka changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

[Bug tree-optimization/109213] [13 Regression] -Os generates significantly more code since r13-723

2023-03-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109213 --- Comment #8 from Jan Hubicka --- We have large-stack-frame-growth that is relative, so yes, increasing the stack size of the caller makes gcc think that it is heavy and making it even heavier will not hurt that much. We originally ran into
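
To illustrate what a relative limit implies, here is a simplified model. The function name and the exact formula are assumptions for illustration and are not GCC's implementation.

```c
/* Hypothetical model of a relative growth limit: the frame may grow to
   base * (100 + growth_percent) / 100. Names and the exact formula are
   assumptions for illustration, not GCC's implementation. */
long stack_growth_limit(long caller_frame_size, int growth_percent)
{
    return caller_frame_size * (100 + growth_percent) / 100;
}
```

Under this model a caller with a larger frame gets a proportionally larger absolute allowance, which matches the effect the comment describes.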

[Bug tree-optimization/106896] [13 Regression] ICE in to_sreal_scale, at profile-count.cc:339 since r13-2288-g61c4c989034548f4

2023-03-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896 Jan Hubicka changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/106896] [13 Regression] ICE in to_sreal_scale, at profile-count.cc:339 since r13-2288-g61c4c989034548f4

2023-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896 --- Comment #11 from Jan Hubicka --- Originally to_sreal_frquency was intended to work both inter-procedurally and intra-procedurally. However in such a setup there are side cases that cannot be solved without knowing the corresponding

[Bug target/108429] [13 Regression] FAIL: gcc.target/i386/pr89618.c scan-tree-dump vect "LOOP VECTORIZED"

2023-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108429 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 108429, which changed state. Bug 108429 Summary: [13 Regression] FAIL: gcc.target/i386/pr89618.c scan-tree-dump vect "LOOP VECTORIZED" https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108429 What|Removed

[Bug c++/101118] coroutines: unexpected ODR warning for coroutine frame type in LTO builds

2023-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118 --- Comment #6 from Jan Hubicka --- I am not really an expert on coroutines. But this seems to be a type (not a declaration we globalize during LTO) generated internally by the front-end. The name __D.9984.3.4 looks like it has a global counter

[Bug tree-optimization/106896] [13 Regression] ICE in to_sreal_scale, at profile-count.cc:339 since r13-2288-g61c4c989034548f4

2023-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896 --- Comment #10 from Jan Hubicka --- The problem the assert is trying to solve is that local counters are all frequencies relative to the entry block count, while IPA counters are absolute values within the whole program. So comparing them
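
One way to picture the mismatch between the two scales is a rescaling step. The helper below is a hypothetical sketch: plain doubles stand in for GCC's profile_count type, and the function is not part of GCC.

```c
/* Hypothetical helper: rescale a block count measured relative to the
   function's entry block into the whole-program scale of the IPA entry
   count. Plain doubles stand in for GCC's profile_count type. */
double local_to_global_count(double local_bb_count,
                             double local_entry_count,
                             double ipa_entry_count)
{
    if (local_entry_count == 0.0)
        return 0.0;  /* no meaningful ratio when the entry count is zero */
    return local_bb_count / local_entry_count * ipa_entry_count;
}
```

The zero-entry-count branch hints at the kind of side case where no sensible ratio exists.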

[Bug c++/108887] [13 Regression] ICE in process_function_and_variable_attributes since r13-3601

2023-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108887 --- Comment #3 from Jan Hubicka --- We don't really have a way to mark nodes for removal. I am not 100% sure I understand what the code does, but removing random nodes from cgraph in a hook invoked from mangling seems dangerous, since we invoke

[Bug middle-end/108990] New: Too restrictive precision check in fold and simplify pattern for PR70920

2023-03-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- While experimenting with a new gimple pass we noticed that pr70920.c is sensitive to the order of substitutions made. If 0

[Bug middle-end/106258] [13 Regression] ICE in ipa_verify_edge_has_no_modifications, at ipa-param-manipulation.cc:139 since r13-1204-gd68d366425369649

2023-02-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106258 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org

[Bug ipa/103585] fatigue2 requires inlining of peridida to work well

2023-01-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585 --- Comment #15 from Jan Hubicka --- We get 47s runtime with -O2 -flto and 53s with -O2 -fno-inline-functions-called-once. The call sequence is: [local count: 109362591]: _1656 = (unsigned long) _45; _1655 = _1656 + ivtmp.1182_2540;

[Bug ipa/108511] [13 regression] ICE in possibly_call_in_translation_unit_p, at cgraph.cc:4184 since r13-5285-g106f99406312d7ed

2023-01-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108511 --- Comment #6 from Jan Hubicka --- The function is used to discard early summaries that will lead to external calls. This saves some memory allocations. At this stage we have identified prevailing symbols and they are first in the

[Bug tree-optimization/108565] -Wuse-after-free false positive triggered by -O2 on a shared_ptr implementation

2023-01-28 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108565 --- Comment #5 from Jan Hubicka --- Teaching modref that the THIS parameter of all destructors is nonescape looks like an interesting idea (and easy to implement). Memory stores are currently indeed handled as "anything may happen". modref does

[Bug middle-end/105469] [10/11/12/13 Regression] "execution reached an unreachable program point" with -flto since r5-7027-g0b986c6ac777aa4e

2023-01-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105469 --- Comment #18 from Jan Hubicka --- It should just make any bug go latent. It surprises me it makes any difference given that things not cloned by ipa-cp should all be handled by ipa-sra.

[Bug ipa/94360] 6% run-time regression of 502.gcc_r against GCC 9 when compiled with -O2 and both PGO and LTO

2023-01-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed|

[Bug target/108429] [13 Regression] FAIL: gcc.target/i386/pr89618.c scan-tree-dump vect "LOOP VECTORIZED"

2023-01-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #4 from Jan Hubicka --- I see this is scatter with generic tuning. I actually did not intend to disable it there without more testing, so I will revert that part of the change. In the meantime I noticed that aocc sometimes seems to use

[Bug middle-end/106075] Wrong DSE with -fnon-call-exceptions

2023-01-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106075 --- Comment #6 from Jan Hubicka --- The SRA issue is fixed now, but I am not quite sure what the desirable solution is here... This blocks modref from understanding side effects of functions doing EH.

[Bug middle-end/108425] Invalid DSE

2023-01-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108425 Jan Hubicka changed: What|Removed |Added Status|RESOLVED|REOPENED Last reconfirmed|

[Bug middle-end/108425] New: Invalid DSE

2023-01-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- With non-call exceptions we misoptimize following testcase: struct a{int a,b,c,d,e;}; void test(struct a * __restrict a, struct a *b) { *a = (struct a){0,1,2,3,4}; *a = *b; } jan@localhost:/tmp
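
The testcase from the report, laid out for readability. The comments are an interpretation: the assumption is that with -fnon-call-exceptions the read of `*b` may trap, so the first store can be observed by a handler and is presumably not dead.

```c
struct a { int a, b, c, d, e; };

/* Same shape as the testcase in the report. */
void test(struct a *__restrict a, struct a *b)
{
    *a = (struct a){0, 1, 2, 3, 4};  /* first store */
    *a = *b;                          /* second store overwrites it */
}
```

Without trapping loads the first store is a textbook dead store, which is what makes the elimination tempting.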

[Bug target/108346] gather/scatter loops optimized too often for znver4 (and other zens)

2023-01-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108346 --- Comment #2 from Jan Hubicka --- Sadly the win/loss cases do not seem to suggest a simple cost scheme. We currently compute gather/scatter costs as static startup cost + cost per lane and they are set to approximately match actual
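
The cost scheme described in the comment can be written down as a one-line model. The function and parameter names are hypothetical, and the values passed in would come from target tuning tables that are not reproduced here.

```c
/* Hypothetical cost model matching the description: a fixed startup cost
   plus a per-lane cost. Parameter names are assumptions. */
int gather_scatter_cost(int startup_cost, int cost_per_lane, int lanes)
{
    return startup_cost + cost_per_lane * lanes;
}
```

A model this simple is linear in the number of lanes, so it cannot express effects that depend on the surrounding loop, which may be why the observed wins and losses do not fit it.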

[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408 --- Comment #4 from Jan Hubicka --- On Zen4 it is 20s for gcc and 6.9s for aocc, so still a problem.

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376 --- Comment #3 from Jan Hubicka --- If I make the arrays random then GCC code is indeed faster: #include #include typedef float real_t; #define iterations 100 #define LEN_1D 32000 #define LEN_2D 256 real_t

[Bug ipa/56139] [10/11/12/13 Regression] unmodified static data could go in .rodata, not .data

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56139 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4

[Bug bootstrap/107950] partial LTO linking of libbackend.a: gcc/gcc-rich-location.cc:207: undefined reference to `range_label_for_type_mismatch::get_text(unsigned int) const'

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107950 --- Comment #7 from Jan Hubicka --- Thanks for looking into the incremental link of libbackend. I had it in my tree for a while but never got around to implementing a correct way to enable it only during bootstrap since the host compiler may not support

[Bug middle-end/108410] New: x264 averaging loop not optimized well for avx512

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- x264 benchmark has a loop averaging two unsigned char arrays that is executed with relatively low trip counts that does not play well with our vectorized code
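
A sketch of the kind of loop the report refers to. The function name is illustrative and the rounding term (+1) is an assumption; the code is not copied from x264.

```c
/* Hypothetical averaging loop; the rounding (+1) is an assumption and the
   function name is illustrative, not copied from x264. */
void average_bytes(unsigned char *dst, const unsigned char *a,
                   const unsigned char *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (unsigned char)((a[i] + b[i] + 1) >> 1);
}
```

With small `n`, a wide vector body may execute rarely or not at all, leaving most of the work to prologue or epilogue code.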

[Bug tree-optimization/99411] s311, s312, s31111, s31111, s3110, vsumr benchmark of TSVC is vectorized by clang better than by gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411 --- Comment #8 from Jan Hubicka --- Compared to aocc we also do worse on zen4: jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast -march=native s311.c jh@alberti:~/tsvc/bin> time ./a.out real 0m3.207s user 0m3.206s sys

[Bug middle-end/99634] s2102 benchmarks of TSVC is vectorized better by icc than gcc, interchange is missing

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99634 --- Comment #2 from Jan Hubicka --- AOCC produced code is: .LBB0_2:# %vector.body # Parent Loop BB0_1 Depth=1 # => This Inner

[Bug tree-optimization/99412] s352 benchmark of TSVC is vectorized by clang and not by gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99412 --- Comment #2 from Jan Hubicka --- This is also seen with zen4 comparing gcc and aocc (about a 2.3 times difference).

[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408 --- Comment #3 from Jan Hubicka --- with zen4 gcc build loop takes 19s, while aocc 6.6. aocc: .LBB0_1:# %for.cond22.preheader # =>This Loop Header: Depth=1

[Bug middle-end/108376] New: TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jh@alberti:~/tsvc/bin> more s1279.c #include #include typedef float real_t; #define iterations 100 #define LEN_1D 32000 #define LEN_2D 256 rea

[Bug middle-end/108346] New: gather/scatter loops optimized too often for znver4 (and other zens)

2023-01-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- The following two benchmarks test gather/scatter codegen: s4113.c: #include #include //typedef float real_t; #define

[Bug tree-optimization/107467] [12/13 Regression] Miscompilation involing -Os , -flto and -fno-strict-aliasing since r12-656-ga564da506f52be66

2023-01-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #7 from Jan Hubicka --- mine.

[Bug c++/107597] LTO causes static inline variables to get a non-uniqued global symbol

2023-01-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107597 --- Comment #7 from Jan Hubicka --- So I guess it is asan being confused by our optimization. We intentionally duplicate the symbol in order to reduce the cost of dynamic linking in situations where we know it does not change semantics, but asan

[Bug ipa/107769] [12/13 Regression] -flto with -Os/-O2/-O3 emitted code with gcc 12.x segfaults via mutated global in .rodata since r12-2887-ga6da2cddcf0e959d

2022-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769 Jan Hubicka changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

[Bug middle-end/107719] New: 14% regression on TSVC s3113 on znve4 compared to GCC 7.5

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jh@alberti:~/tsvc/bin> cat tt5.c #include typedef double real_t; #define iterations 10 #define LEN_1D 32000 #define LEN_2D

[Bug tree-optimization/107715] TSVC s161 and s277 for double runs at zen4 30 times slower when vectorization is enabled

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715 Jan Hubicka changed: What|Removed |Added Summary|TSVC s161 for double runs |TSVC s161 and s277 for

[Bug tree-optimization/99411] s311, s312, s31111, s31111, s3110, vsumr benchmark of TSVC is vectorized by clang better than by gcc

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411 --- Comment #7 from Jan Hubicka --- With znver4 current trunk and clang15 I still see this problem (clang code is about 60% faster) for s311, s312 and s3111. Curious s3 and s3110 no longer shows a regression.

[Bug middle-end/107718] New: clang optimizes TSVC s317 a lot better

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is a stupid benchmark but still... jh@alberti:~/tsvc/bin> more tt2.c typedef double real_t; #define iterations 10 #define LEN_1D 32000 #define LEN_2D 256 rea

[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408 --- Comment #2 from Jan Hubicka --- This also reproduces with zen4 and double. jh@alberti:~/tsvc/bin> cat tt.c typedef double real_t; #define iterations 10 #define LEN_1D 32000 #define LEN_2D 256 real_t

[Bug tree-optimization/107715] New: TSVC s161 for double runs at zen4 30 times slower when vectorization is enabled

2022-11-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jh@alberti:~/tsvc/bin> more test.c typedef double real_t; #define iterations 10 #define LEN_1D 32

[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize

2022-08-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839 --- Comment #11 from Jan Hubicka --- Fixed on mainline with r:0f2c7ccd14a29a8af8318f50b8296098fb0ab218

[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize

2022-08-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839 --- Comment #10 from Jan Hubicka --- Created attachment 53430 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53430&action=edit Patch I am testing

[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize

2022-08-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839 --- Comment #9 from Jan Hubicka --- Thanks for looking into this. What happens here is that we start working from a call where we know that outer_type is BA. We correctly find the BA::type and notice that it is final and thus we do not need
