[Bug target/114659] gcc miscompiles a __builtin_memcpy on i386, leading to wrong results for SNaN

2024-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659 --- Comment #15 from Alexander Monakov --- (In reply to Jakub Jelinek from comment #14) > (In reply to Alexander Monakov from comment #13) > > fldt does not convert (otherwise there's no way to spill/reload x87 > > registers). > > Doesn't it

[Bug target/114659] gcc miscompiles a __builtin_memcpy on i386, leading to wrong results for SNaN

2024-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-07-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #26 from Alexander Monakov --- (In reply to Richard Biener from comment #24) > > That's because of -fno-vect-cost-model, it wouldn't be vectorized otherwise. Thanks, I forgot. The testcase in PR 106902 was vectorized at plain -O3

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-07-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #23 from Alexander Monakov --- I suggest it to close this a dup of PR 106902 if there are no better ideas. By the way, in both cases SLP introduces vectors in a loop where scalar computations it's attempting to replace are not

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #22 from Alexander Monakov --- Similar to the RawTherapee issue, SLP opportunities are created by predcom, so either -fno-predictive-commoning or -fno-tree-slp-vectorize avoids numerical runaway on the small testcase.

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #20 from Alexander Monakov --- Sam, can you provide more context? It seems there is no downstream bugreport? How does the alleged miscompilation manifest? Note that effects of interplay of fp-contract=fast and vectorization can be

[Bug target/115333] -march=native sets --param "l2-cache-size=1024" on Ryzen 7 7800X3D

2024-06-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115333 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #23 from Alexander Monakov --- (In reply to Sergei Trofimovich from comment #22) > Here `pcmpeqd %xmm2,%xmm1` is a problematic instruction. Why does `gcc` use > `%xmm2` (result of `cvttps2dq`) instead of, say `%xmm0` which contains

[Bug middle-end/115170] __cxa_atexit@plt even if -fno-plt

2024-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115170 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #20 from Alexander Monakov --- (In reply to Jakub Jelinek from comment #19) > If we guarantee that we never constant fold FIX/UNSIGNED_FIX with > -ftrapping-math (we shouldn't, as the exceptions should be raised), then > using

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #18 from Alexander Monakov --- No, allowing value-changing transformations under -ftrapping-math is really not appropriate. Invoking the intrinsic on a large floating-point value is not UB.

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/115132] Sibling calls optim should not be performed when builtin_unwind_init is used

2024-05-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115132 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/115091] Support value speculation in frontend

2024-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/115014] GCC generates incorrect instructions for addressing the data segment through EBP register

2024-05-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115014 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 --- Comment #4 from Alexander Monakov --- Like this: pandxmm1, XMMWORD PTR .LC0[rip] movaps XMMWORD PTR [rsp-40], xmm0 xor eax, eax xor edx, edx movaps XMMWORD PTR [rsp-24], xmm1

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/114960] New: [12/13/14/15 Regression] fails to clean up vector casts

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114960 Bug ID: 114960 Summary: [12/13/14/15 Regression] fails to clean up vector casts Product: gcc Version: 12.3.1 Status: UNCONFIRMED Severity: normal

[Bug c/114923] gcc ignores escaping pointer and applies invalid optimization

2024-05-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923 --- Comment #4 from Alexander Monakov --- You can place points of possible access outside of abstract machine in a fine-grained manner with volatile asms: asm volatile("" : "=m"(buf)); This cannot be reordered against accesses to volatile

[Bug c/114923] gcc ignores escaping pointer and applies invalid optimization

2024-05-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug libgomp/114765] linking to libgomp and setting CPU_PROC_BIND causes affinity reset

2024-04-18 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114765 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480 --- Comment #21 from Alexander Monakov --- It is possible to reduce gcc_qsort workload by improving the presorted-ness of the array, but of course avoiding quadratic behavior would be much better. With the following change, we go from

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480 --- Comment #20 from Alexander Monakov --- (note that if you uninclude the testcase and compile with -fno-exceptions it's much faster) On the smaller testcase from comment 14, prune_unused_phi_nodes invokes gcc_qsort 53386 times. There are two

[Bug lto/114337] LTO symbol table doesn't include builtin functions

2024-03-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1

2024-03-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #10 from Alexander Monakov --- Indeed, but OTOH according to bug 84402 comment 58 it caused a noticeable hit on gimple-match.cc compilation: 733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023 sched-deps: do not schedule

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%) since r13-5154-g733a1b777f1

2024-03-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #8 from Alexander Monakov --- If we want to get rid of the compilation time regression sooner rather than later, I can suggest limiting my change only to functions that call setjmp: diff --git a/gcc/sched-deps.cc

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)

2024-03-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 Alexander Monakov changed: What|Removed |Added CC||mkuvyrkov at gcc dot gnu.org ---

[Bug rtl-optimization/114261] [13/14 Regression] Scheduling takes excessive time (97%)

2024-03-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261 --- Comment #3 from Alexander Monakov --- The first attachment is empty (perhaps you made a non-recursive archive when you meant to recursively zip a directory).

[Bug c++/66487] sanitizer/warnings for lifetime DSE

2024-02-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487 --- Comment #28 from Alexander Monakov --- The bug is about the issue of lacking diagnostics, it should be fine to make note of various approaches to remedy the problem in one bug report. (in any case, all discussion of the Valgrind-based

[Bug rtl-optimization/113903] sched1 should schedule across EBBS

2024-02-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug ipa/113890] -fdump-tree-modref ICE with _BitInt

2024-02-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113890 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/113560] Strange code generated when optimizing a multiplication on x86_64

2024-01-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113560 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug ipa/113293] Incorrect code after inlining function containing extended asm

2024-01-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113293 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug libstdc++/113159] More robust std::sort for silly comparator functions

2023-12-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113159 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/113082] builtin transforms do not honor errno

2023-12-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113082 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c/44179] warn about sizeof(char) and sizeof('x')

2023-12-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44179 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-12-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697 --- Comment #9 from Alexander Monakov --- ... as does inserting a nop before the compare ¯\_(ツ)_/¯ --- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300 +++ d.out.ltrans0.ltrans.s 2023-12-01 18:53:04.909438690 +0300 @@

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-12-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697 --- Comment #8 from Alexander Monakov --- Thanks, I can reproduce it. It is pretty tricky though. For instance, just swapping the mov and the compare is enough to make it fast: --- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611

[Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2 since r14-4972-g8aa47713701b1f

2023-11-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/111107] i686-w64-mingw32 does not realign stack when __attribute__((aligned)) or __attribute__((vector_size)) are used

2023-11-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug preprocessor/112701] New: wrong type inference for ternary operator in preprocessing context

2023-11-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112701 Bug ID: 112701 Summary: wrong type inference for ternary operator in preprocessing context Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal

[Bug c/112699] Should limits.h in freestanding environment be self-contained?

2023-11-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699 --- Comment #2 from Alexander Monakov --- Sorry, even though GCC's limits.h is installed under include-fixed, it is generated separately, not by the generic fixincludes mechanism. I was confused.

[Bug c/112699] Should limits.h in freestanding environment be self-contained?

2023-11-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/111655] [11/12/13/14 Regression] wrong code generated for __builtin_signbit and 0./0. on x86-64 -O2

2023-11-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655 --- Comment #13 from Alexander Monakov --- > Then there is the MULT_EXPR x * x case This is PR 111701. It would be nice to clarify what "nonnegative" means in the contracts of this family of functions, because it's ambiguous for NaNs and

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-11-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 Alexander Monakov changed: What|Removed |Added CC||uros at gcc dot gnu.org ---

[Bug target/82242] IRA spills allocno in loop body if it crosses throwing call outside the loop

2023-11-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82242 --- Comment #5 from Alexander Monakov --- The small testcase from comment 3 is now improved on trunk, possibly thanks to work in PR 110215.

[Bug c/112367] wrong rounding of sum of floating-point constants

2023-11-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112367 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c++/66487] sanitizer/warnings for lifetime DSE

2023-10-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c/111884] New: unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884 Bug ID: 111884 Summary: unsigned char no longer aliases anything under -std=c2x Product: gcc Version: 13.2.1 Status: UNCONFIRMED Keywords: wrong-code

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #11 from Alexander Monakov --- (In reply to Hongtao.liu from comment #10) > > indeed (but I believe it did happen with Alder Lake already, by accident, > > with AVX512 on P-cores but not on E-cores). > > AVX512 is physically fused

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #9 from Alexander Monakov --- (In reply to Arsen Arsenović from comment #8) > indeed (but I believe it did happen with Alder Lake already, by accident, > with AVX512 on P-cores but not on E-cores). AFAIK on those Alder Lake CPUs

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #5 from Alexander Monakov --- I think it's similar to attempting -march=native under distcc, which is already warned about on Gentoo wiki: https://wiki.gentoo.org/wiki/Distcc The difference here is that Intel so far decided to make

[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #7 from Alexander Monakov --- No backport for gcc-13 planned?

[Bug sanitizer/111736] Address sanitizer is not compatible with named address spaces

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736 --- Comment #3 from Alexander Monakov --- Sorry, the second half of my comment is confusing. To clarify, ASan works fine for TLS data (the compiler knows that TLS base is at fs:0; libsanitizer uses some hacks to initialize shadow for TLS

[Bug sanitizer/111736] Address sanitizer is not compatible with named address spaces

2023-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug ipa/111643] __attribute__((flatten)) with -O1 runs out of memory (killed cc1)

2023-10-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/111701] New: [11/12/13/14 Regression] wrong code for __builtin_signbit(x*x)

2023-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111701 Bug ID: 111701 Summary: [11/12/13/14 Regression] wrong code for __builtin_signbit(x*x) Product: gcc Version: 13.2.1 Status: UNCONFIRMED Keywords: wrong-code

[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/111655] [11/12/13/14 Regression] wrong code generated for __builtin_signbit and 0./0. on x86-64 -O2

2023-10-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655 --- Comment #11 from Alexander Monakov --- (In reply to Richard Biener from comment #10) > And this conservatively has to apply to all FP divisions where we might infer > "nonnegative" unless we can also infer !zerop? Yes, I think the logic in

[Bug middle-end/51446] -fno-trapping-math generates NaN constant with different sign

2023-10-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51446 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/111655] [11/12/13/14 Regression] wrong code generated for __builtin_signbit and 0./0. on x86-64 -O2

2023-10-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210 --- Comment #4 from Alexander Monakov --- The testcase is small enough to notice the issue by inspection. Note that you get the "expected" answer with -fno-strict-aliasing, and as explained in https://gcc.gnu.org/bugs/ it is one of the things

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210 Alexander Monakov changed: What|Removed |Added Resolution|--- |INVALID

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43 --- Comment #6 from Alexander Monakov --- Thanks. i5-1335U has two "performance cores" (with HT, four logical CPUs) and eight "efficiency cores". They have different micro-architecture. Are you binding the benchmark to some core in particular?

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/111101] -finline-small-functions may invert FP arguments breaking FP bit accuracy in case of NaNs

2023-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=01 Alexander Monakov changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10

2023-08-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110979] Miss-optimization for O2 fully masked loop on floating point reduction.

2023-08-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979 --- Comment #2 from Alexander Monakov --- Yes, it is wrong-code to full extent. To demonstrate, you can initialize 'sum' and the array to negative zeroes: #define FLT double #define N 20 __attribute__((noipa)) FLT foo3 (FLT *a) { FLT sum

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #11 from Alexander Monakov --- (In reply to Alexander Monakov from comment #8) > inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) > { > memcpy(p, , sizeof(x)); > } > > > We deciding to not inline this, while

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #10 from Alexander Monakov --- Ah, the non-static inlines are intentional, the corresponding extern declarations appear in library/platform_util.c. Sorry, I missed that file the first time around.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #9 from Alexander Monakov --- (In reply to Alexander Monakov from comment #2) > Note that inline functions in mbedtls/library/alignment.h all miss the > 'static' qualifier, which affects inlining decisions, and looks like a >

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #8 from Alexander Monakov --- Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a straightforward wrapper for memcpy: inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) { memcpy(p, ,

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-08-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #16 from Alexander Monakov --- In C11 and C++11 the issue of compiler-introduced racing loads is discussed as follows (5.1.2.4 Multi-threaded executions and data races in C11): 28 NOTE 14 Transformations that introduce a

[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils

2023-07-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #9 from Alexander Monakov --- (In reply to Tom de Vries from comment #7) > Can you elaborate on what you consider a correct approach? I think this optimization is incorrect and should be active only under -Ofast. I can offer two

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #14 from Alexander Monakov --- That seems undesirable in light of comment #4, you'd risk creating a situation when -fno-trapping-math is unpredictably slower when denormals appear in dirty upper halves.

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110611] X86 is not honouring POINTERS_EXTEND_UNSIGNED in m32 code.

2023-07-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110611 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-07-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #3 from Alexander Monakov --- Patch available: https://inbox.sourceware.org/gcc-patches/8f73371d732237ed54ede44b7bd88...@ispras.ru/T/#u

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 --- Comment #9 from Alexander Monakov --- (In reply to Hongtao.liu from comment #8) > > For this one, we can load *a into %zmm0 to avoid false_dependence. > > vmovdqau ZMMWORD PTR [rdi], zmm0 > vpternlogq zmm0, zmm0, zmm0, 85 Yes, since

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 --- Comment #7 from Alexander Monakov --- Note that vpxor serves as a dependency-breaking instruction (see PR 110438). So in negate1 we do the right thing for the wrong reasons, and in negate2 we can cause a substantial stall if the previous

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #1 from Alexander Monakov --- We might want to omit PXOR when optimizing for size.

[Bug target/110438] New: generating all-ones zmm needs dep-breaking pxor before ternlog

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 Bug ID: 110438 Summary: generating all-ones zmm needs dep-breaking pxor before ternlog Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #21 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #19) > But the size argument doesn't have anything to do with TBAA (and > may_alias is about TBAA). I don't think we have any way to circumvent > C object

[Bug middle-end/110431] New: Incorrect disambiguation of wide accesess from store-merging or SLP

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431 Bug ID: 110431 Summary: Incorrect disambiguation of wide accesess from store-merging or SLP Product: gcc Version: 12.3.0 Status: UNCONFIRMED Keywords:

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #18 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #17) > Yes, we do the same to loads. I hope that's not a common technique > though but I have to admit the vectorizer itself assesses whether it's > safe to

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #16 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #14) > vectors of T and scalar T interoperate TBAA wise. What we disambiguate is > > int a[2]; > > int foo(int *p) > { > a[0] = 1; > *(v4si *)p =

[Bug target/110273] [12/13/14 Regression] i686-w64-mingw32 with -mavx512f generates AVX instructions without stack alignment

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #8 from Alexander Monakov --- (In reply to Sam James from comment #7) > We keep getting quite a few reports of this downstream. Of this mingw32 stack realignment issue specifically, i.e. Wine breakage when AVX512 is enabled via

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #13 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #12) > As explained in comment#3 the issue is related to the tree alias oracle > part that gets invoked on the MEM_EXPR for the load where there is > no

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #11 from Alexander Monakov --- The trapping angle seems valid, but I have a really hard time understanding the DSE issue, and the preceding issue about disambiguation based on RTL aliasing. How would DSE optimize out 'd[5] = 1' in

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #13 from Alexander Monakov --- Note to self: check how control_flow_insn_p relates.

[Bug tree-optimization/110369] wrong code on x86_64-linux-gnu

2023-06-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110369 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

  1   2   3   4   >