[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 --- Comment #4 from Alexander Monakov --- Like this: pandxmm1, XMMWORD PTR .LC0[rip] movaps XMMWORD PTR [rsp-40], xmm0 xor eax, eax xor edx, edx movaps XMMWORD PTR [rsp-24], xmm1 mov

[Bug target/115014] GCC generates incorrect instructions for addressing the data segment through EBP register

2024-05-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115014 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/115091] Support value speculation in frontend

2024-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/115132] Sibling calls optim should not be performed when builtin_unwind_init is used

2024-05-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115132 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #18 from Alexander Monakov --- No, allowing value-changing transformations under -ftrapping-math is really not appropriate. Invoking the intrinsic on a large floating-point value is not UB.

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #20 from Alexander Monakov --- (In reply to Jakub Jelinek from comment #19) > If we guarantee that we never constant fold FIX/UNSIGNED_FIX with > -ftrapping-math (we shouldn't, as the exceptions should be raised), then > using FIX/UN

[Bug middle-end/115170] __cxa_atexit@plt even if -fno-plt

2024-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115170 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

2024-05-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161 --- Comment #23 from Alexander Monakov --- (In reply to Sergei Trofimovich from comment #22) > Here `pcmpeqd %xmm2,%xmm1` is a problematic instruction. Why does `gcc` use > `%xmm2` (result of `cvttps2dq`) instead of, say `%xmm0` which contains >

[Bug target/115333] -march=native sets --param "l2-cache-size=1024" on Ryzen 7 7800X3D

2024-06-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115333 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #20 from Alexander Monakov --- Sam, can you provide more context? It seems there is no downstream bugreport? How does the alleged miscompilation manifest? Note that effects of interplay of fp-contract=fast and vectorization can be p

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-06-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #22 from Alexander Monakov --- Similar to the RawTherapee issue, SLP opportunities are created by predcom, so either -fno-predictive-commoning or -fno-tree-slp-vectorize avoids numerical runaway on the small testcase.

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-07-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #23 from Alexander Monakov --- I suggest it to close this a dup of PR 106902 if there are no better ideas. By the way, in both cases SLP introduces vectors in a loop where scalar computations it's attempting to replace are not elimi

[Bug ipa/115533] [12/13/14/15 regression] flac miscompiled with -O3 -march=znver2 -fipa-pta -fno-vect-cost-model since r12-3893-g6390c5047adb75

2024-07-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533 --- Comment #26 from Alexander Monakov --- (In reply to Richard Biener from comment #24) > > That's because of -fno-vect-cost-model, it wouldn't be vectorized otherwise. Thanks, I forgot. The testcase in PR 106902 was vectorized at plain -O3 b

[Bug rtl-optimization/108117] Wrong instruction scheduling on value coming from abnormal SSA

2022-12-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117 --- Comment #16 from Alexander Monakov --- Draft patch for the sched1 issue: https://inbox.sourceware.org/gcc-patches/cf62c3ec-0a9e-275e-5efa-2689ff1f0...@ispras.ru/T/#m95238afa0f92daa0ba7f8651741089e7cfc03481

[Bug middle-end/108209] New: goof in genmatch.cc:commutative_op

2022-12-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- It pretends that define_operator_list is commutative when its first member is NOT commutative: if (user_id *uid = dyn_cast (id)) { int res = commutative_op (uid

[Bug middle-end/108209] goof in genmatch.cc:commutative_op

2022-12-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108209 --- Comment #1 from Alexander Monakov --- Keeping notes as I go... Duplicated checks for 'op0' in lower_for are duplicated.

[Bug target/108229] New: [13 Regression] unprofitable STV transform

2022-12-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* In the following example, STV is making a very unprofitable transformation on trunk, but not on gcc-12: #include #include struct b

[Bug target/108229] [13 Regression] unprofitable STV transform since r13-4873-g0b2c1369d035e928

2022-12-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108229 --- Comment #3 from Alexander Monakov --- Thank you! I considered this unprofitable for these reasons: 1. As you said, the code grows in size, but the speed benefit is not clear. 2. The transform converts load+add operations in a loop, and the

[Bug middle-end/108256] New: Missing integer overflow instrumentation when assignment LHS is narrow

2022-12-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- For unsigned short f(unsigned short x, unsigned short y) { return x * y; } unsigned short g(unsigned short x

[Bug target/108315] New: -mcpu=power10 changes ABI

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: powerpc64le-*-* Created attachment 54202 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54202&action=edit testcase At le

[Bug rtl-optimization/108318] Floating point calculation moved out of loop despite fesetround

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108318 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/108322] Using __restrict parameter with -ftree-vectorize (default with -O2) results in massive code bloat

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108322 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/108322] Using __restrict parameter with -ftree-vectorize (default with -O2) results in massive code bloat

2023-01-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108322 --- Comment #5 from Alexander Monakov --- (In reply to Richard Biener from comment #4) > > For the case at hand loading two vectors from the destination and then > punpck{h,l}bw and storing them again might be the most efficient thing > to do h

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/108401] gcc defeats vector constant generation with intrinsics

2023-01-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
||std::ranges::iota_view CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- Regarding fn1, would you mind re-running the test on your Xeon CPU with fn2 removed from the source code and -falign-loops=32 added to gcc

[Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487 Alexander Monakov changed: What|Removed |Added Component|tree-optimization |libstdc++ --- Comment #3 from Alexa

[Bug libgomp/108494] Slow thread creation with nested loops in GFortran

2023-01-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/108491] cross compiler does not work: cc1: error: ‘-msecure-plt’ not supported by your assembler

2023-01-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108491 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #1 from Alexander Monakov --- We diverge in sched1 due to extra calls to advance_one_cycle when scheduling a BB that is empty apart from one debug insn. The following patch adds a hexdump of automaton state to make the problem eviden

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #3 from Alexander Monakov --- Ah, a worthy sequel to "Note that I wasn't able to figure out a usable email address for the submitter" from PR 107353. Nevermind then.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #4 from Alexander Monakov --- Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as for {fmod,remainder,remquo}{,f,l} on i386 without any branches for corner cases. So in practice CPUs apparently implement the

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #7 from Alexander Monakov --- I saw that. That's why I'm pointing out that Glibc (and musl) uses the instruction without any additional checks: real CPUs produce the expected result in st(0), despite the documentation making no promi

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #9 from Alexander Monakov --- (In reply to Jan Kratochvil from comment #8) > The revert makes it 13x faster. But the produced code still falls back to > calling glibc fmod() as shown in the disassembly in Comment 0. > If I use the "f

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #15 from Alexander Monakov --- That is the fancy-error-handling path that is reached under _LIB_VERSION != _IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_, and then glibc would use the fprem[1] instruct

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #19 from Alexander Monakov --- I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a helper fmod call for setting errno without any flag_errno_math checks in i386.md, i.e. it was already in the middle-end. A

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #22 from Alexander Monakov --- Strange, comment #8 claims the opposite (unless Jan tested the revert not on trunk, but on some branch).

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #3 from Alexander Monakov --- Alan implemented the special case of .localentry 1 in this patch for the BFD linker (that appeared in binutils 2.32 if my calculations are correct): https://sourceware.org/pipermail/binutils/2018-July/10

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #4 from Alexander Monakov --- Let me address one point separately: (In reply to Peter Bergner from comment #1) > CCing Alan, since he probably knows best how this all works, but yes, > -mcpu-power10 changes the ABI, namely it adds p

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #10 from Alexander Monakov --- (In reply to Rui Ueyama from comment #9) > I'm the maintainer of the mold linker. I didn't implement that POWER10 ABI > because I didn't have an access to a POWER10 machine and therefore couldn't > veri

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #14 from Alexander Monakov --- Are you guys really sure you want to blame the user here, considering that all linkers, including the BFD linker, initially misinterpreted the ABI the same way?

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 Alexander Monakov changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #18 from Alexander Monakov --- It seems you are saying that as long as GCC emits code according to the Holy Scripture that is the ABI spec, everything is fine. I imagine on other architectures maintainers are able to consider how the

[Bug c/110169] wrong code with '-Ofast'

2023-06-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110169 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug web/110250] Broken url to README in st/cli-be project

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110250 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110260] Multiple applications misbehave when compiled with -march=znver4

2023-06-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110260] Multiple applications misbehave when compiled with -march=znver4

2023-06-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260 --- Comment #6 from Alexander Monakov --- (In reply to Jimi Huotari from comment #0) > (By the by, is ADCX a typo of ADX? I see -madx as an option but only one > use of it otherwise, and no -adcx as an option and lots of mentions of it... > but

[Bug target/110260] Multiple applications misbehave at runtime when compiled with -march=znver4

2023-06-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260 --- Comment #10 from Alexander Monakov --- Right, those are different issues. Any chance of a standalone testcase extracted from Wine? If you already see a function where stack realignment is missing, just give us preprocessed containing source,

[Bug target/110273] i686-w64-mingw32 with -march=znver4 generates AVX instructions without stack alignment

2023-06-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #3 from Alexander Monakov --- Seems to work fine with explicit '-mincoming-stack-boundary=2' on the command line, even though it should make no difference for the 32-bit MinGW target.

[Bug target/110273] i686-w64-mingw32 with -march=znver4 generates AVX instructions without stack alignment

2023-06-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #4 from Alexander Monakov --- Further reduced: void f() { int c[4] = { 0, 0, 0, 0 }; int cc[8] = { 0 }; asm("" :: "m"(c), "m"(cc)); } Also reproducible with -march=skylake-avx512 or even plain -mavx512f, retitling.

[Bug target/110273] [12/13/14 Regression] i686-w64-mingw32 with -mavx512f generates AVX instructions without stack alignment

2023-06-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #6 from Alexander Monakov --- Huh? Just compile the supplied testcases without avx512, you'll see proper stack realignment.

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 on alpha with -fPIC -fpeephole2 -fschedule-insns2

2023-06-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #3 from Alexander Monakov --- Do you have older versions of GCC to check on this testcase?

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #5 from Alexander Monakov --- It's not necessary yet for this particular bug, but might be helpful for future bugs (if disk space is not an issue).

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #6 from Alexander Monakov --- Cross-compiler needs HAVE_AS_EXPLICIT_RELOCS=1. With checking enabled, we get: t.c:8:1: error: flow control insn inside a basic block (call_insn 97 96 98 4 (parallel [ (set (reg:DI 0 $0)

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #8 from Alexander Monakov --- REG_EH_REGION is handled further down that function, but copy_reg_eh_region_note_backward does not copy the note. Perhaps it needs diff --git a/gcc/except.cc b/gcc/except.cc index e728aa43b6..cfe140c4d0

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #10 from Alexander Monakov --- I think the first patch may result in duplicated notes, so I wouldn't recommend picking it.

[Bug tree-optimization/110369] wrong code on x86_64-linux-gnu

2023-06-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110369 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #13 from Alexander Monakov --- Note to self: check how control_flow_insn_p relates.

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #11 from Alexander Monakov --- The trapping angle seems valid, but I have a really hard time understanding the DSE issue, and the preceding issue about disambiguation based on RTL aliasing. How would DSE optimize out 'd[5] = 1' in y

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #13 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #12) > As explained in comment#3 the issue is related to the tree alias oracle > part that gets invoked on the MEM_EXPR for the load where there is > no infor

[Bug target/110273] [12/13/14 Regression] i686-w64-mingw32 with -mavx512f generates AVX instructions without stack alignment

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #8 from Alexander Monakov --- (In reply to Sam James from comment #7) > We keep getting quite a few reports of this downstream. Of this mingw32 stack realignment issue specifically, i.e. Wine breakage when AVX512 is enabled via CFLA

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #16 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #14) > vectors of T and scalar T interoperate TBAA wise. What we disambiguate is > > int a[2]; > > int foo(int *p) > { > a[0] = 1; > *(v4si *)p = {0,0,

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #18 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #17) > Yes, we do the same to loads. I hope that's not a common technique > though but I have to admit the vectorizer itself assesses whether it's > safe to

[Bug middle-end/110431] New: Incorrect disambiguation of wide accesess from store-merging or SLP

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Inspired by bug 110237 comment 19: int b, a; int main() { int *pa = &a, *pb

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #21 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #19) > But the size argument doesn't have anything to do with TBAA (and > may_alias is about TBAA). I don't think we have any way to circumvent > C object ac

[Bug target/110438] New: generating all-ones zmm needs dep-breaking pxor before ternlog

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* VPTERNLOG is never a dependency-breaking instruction on existing x86 implementations, so generating a

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #1 from Alexander Monakov --- We might want to omit PXOR when optimizing for size.

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 --- Comment #7 from Alexander Monakov --- Note that vpxor serves as a dependency-breaking instruction (see PR 110438). So in negate1 we do the right thing for the wrong reasons, and in negate2 we can cause a substantial stall if the previous com

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 --- Comment #9 from Alexander Monakov --- (In reply to Hongtao.liu from comment #8) > > For this one, we can load *a into %zmm0 to avoid false_dependence. > > vmovdqau ZMMWORD PTR [rdi], zmm0 > vpternlogq zmm0, zmm0, zmm0, 85 Yes, since

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-07-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #3 from Alexander Monakov --- Patch available: https://inbox.sourceware.org/gcc-patches/8f73371d732237ed54ede44b7bd88...@ispras.ru/T/#u

[Bug target/110611] X86 is not honouring POINTERS_EXTEND_UNSIGNED in m32 code.

2023-07-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110611 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #14 from Alexander Monakov --- That seems undesirable in light of comment #4, you'd risk creating a situation when -fno-trapping-math is unpredictably slower when denormals appear in dirty upper halves.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #9 from Alexander Monakov --- (In reply to Tom de Vries from comment #7) > Can you elaborate on what you consider a correct approach? I think this optimization is incorrect and should be active only under -Ofast. I can offer two ar

[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils

2023-07-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #16 from Alexander Monakov --- In C11 and C++11 the issue of compiler-introduced racing loads is discussed as follows (5.1.2.4 Multi-threaded executions and data races in C11): 28 NOTE 14 Transformations that introduce a speculative

[Bug target/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-08-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #8 from Alexander Monakov --- Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a straightforward wrapper for memcpy: inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) { memcpy(p, &x, sizeof(x

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #9 from Alexander Monakov --- (In reply to Alexander Monakov from comment #2) > Note that inline functions in mbedtls/library/alignment.h all miss the > 'static' qualifier, which affects inlining decisions, and looks like a > mistake

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #10 from Alexander Monakov --- Ah, the non-static inlines are intentional, the corresponding extern declarations appear in library/platform_util.c. Sorry, I missed that file the first time around.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #11 from Alexander Monakov --- (In reply to Alexander Monakov from comment #8) > inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) > { > memcpy(p, &x, sizeof(x)); > } > > > We deciding to not inline this, while inli

[Bug target/110979] Miss-optimization for O2 fully masked loop on floating point reduction.

2023-08-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979 --- Comment #2 from Alexander Monakov --- Yes, it is wrong-code to full extent. To demonstrate, you can initialize 'sum' and the array to negative zeroes: #define FLT double #define N 20 __attribute__((noipa)) FLT foo3 (FLT *a) { FLT sum =

[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10

2023-08-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/111101] -finline-small-functions may invert FP arguments breaking FP bit accuracy in case of NaNs

2023-08-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
|--- |INVALID CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- 0x7fe5ed65 is a quiet NaN, not signaling (it differs from the input 0x7fa5ed65 sNaN by the leading mantissa bit 0x0040). IEEE-754 does not pin

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/111143] [missed optimization] unlikely code slows down diffutils x86-64 ASCII processing

2023-08-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43 --- Comment #6 from Alexander Monakov --- Thanks. i5-1335U has two "performance cores" (with HT, four logical CPUs) and eight "efficiency cores". They have different micro-architecture. Are you binding the benchmark to some core in particular?

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
|UNCONFIRMED |RESOLVED CC||amonakov at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- 'c' is called with 'd' pointing to 'long e[2]', so return *(int *)(d + 1); is an aliasing violation (dereferencin

[Bug c/111210] Wrong code at -Os on x86_64-linux-gnu since r12-4849-gf19791565d7

2023-08-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210 --- Comment #4 from Alexander Monakov --- The testcase is small enough to notice the issue by inspection. Note that you get the "expected" answer with -fno-strict-aliasing, and as explained in https://gcc.gnu.org/bugs/ it is one of the things y

[Bug c/44179] warn about sizeof(char) and sizeof('x')

2023-12-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44179 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/113082] builtin transforms do not honor errno

2023-12-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113082 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

<    6   7   8   9   10   11   12   >