[Bug target/88402] inefficient code generation for mask from CC

2018-12-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88402 --- Comment #3 from Alexander Monakov --- However, this may be worthwhile when one of the operands is an immediate, as in that case there's no register pressure increase, and the xor just mutates the immediate. Essentially, we'd change e.g. (sign

[Bug rtl-optimization/88425] New: suboptimal code for a

2018-12-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88425 Bug ID: 88425 Summary: suboptimal code for a

[Bug target/88402] inefficient code generation for mask from CC

2018-12-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88402 --- Comment #4 from Alexander Monakov --- (In reply to Alexander Monakov from comment #3) > But unfortunately today we don't manage to use the cmp-sbb trick for > unsigned comparison against an immediate, i.e. for > > unsigned long baz (unsigned
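For context, the "cmp-sbb trick" turns the carry flag of an unsigned comparison directly into an all-ones or all-zeros mask. In C the idiom is negating the 0/1 result of the comparison. This is a minimal sketch; `mask_below`, `mask_below_100` and the constant 100 are hypothetical stand-ins, not the (truncated) testcases from the PR.

```c
#include <stdint.h>

/* All-ones if a < b (unsigned), otherwise zero.
   On x86 this can be a compare of a against b (computing a - b, which sets
   CF exactly when a < b unsigned) followed by "sbb r, r", giving r - r - CF,
   i.e. 0 or -1. */
uint64_t mask_below(uint64_t a, uint64_t b)
{
    return -(uint64_t)(a < b);
}

/* The same mask against an immediate operand (hypothetical constant 100). */
uint64_t mask_below_100(uint64_t a)
{
    return -(uint64_t)(a < 100);
}
```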

[Bug rtl-optimization/84345] [8/9 Regression] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 1)

2018-12-10 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84345 --- Comment #6 from Alexander Monakov --- I think gcc_qsort doesn't really change things here: a validation failure implies a logic issue in the comparator, so some step does not always work as the author intended. And even with gcc_qsort it's st
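A qsort checking failure means the comparator is internally inconsistent. One property such a checker can test is antisymmetry: cmp(a, b) and cmp(b, a) must have opposite signs (both zero for equal elements). The sketch below checks only that one property with a hypothetical helper; it is not GCC's actual checker, which also verifies other properties such as transitivity.

```c
#include <stddef.h>

typedef int (*cmp_fn)(const void *, const void *);

static int sign_of(int x) { return (x > 0) - (x < 0); }

/* Returns 1 if sign(cmp(a, b)) == -sign(cmp(b, a)) for every pair of
   elements (including each element with itself), 0 otherwise.
   Uses O(n^2) comparisons, so it is only meant for small inputs. */
int comparator_is_antisymmetric(const void *base, size_t n, size_t size, cmp_fn cmp)
{
    const char *p = base;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            const void *a = p + i * size;
            const void *b = p + j * size;
            if (sign_of(cmp(a, b)) != -sign_of(cmp(b, a)))
                return 0;
        }
    return 1;
}

/* A consistent comparator for ints. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* A broken comparator: it never reports "less than", so it fails the check. */
int cmp_int_broken(const void *a, const void *b)
{
    return *(const int *)a >= *(const int *)b;
}
```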

[Bug c/88481] -O1 causes optimizer to drop 'then' clause in conditional

2018-12-13 Thread amonakov at gcc dot gnu.org
||2018-12-13 CC||amonakov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Alexander Monakov --- Without -O, optimization passes are not enabled, even if individual -f options are passed on the

[Bug c/88481] -O1 causes optimizer to drop 'then' clause in conditional

2018-12-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88481 --- Comment #5 from Alexander Monakov --- The code shown in the opening comment looks fine to me, so please isolate the issue further using debug counters. Add -fdbg-cnt=if_conversion:99,if_after_combine:99 to -O1. This should lead to broken cod

[Bug c/88481] -O1 causes optimizer to drop 'then' clause in conditional

2018-12-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88481 --- Comment #6 from Alexander Monakov --- And just to be sure, can you confirm that -fno-if-conversion changes program behavior (the testcase is not executable so I cannot check), and the issue is not about debug info quality (i.e. that single-st

[Bug target/88425] suboptimal code for a

2018-12-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88425 --- Comment #4 from Alexander Monakov --- Thanks! Should this be closed as fixed now?

[Bug c/88481] -O1 causes optimizer to drop 'then' clause in conditional

2018-12-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88481 --- Comment #9 from Alexander Monakov --- Thanks. I still don't see what's wrong. Are you testing only by single-stepping in gdb, or does your program overall behave differently with/without if-conversion? In other words, do you see if-conversio

[Bug c/88568] [8/9 Regression] 'dllimport' no longer implies 'extern' in C

2018-12-24 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88568 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/88593] internal compiler error: in verify_dominators, at dominance.c:1184

2018-12-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88593 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/88593] internal compiler error: in verify_dominators, at dominance.c:1184

2018-12-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88593 --- Comment #4 from Alexander Monakov --- It seems that, to avoid this sort of gotcha, cleanup_cfg should contain gcc_checking_assert (!dom_info_available_p (CDI_DOMINATORS)); gcc_checking_assert (!dom_info_available_p (CDI_POST_DOMINATORS)); but maybe t

[Bug c++/88600] GCC rejects attributes on type aliases, while clang accepts them

2018-12-26 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88600 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c/88698] Relax generic vector conversions

2019-01-05 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/56028] New: Splitting a 64-bit volatile store

2013-01-18 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56028 Bug #: 56028 Summary: Splitting a 64-bit volatile store Classification: Unclassified Product: gcc Version: 4.7.2 URL: http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00870.htm

[Bug middle-end/56077] [4.6/4.7/4.8 Regression] volatile ignored when function inlined

2013-02-04 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56077 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level

2013-02-04 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200 --- Comment #2 from Alexander Monakov 2013-02-04 21:36:38 UTC --- (In reply to comment #1) > What happens if you also use -fno-ivopts ? For me, -fno-ivopts gives a small improvement, but the code is still slower than with -O0. I think the slowdown is r

[Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level

2013-02-05 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200 Alexander Monakov changed: What|Removed |Added CC||hjl.tools at gmail dot com,

[Bug sanitizer/56393] SIGSEGV when -fsanitize=address and dynamic lib with global objects

2013-02-21 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c/56507] GCC -march=native for Core2Duo

2013-03-04 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56507 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug other/39851] gcc -Q --help=target does not list extensions selected by -march=

2013-03-04 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39851 Alexander Monakov changed: What|Removed |Added CC||bratsinot at gmail dot com

[Bug tree-optimization/53265] Warn when undefined behavior implies smaller iteration count

2013-03-11 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53265 --- Comment #10 from Alexander Monakov 2013-03-11 16:15:36 UTC --- (In reply to comment #8) > Not sure about the warning wording What about (... "iteration %E invokes undefined behavior", max)? > plus no idea how to call the warning o

[Bug target/40735] memory hog compiling big functions with -fPIE

2012-08-28 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40735 --- Comment #18 from Alexander Monakov 2012-08-28 08:48:28 UTC --- (In reply to comment #17) > > richi, can you share this maxmem2 script? It's available on the wiki: http://gcc.gnu.org/wiki/PerformanceTesting

[Bug c++/55081] [4.8 regression?] Non-optimized static array elements initialization

2012-10-26 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55081 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/55216] [4.8 Regression] Infinite loop generated on non-infinite code

2012-11-06 Thread amonakov at gcc dot gnu.org
||amonakov at gcc dot gnu.org Resolution||INVALID --- Comment #2 from Alexander Monakov 2012-11-06 15:05:40 UTC --- The code invokes undefined behavior and is invalid: accessing d[++k] implies that the incremented k is less than 16
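Because d[++k] is only defined when the incremented k is a valid index, the compiler may assume k stays below the array bound and derive a smaller iteration count, or drop the loop exit test entirely. The original testcase is not reproduced in this archive, so the sketch below uses a hypothetical 16-element array and a hypothetical search loop to show the bounds-checked form, where the index is tested before each access.

```c
#define N 16 /* hypothetical array size, matching "less than 16" above */

/* Hazardous pattern (do not use): the loop condition can read past the end,
       while (d[++k] != 0) ...
   so GCC is entitled to assume ++k < N on every iteration. */

/* Bounds-checked version: returns the index of the first zero after k,
   or -1 if there is none within the array. */
int find_zero_after(const int d[N], int k)
{
    while (k + 1 < N) {
        ++k;
        if (d[k] == 0)
            return k;
    }
    return -1;
}
```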

[Bug tree-optimization/55216] [4.8 Regression] Infinite loop generated on non-infinite code

2012-11-06 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55216 --- Comment #3 from Alexander Monakov 2012-11-06 15:06:50 UTC --- > Enhancement request to produce a warning is filed as PR 52365. Correction: PR 53265.

[Bug target/65753] [i386] Incorrect tail call inhibition logic on i386 (32-bit)

2015-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 --- Comment #1 from Alexander Monakov --- Example testcase: void *lookup_f(void); void g() { void (*f)(void) = lookup_f(); f(); } With -O2 -fPIC, on x86-64 GCC produces the desired tail call: g: subq $8, %rsp call lookup_f@PL

[Bug target/65753] [i386] Incorrect tail call inhibition logic on i386 (32-bit)

2015-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 --- Comment #2 from Alexander Monakov --- For a simpler testcase: void g(void (*f)(void)) { f(); } gcc/cc1 -fPIC -O2 -m32: g: subl $12, %esp call *16(%esp) addl $12, %esp ret Here %ebx does not come into play at all

[Bug target/65753] [i386] Incorrect tail call inhibition logic on i386 (32-bit)

2015-04-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 --- Comment #4 from Alexander Monakov --- The check rejecting indirect calls was introduced with commit https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=069db0ae0a5b5b61d64731a94eea002fb3c9d901 (gcc-patches thread starting at https://gcc.gnu.org/ml/

[Bug rtl-optimization/48302] ICE: SIGSEGV in reposition_prologue_and_epilogue_notes (function.c:5662) with -fcrossjumping -fselective-scheduling2

2015-05-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48302 Alexander Monakov changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug rtl-optimization/48442] ICE: in init_seqno, at sel-sched.c:6767 with -Os -fselective-scheduling2 --param max-sched-extend-regions-iters=100

2015-05-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48442 Alexander Monakov changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/65753] [i386] Incorrect tail call inhibition logic on i386 (32-bit)

2015-05-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 --- Comment #5 from Alexander Monakov --- Author: amonakov Date: Mon May 11 16:10:24 2015 New Revision: 223005 URL: https://gcc.gnu.org/viewcvs?rev=223005&root=gcc&view=rev Log: PR target/65753 * config/i386/i386.c (ix86_function

[Bug target/65753] [i386] Incorrect tail call inhibition logic on i386 (32-bit)

2015-05-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug tree-optimization/66512] New: PRE fails to optimize calls to pure functions in C++, ok in C

2015-06-11 Thread amonakov at gcc dot gnu.org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- The following testcase: int p(void) __attribute__((const)); void g(int); void f() { for (;;) g(p()); } is optimized

[Bug tree-optimization/66512] PRE fails to optimize calls to pure functions in C++, ok in C

2015-06-15 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66512 --- Comment #2 from Alexander Monakov --- In that case I'd like to contribute a documentation patch to make that clear in the pure/const attribute information, but I need more explanation. I see that int p(void) __attribute__((const)); void f()
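With `__attribute__((const))`, the result of p() depends only on its arguments (none here), so the call inside the loop is loop-invariant and is expected to be hoisted. The PR's loop is infinite, so this sketch uses a bounded variant; `p`, `g`, `g_total` and the returned value 42 are hypothetical stand-ins. A `const` function must not read or write global memory, which is why `p` just returns a constant.

```c
/* Hypothetical const function: no side effects, no reads of global memory. */
__attribute__((const)) int p(void) { return 42; }

static long g_total; /* accumulates the values passed to g */
void g(int x) { g_total += x; }

/* As written: p() appears to be called on every iteration. */
void f_as_written(int n)
{
    for (int i = 0; i < n; i++)
        g(p());
}

/* What the optimizer is expected to produce: a single hoisted call. */
void f_hoisted(int n)
{
    int t = p();
    for (int i = 0; i < n; i++)
        g(t);
}
```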

[Bug target/66655] New: [5.1 Regression] miscompilation due to ipa-ra on MinGW

2015-06-24 Thread amonakov at gcc dot gnu.org
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-w64-mingw32 Created attachment 35846 --> https://gcc.gnu.org/bugzi

[Bug web/55237] Linkify r123456 in comments to point to SVN

2013-05-15 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55237 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug tree-optimization/57511] New: [4.8/4.9 Regression] Missing SCEV final value replacement

2013-06-03 Thread amonakov at gcc dot gnu.org
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org The following simple loop is no longer optimized out with 4.8 and 4.9: int main(int argc, char* argv[]) { int i, a = 0; for (i=0; i < 1

[Bug tree-optimization/57511] [4.8/4.9 Regression] Missing SCEV final value replacement

2013-06-03 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57511 --- Comment #1 from Alexander Monakov --- The loop invokes signed integer overflow, but changing 1 to 10 still keeps the missed optimization there.
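Final value replacement allows the compiler to delete a loop whose only observable effect is the final value of an induction variable, by computing that value in closed form. The loop from the PR is truncated in this archive, so the sketch below uses a hypothetical summation loop; unsigned arithmetic sidesteps the signed-overflow problem mentioned in comment #1.

```c
/* Loop form: a = 0 + 1 + ... + (n - 1), wrapping modulo 2^32. */
unsigned sum_loop(unsigned n)
{
    unsigned a = 0;
    for (unsigned i = 0; i < n; i++)
        a += i;
    return a;
}

/* Closed form the compiler could substitute: n * (n - 1) / 2.
   Halving the even factor first keeps the division exact before wrapping. */
unsigned sum_closed(unsigned n)
{
    return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}
```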

[Bug target/57736] New: [4.8/4.9 Regression] ICE in emit_move_insn with __builtin_ia32_rdtsc

2013-06-27 Thread amonakov at gcc dot gnu.org
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org CC: uros at gcc dot gnu.org Regressed with r192589.

[Bug target/57736] [4.8/4.9 Regression] ICE in emit_move_insn with __builtin_ia32_rdtsc

2013-06-27 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57736 --- Comment #1 from Alexander Monakov --- Oops, submitted too soon. echo 'f(){__builtin_ia32_rdtsc();}' | gcc -xc - -S -o- <stdin>: In function ‘f’: <stdin>:1:25: internal compiler error: in emit_move_insn, at expr.c:3486 0x6d9c63 emit_move_insn(rtx_def*, rtx_

[Bug ipa/64420] New: LTO can miscompile IFUNCs designated via top-level asm

2014-12-27 Thread amonakov at gcc dot gnu.org
Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Minimal testcase: $ cat >test.c <

[Bug lto/57703] Assembler function definition moved to a different ltrans then call

2014-12-28 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57703 --- Comment #10 from Alexander Monakov --- Bug 64420, which was marked as a duplicate, presents an example where there's no diagnostics at build time (linking succeeds), but the resulting code is wrong and will fail at runtime. I believe the cor

[Bug rtl-optimization/60086] suboptimal asm generated for a loop (store/load false aliasing)

2014-02-07 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/60086] suboptimal asm generated for a loop (store/load false aliasing)

2014-02-07 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086 --- Comment #9 from Alexander Monakov --- By "good code" I was referring to the fact that your 4.7 asm does not contain stack (%rbp) references in the vectorized loop. Historically, first scheduling (-fschedule-insns) was problematic for 32-bit x

[Bug c/36750] -Wmissing-field-initializers relaxation request

2014-04-14 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36750 --- Comment #7 from Alexander Monakov --- Nightstrike, is there a particular reason you want the C++ warning behavior to be adjusted? Note that unlike C, in C++ you get zero-initialization by default, so you don't need to write ' = {0};' to zero-initial

[Bug c/36750] -Wmissing-field-initializers relaxation request

2014-04-14 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36750 --- Comment #9 from Alexander Monakov --- My statement about zero-initialization was inaccurate (thanks), but the general point still stands: in C you have to write ' = {0}' since empty-braces initializer is not supported by the language (you get
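For reference, in C the `= {0}` initializer zero-initializes the whole aggregate: the first member receives the explicit 0, and every remaining member, including nested arrays, is initialized as if it had static storage duration, i.e. to zero. A short sketch with a hypothetical struct:

```c
/* Hypothetical aggregate with mixed member types. */
struct config {
    int level;
    double scale;
    const char *name;
    int flags[4];
};

struct config make_default(void)
{
    /* 'level' gets the explicit 0; 'scale', 'name' and all of 'flags'
       are implicitly zero-initialized (0.0, null pointer, zeros). */
    struct config c = {0};
    return c;
}
```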

[Bug ipa/61144] [4.9/4.10 Regression] Invalid optimizations for extern vars with local weak definitions

2014-07-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug ipa/61144] [4.9/4.10 Regression] Invalid optimizations for extern vars with local weak definitions

2014-07-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144 Alexander Monakov changed: What|Removed |Added Attachment #32830|0 |1 is obsolete|

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-05 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug c++/104631] Visibility of static member s yields duplicate symbols.

2022-04-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104631 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Created attachment 52933 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52933&action=edit testcase Hit

[Bug rtl-optimization/105513] New: [9/10/11/12/13 Regression] Unnecessary SSE spill

2022-05-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Target: x86_64-*-* i?86-*-* Minimized from PR 105504. Compile with -O2 -mtune=haswell -mavx (other

[Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem

2022-05-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504 --- Comment #5 from Alexander Monakov --- The strange xmm0 spill issue may affect more code, so I reported an isolated testcase: PR 105513 (regression vs. gcc-8, the complete testcase in this PR also does not spill with gcc-8).

[Bug target/105513] [9/10/11/12/13 Regression] Unnecessary SSE spill since r9-5748-g1d4b4f4979171ef0

2022-05-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105513 --- Comment #7 from Alexander Monakov --- The second sequence is 3 uops vs 1/2 (issued/executed) uops in first, and on Haswell and Skylake it ties up port 5 for two cycles. Unclear if you're microbenchmarking latency or throughput, but in any c

[Bug target/61810] init-regs.c papers over issues elsewhere

2022-05-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug middle-end/21111] IA-64 NaT consumption faults due to uninitialized register reads

2021-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=2 --- Comment #18 from Alexander Monakov --- From my perspective, the main blocker for a nice and clean solution is lack of "birth" statements on GIMPLE. Without them, expansion to RTL would either need to insert initialization at the top of the

[Bug target/93934] Unnecessary fld of uninitialized float stack variable results in ub of valid C++ code

2021-10-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93934 --- Comment #14 from Alexander Monakov --- Zoltan, excuse me, could you please clarify what specifically you are worried about? Your bug title says "results in UB" and the opening comment said "the behavior [..] is unpredictable", but as far as I

[Bug bootstrap/91972] Bootstrap should use -Wmissing-declarations

2021-11-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91972 --- Comment #7 from Alexander Monakov --- As I understand, only the gcc subdirectory changed implementation language from C to C++, so, yes (as far as this bug is concerned).

[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

2020-09-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127 --- Comment #16 from Alexander Monakov --- Mostly because prior to register allocation the compiler does not naturally see that x = *mem + a*b will need an extra mov when both 'a' and 'b' are live (as in that case registers allocated for them can

[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

2020-09-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127 --- Comment #17 from Alexander Monakov --- To me this suggests that in fact it's okay to carry the combined form in RTL up to register allocation, but RA should decompose it to load+fma instead of inserting a register copy that preserves the live

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #9 from Alexander Monakov --- (In reply to Richard Biener from comment #8) > Note that currently RTL expansion forces a local vector typed variable > to the stack (instead of allocating a pseudo) when there are > variable-index access

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #11 from Alexander Monakov --- Yeah, for inserts such tactic would be inappropriate due to bad store forwarding stalls anyway. As you've shown in earlier comments, inserts have a very nice generic way to expand them (that does not tou

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #14 from Alexander Monakov --- I see, there are more weaknesses than I thought. For CSE (or rather fwprop?) I was thinking about a simpler case where the extracted-from value is loaded from memory, but even in trivial cases RTL optimi
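GCC's generic vector extension allows subscripting a vector with a run-time index, which is the variable-position extract and insert this PR asks to expand efficiently instead of forcing the vector through the stack. A sketch of both operations on a hypothetical 4 x int type; the `& 3` mask is an illustrative choice to keep the index in range.

```c
/* GCC generic vector type: four ints, 16 bytes. */
typedef int v4si __attribute__((vector_size(16)));

/* Extract the element at a variable position. */
int extract(v4si v, int i)
{
    return v[i & 3];
}

/* Insert x at a variable position and return the updated vector. */
v4si insert(v4si v, int i, int x)
{
    v[i & 3] = x;
    return v;
}
```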

[Bug libgomp/97291] [SIMT] Move SIMT_XCHG_* out of non-uniform execution region

2020-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97291 --- Comment #1 from Alexander Monakov --- Reshuffling statements and piling up extra abstraction doesn't help solve the core issue that GIMPLE passes can duplicate any basic block, but basic blocks of SIMT loop epilogue should be protected from t

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2020-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 Alexander Monakov changed: What|Removed |Added Known to fail||9.3.0 Known to work|9.3.0

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #8 from Alexander Monakov --- No, -msoft-stack-reserve-local is really meant to be in bytes: it may not exceed the amount of .local memory reserved by CUDA driver (which is just 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 --- Comment #5 from Alexander Monakov --- afaict LRA is just following IRA decisions, and IRA allocates that pseudo to memory due to costs. Not sure where the strange cost is coming from, but it depends on x86 tuning options: with -mtune=skylake we

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #11 from Alexander Monakov --- Yes, that.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/97734] GCC using branches when a conditional move would be better

2020-11-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/86096] [8 Regression] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 0)

2021-02-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86096 --- Comment #8 from Alexander Monakov --- It was fixed on the trunk only, so, as the title says, it remains an issue on the gcc-8 branch (which is still open). Bugzilla doesn't have separate resolutions for different branches, so we cannot have this "

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug rtl-optimization/99462] Enhance scheduling to split instructions

2021-03-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462 --- Comment #3 from Alexander Monakov --- (for context, the above patch was for PR 98856, but it's based on incorrect latency analysis, see bug 98856 comment #38 ) Right now schedulers cannot easily split instructions for that purpose, it would

[Bug rtl-optimization/99469] ICE: qsort checking failed with selective scheduling on aarch64

2021-03-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469 Alexander Monakov changed: What|Removed |Added Blocks||82407 --- Comment #2 from Alexander

[Bug middle-end/99619] New: fails to infer local-dynamic TLS model from hidden visibility

2021-03-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Thread-local variables with hidden visibility don't need to use the "general-dy
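A thread-local variable with hidden visibility cannot be preempted by another module, so in PIC code the local-dynamic model (which can share one __tls_get_addr call among several variables of the same module) would suffice instead of general-dynamic. Until the compiler infers this, the model can be requested explicitly with GCC's `tls_model` attribute. A sketch with a hypothetical counter:

```c
/* Hypothetical per-thread counter, not visible outside this shared object.
   The explicit tls_model spells out what the PR asks GCC to infer from
   the hidden visibility. */
__attribute__((visibility("hidden"), tls_model("local-dynamic")))
__thread int tls_counter;

int bump_counter(void)
{
    return ++tls_counter;
}
```

In a non-PIC executable the linker or compiler may still relax this to a cheaper model; the attribute only sets the model GCC starts from.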

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/99582] No intrinsics to access rcl or rcr instruction on x86_64

2021-03-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99582 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #30 from Alexander Monakov --- Asm operand binding should work by looking at bound lvalue: "c"(a) binds an lvalue so if 'a' is a register var the compiler must remember its associated register; "c"(a+0) binds an rvalue, so what kind o

[Bug libstdc++/98226] Slow std::countr_one

2020-12-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98226 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #5 from Alexander Monakov --- One possible solution is -foffload=-fno-openmp Another possible solution is separate compilation and linking, with only OpenACC enabled at link step (needs explicit -lgomp): gfortran -fopenmp -fopenacc

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #8 from Alexander Monakov --- (In reply to Chinoune from comment #7) > $ gfortran-10 -O3 -fopenmp -fopenacc -c bug_omp_acc.f90 > $ gfortran-10 bug_omp_acc.o -lgomp -o test.x Contrary to my suggestion, you have omitted -fopenacc from

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #10 from Alexander Monakov --- Thanks for checking. As for this: > Please, stop suggesting untested workarounds. Yes, I should have mentioned those are untested. I was typing the response late at night without access to offloading-c

[Bug tree-optimization/98906] New: [8/9/10/11 Regression] Miscompiles code even at -O1

2021-01-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amonakov at gcc dot gnu.org Target Milestone: --- Created attachment 50097 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50097&action=edit testcase The a

[Bug tree-optimization/98906] [8/9/10/11 Regression] Miscompiles code even at -O1

2021-02-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98906 --- Comment #6 from Alexander Monakov --- Ah, -fsanitize=float-cast-overflow catches it, but it needs to be enabled explicitly (not implied by -fsanitize=undefined). Thank you!
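Converting a floating-point value to an integer type that cannot represent its truncated value is undefined behavior in C, which is what -fsanitize=float-cast-overflow detects. A sketch of a range check performed before the conversion; the helper name is hypothetical.

```c
#include <limits.h>

/* Converts d to int only when the truncated value is representable.
   Returns 1 and stores the result on success, 0 otherwise (including NaN,
   for which both comparisons are false). */
int double_to_int_checked(double d, int *out)
{
    /* For 32-bit int, INT_MIN - 1.0 and INT_MAX + 1.0 are exact doubles, and
       any value strictly between them truncates into [INT_MIN, INT_MAX]. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return 0;
    *out = (int)d;
    return 1;
}
```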

[Bug hsa/86948] Internal compiler error compiling brig.dg/test/gimple/mulhi.hsail

2021-12-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86948 --- Comment #8 from Alexander Monakov --- How does your patch expand 64-bit highpart multiply (i.e. with 128-bit full product) on 32-bit targets?
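On 32-bit targets there is no native 64 x 64 -> 128-bit multiply, so the high half of the product has to be assembled from 32 x 32 -> 64-bit partial products. A portable sketch of that schoolbook decomposition; it illustrates the arithmetic only and is not the expansion from the patch under discussion.

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit product a * b, built from 32-bit halves. */
uint64_t umulh64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;  /* contributes to bits 0..63   */
    uint64_t hi_lo = a_hi * b_lo;  /* contributes to bits 32..95  */
    uint64_t lo_hi = a_lo * b_hi;  /* contributes to bits 32..95  */
    uint64_t hi_hi = a_hi * b_hi;  /* contributes to bits 64..127 */

    /* Middle column: three terms, each below 2^32, so the sum cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}
```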

[Bug ipa/95558] [9/10/11/12 Regression] Invalid IPA optimizations based on weak definition

2022-01-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558 --- Comment #10 from Alexander Monakov --- As comment #5 mentioned, it is still broken, you just need -fno-inline in addition to -O2 for the original testcase. Andrew's remark is quite useful for situations like this, you know :)

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 --- Comment #7 from Alexander Monakov --- Lawrence, thank you for the nice work reducing the testcase. For RawTherapee the recommended course of action would be to compile everything with -ffp-contract=off, then manually reintroduce use of fma i

[Bug target/106952] Missed optimization: x < y ? x : y not lowered to minss

2022-09-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106952 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org
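`x < y ? x : y` maps onto a single minss because minss has the same asymmetric semantics: it returns the second operand whenever the "less than" comparison is false, which includes the cases where either input is NaN and where both inputs are zeros of either sign. This differs from fminf, which returns the non-NaN operand regardless of order. A sketch of the C expression and its edge cases:

```c
#include <math.h> /* NAN, isnan and signbit, used when exercising the edge cases */

/* The expression from the PR title. Like minss x, y, it yields y unless x < y:
     min_select(NAN, 1.0f)   -> 1.0f  (comparison false)
     min_select(1.0f, NAN)   -> NaN   (comparison false)
     min_select(-0.0f, 0.0f) -> +0.0f (-0.0 < +0.0 is false) */
float min_select(float x, float y)
{
    return x < y ? x : y;
}
```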

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 --- Comment #11 from Alexander Monakov --- Can we move -ffp-contract=fast under the -ffast-math umbrella and default to -ffp-contract=on/off? Isn't it easy now to implement -ffp-contract=on by a GENERIC-only match.pd rule?

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 --- Comment #13 from Alexander Monakov --- (In reply to Richard Biener from comment #12) > > Isn't it easy now to implement -ffp-contract=on by a GENERIC-only match.pd > > rule? > > You mean in the frontend only for -ffp-contract=on? Yes. >

[Bug lto/107014] flatten+lto fails the kernel build

2022-09-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org

[Bug lto/107014] flatten+lto fails the kernel build

2022-09-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014 --- Comment #5 from Alexander Monakov --- (In reply to Jiri Slaby from comment #4) > > I am surprised that "flatten" blows up on this function. Is that with any > > config, or again some specific settings like gcov? Is there an existing lkml > >

[Bug lto/107014] flatten+lto fails the kernel build

2022-09-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014 --- Comment #7 from Alexander Monakov --- I wanted to understand what gets exposed in LTO mode that causes a blowup. I'd say flatten is not appropriate for this function (I don't think you want to force inlining of memset or _find_next_bit?), s

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 --- Comment #15 from Alexander Monakov --- (In reply to Richard Biener from comment #14) > I can't > seem to reproduce any vectorization for your smaller example though. My small C samples omit some detail as they were meant to illustrate what

[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

2022-09-29 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 --- Comment #17 from Alexander Monakov --- (In reply to Richard Biener from comment #16) > I do think that since the only way to > preserve expression boundaries is by PAREN_EXPR Yes, but... > that the middle-end > shouldn't care about FAST v
