[Bug target/100637] [i386] Vectorize 4-byte vectors

2021-05-17 Thread ubizjak at gmail dot com via Gcc-bugs
at gcc dot gnu.org |ubizjak at gmail dot com Last reconfirmed||2021-05-17 Ever confirmed|0 |1 --- Comment #1 from Uroš Bizjak --- Created attachment 50822 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50822&action=edit Pa

[Bug target/100637] New: [i386] Vectorize 4-byte vectors

2021-05-17 Thread ubizjak at gmail dot com via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: ubizjak at gmail dot com Target Milestone: --- Following testcases involving 4 byte vectors, e.g.: typedef char __v4qi __attribute__ ((__vector_size__ (4))); __v4qi foo (__v4qi a, __v4qi b, __v4qi c) { return (a & ~b) + c; }
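
For readers following the testcase above, here is a minimal sketch of the same expression next to a scalar reference loop. It assumes a compiler that supports the GCC vector extension; the type name `v4qi` and the helpers `foo_scalar` and `foo_matches_scalar` are hypothetical names introduced only for this illustration and are not part of the bug report.

```c
#include <string.h>

/* 4-byte vector of chars, mirroring the typedef in the testcase
   (hypothetical name v4qi; relies on the GCC vector extension).  */
typedef char v4qi __attribute__ ((__vector_size__ (4)));

/* Vector form from the report: one expression over all four lanes.  */
v4qi foo (v4qi a, v4qi b, v4qi c)
{
  return (a & ~b) + c;
}

/* Scalar reference (illustration only): the same per-lane computation.  */
void foo_scalar (const char *a, const char *b, const char *c, char *out)
{
  for (int i = 0; i < 4; i++)
    out[i] = (char) ((a[i] & ~b[i]) + c[i]);
}

/* Returns 1 when the vector and scalar forms agree on the given inputs.  */
int foo_matches_scalar (const char *a, const char *b, const char *c)
{
  v4qi va, vb, vc, vr;
  char expect[4];

  memcpy (&va, a, 4);
  memcpy (&vb, b, 4);
  memcpy (&vc, c, 4);

  vr = foo (va, vb, vc);
  foo_scalar (a, b, c, expect);

  return memcmp (&vr, expect, 4) == 0;
}
```

Whether the compiler emits vector instructions for `foo` depends on the target and options; the sketch only shows that the two forms compute the same bytes.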

[Bug target/100626] [11/12 Regression] ICE Segmentation fault (during RTL pass: split1) since r11-165-geb72dc663e9070b2

2021-05-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100626 --- Comment #3 from Uroš Bizjak --- *di3_doubleword calls split_double_mode with: op0: (subreg:DI (reg/v:SI 89 [ li_18 ]) 0) op1: (reg:DI 90 [ uc_4 ]) op2: (mem/c:DI (plus:SI (reg/f:SI 19 frame) (const_int -4 [0xfffc]))

[Bug target/98218] [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)

2021-05-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 --- Comment #16 from Uroš Bizjak --- (In reply to David Binderman from comment #15) > Bug first appears sometime between git hash 21dfb22920ce32fc, > dated yesterday and git hash 097fde5e7514e909, dated today. Fixed by PR100581.

[Bug target/100581] [12 Regression] ICE in extract_insn, at recog.c:2770 since r12-731-gb1f7fd8a2a5558da

2021-05-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/100581] [12 Regression] ICE in extract_insn, at recog.c:2770 since r12-731-gb1f7fd8a2a5558da

2021-05-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581 --- Comment #3 from Uroš Bizjak --- (In reply to Alex Coplan from comment #1) > Is it valid to create a vector type with total size less than the element > size? Shouldn't this be rejected? No, the generated code is: vmovq

[Bug target/100581] [12 Regression] ICE in extract_insn, at recog.c:2770 since r12-731-gb1f7fd8a2a5558da

2021-05-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com

[Bug target/98218] [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)

2021-05-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 --- Comment #13 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #12) > Yeah, this is a non-existent SSE "cmove". I tried to find all paths where > this should divert to a sequence of logic instructions or PBLENDB, but due > to

[Bug target/98218] [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)

2021-05-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 --- Comment #12 from Uroš Bizjak --- (In reply to David Binderman from comment #11) > I might be seeing something similar: > > caxcpy.f: In function 'caxcpy': > caxcpy.f:53:72: error: unrecognizable insn: >53 | end subroutine >

[Bug target/98218] [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)

2021-05-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 Uroš Bizjak changed: What|Removed |Added Assignee|ubizjak at gmail dot com |unassigned at gcc dot gnu.org

[Bug other/98375] [meta bug] GCC 12 pending patches

2021-05-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375 Bug 98375 depends on bug 98218, which changed state. Bug 98218 Summary: [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 What|Removed

[Bug target/98218] [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)

2021-05-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 Uroš Bizjak changed: What|Removed |Added Summary|[TARGET_MMX_WITH_SSE] Miss |[TARGET_MMX_WITH_SSE]

[Bug target/100461] [11/12 Regression] mingw build broken due to change of rdtsc implementation

2021-05-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100461 Uroš Bizjak changed: What|Removed |Added CC||hjl.tools at gmail dot com --- Comment

[Bug target/100445] [12 Regression] ice during RTL pass: vregs

2021-05-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445 --- Comment #10 from Uroš Bizjak --- Following patch fixes the failures: --cut here-- diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 4dfe7d6c282..61b2f921f41 100644 --- a/gcc/config/i386/i386-expand.c +++

[Bug target/100445] [12 Regression] ice during RTL pass: vregs

2021-05-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445 --- Comment #9 from Uroš Bizjak --- ix86_use_mask_cmp_p should be refined, it has an early return for 64-byte modes: if (GET_MODE_SIZE (mode) == 64) return true;

[Bug target/100445] [12 Regression] ice during RTL pass: vregs

2021-05-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445 --- Comment #6 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #5) > ix86_expand_sse_movcc has special TARGET_XOP path, so the following patch is > needed: Ah, you beat me by the second ;) Anyway, I have no XOP target, so probably

[Bug target/100445] [12 Regression] ice during RTL pass: vregs

2021-05-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445 --- Comment #5 from Uroš Bizjak --- ix86_expand_sse_movcc has special TARGET_XOP path, so the following patch is needed: diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 347295afbb5..667dd057e0d 100644 ---

[Bug target/98218] [TARGET_MMX_WITH_SSE] Miss vec_cmpmn/vcondmn expander for 64bit vector

2021-05-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug other/98375] [meta bug] GCC 12 pending patches

2021-05-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375 Bug 98375 depends on bug 98218, which changed state. Bug 98218 Summary: [TARGET_MMX_WITH_SSE] Miss vec_cmpmn/vcondmn expander for 64bit vector https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218 What|Removed

[Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342 --- Comment #8 from Uroš Bizjak --- FYI, this whole analysis was done with Fedora 33 system compiler: gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)

[Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342 --- Comment #7 from Uroš Bizjak --- I have traced a bit where (insn 2275) and (insn 2287) come from. In _.ira, we have: 613: r125:QI=r2067:DI#0 ... 659: zero_extract(r2080:DI,0x8,0x8)=r125:QI#0 And in _.reload, a DImode reload is

[Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342 --- Comment #5 from Uroš Bizjak --- The problem can be seen in _.pro_and_epilogue pass: Starting with: _.cmpelim 2741: r14:DI=[sp:DI+0x38] ... 368: di:DI=r14:DI ... 613: si:QI=r14:QI ... 2737: bp:DI=r14:DI ... 658:

[Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342 --- Comment #4 from Uroš Bizjak --- The problematic insn is: 401cec: 44 89 f6 mov %r14d,%esi This one should be 64 bit wide, movl %r14d, %esi # 613 [c=4 l=3] *movqi_internal/2 but is actually a

[Bug rtl-optimization/100342] [10/11 Regression] wrong code with -O2 -fno-dse -fno-forward-propagate -mno-sse2

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342 --- Comment #3 from Uroš Bizjak --- For some reason the *input* value at BSWAP insn is truncated to 32bits. v256u128 v256u128_1 = SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <= 0, v256u128_0); u128_0

[Bug testsuite/100355] gcc.c-torture/execute/ieee/cdivchkld.c needs fmaxl

2021-05-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100355 --- Comment #3 from Uroš Bizjak --- (In reply to Christophe Lyon from comment #2) > Tried that, but it's not taken into account. > > ieee.exp uses c-torture-execute, maybe that function does not honor dg > directives? (none of the tests under

[Bug other/98375] [meta bug] GCC 12 pending patches

2021-04-30 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375 Bug 98375 depends on bug 98060, which changed state. Bug 98060 Summary: Failure to optimize cmp+setnb+add to cmp+sbb https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98060 What|Removed |Added

[Bug target/98060] Failure to optimize cmp+setnb+add to cmp+sbb

2021-04-30 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98060 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/100312] __builtin_ia32_maskloadpd256 and friends should be pure

2021-04-29 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100312 Uroš Bizjak changed: What|Removed |Added Assignee|rguenth at gcc dot gnu.org |ubizjak at gmail dot com

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 --- Comment #11 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #9) > (In reply to Richard Biener from comment #4) > > Indeed as far as I understand an unspec volatile isn't sth clobbering > > registers (not even memory?!). The insn

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 --- Comment #9 from Uroš Bizjak --- (In reply to Richard Biener from comment #4) > Indeed as far as I understand an unspec volatile isn't sth clobbering > registers (not even memory?!). The insn is missing inputs/outputs > (we might be able to

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2021-04-28 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 --- Comment #8 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #7) > Confirmed, let me fix this. Please note that the current definition of vzeroupper does not model effects of the instruction at all. The current definition is

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-24 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 Uroš Bizjak changed: What|Removed |Added Attachment #50649|0 |1 is obsolete|

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #17 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #16) > (In reply to Jakub Jelinek from comment #15) > > Yes, but do they preserve all the bits and never modify any bit patterns, > > including qNaNs and sNaNs? I

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #16 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #15) > Yes, but do they preserve all the bits and never modify any bit patterns, > including qNaNs and sNaNs? I thought the point of using the fistp was that > it

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #14 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #13) > DFmode loads and stores *are* atomic, this is what the optimization is based > on. Loads and stores to/from x87 and SSE registers, to be clear.

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #13 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #12) > They do. Though, in the combined patch I'm still a little bit worried about > the first 4 modified peephole2s, the last 4 look good to me. > The last 4 are

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #11 from Uroš Bizjak --- Jakub, do these two patches fix your failures?

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #10 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #9) > (In reply to Jakub Jelinek from comment #8) > > I think there are 8 those peephole2s rather than just 4 (I've been looking > > for > > rtx_equal_p (XEXP.*, 0) in

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #9 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #8) > I think there are 8 those peephole2s rather than just 4 (I've been looking > for > rtx_equal_p (XEXP.*, 0) in sync.md No, the others are not problematic.

[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686

2021-04-22 Thread ubizjak at gmail dot com via Gcc-bugs
dot gnu.org| Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #7 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #1) > In this particular case it is the sync.md:398 peephole2: > (define_peephole2 > [(set (match_ope

[Bug target/100119] [x86] Conversion unsigned int -> double produces -0 (-m32 -msse2 -mfpmath=sse)

2021-04-19 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100119 --- Comment #2 from Uroš Bizjak --- diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index dda08ff67f2..5a7a00c13bd 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -1550,6 +1550,8 @@

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 Uroš Bizjak changed: What|Removed |Added Target Milestone|11.0|12.0 --- Comment #20 from Uroš Bizjak

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 --- Comment #18 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #17) > Can we go with #c15 for GCC11 and do #c16 for GCC12? I'd like to kill the option for GCC11, and the solution is safer than #c15.

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 Uroš Bizjak changed: What|Removed |Added Target|x86_64-linux-musl |x86_64 Target Milestone|---

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #16 from Uroš Bizjak --- Created attachment 50568 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50568&action=edit Proposed patch Attached patch disables -m96bit-long-double for 64-bit targets.

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 --- Comment #15 from Uroš Bizjak --- (In reply to Richard Biener from comment #12) > A possible solution might be to disallow the -m64 -m96bit-long-double > combination, the documentation suggests -m128bit-long-double was intended > as an

[Bug target/100041] ICE in curr_insn_transform, at lra-constraints.c:4022

2021-04-12 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041 --- Comment #13 from Uroš Bizjak --- See PR79514.

[Bug target/100021] [9/10/11 Regression] std::clamp unprofitable vectorization on -march=nehalem/.../broadwell

2021-04-11 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100021 --- Comment #2 from Uroš Bizjak --- Also, you are passing -march=sandybridge, but the profiler seems to show Skylake (SKX) target. The STV pass heavily depends on target costs, and when -march=skylake is passed, the conversion is avoided.

[Bug target/100021] [9/10/11 Regression] std::clamp unprofitable vectorization on -march=nehalem/.../broadwell

2021-04-11 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100021 --- Comment #1 from Uroš Bizjak --- This is not vectorization, but the compiler uses vector registers to perform scalar operations. This is the STV (scalar-to-vector) pass in action, you can use -mno-stv to avoid the transformation. The transformation

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930 --- Comment #6 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #4) > Is there some reason why the patterns are written that way rather than split > immediately into the AND or XOR? Perhaps it could be done on SUBREGs to > make it

[Bug target/99652] inline doesn't with -mno-sse

2021-03-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652 --- Comment #5 from Uroš Bizjak --- inline long double foo (void) { return 1.0; } gcc -S -O2 -mno-80387 double.c double.c: In function ‘foo’: double.c:3:1: error: x87 register return with x87 disabled 3 | { | ^

[Bug c++/99601] [11 regression] g++.dg/modules/iostream-1_b.C on x86_64 with -m32

2021-03-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99601 --- Comment #3 from Uroš Bizjak --- (In reply to CVS Commits from comment #1) > The master branch has been updated by Nathan Sidwell : > > https://gcc.gnu.org/g:770d3487ef18a71f65626c182625889eee29f580 There is a typo in the selector: +// {

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #34 from Uroš Bizjak --- (In reply to rguent...@suse.de from comment #32) > what about reload_completed? We really only want to do this after RA. No need for it, this is peephole2 pass that *always* runs after reload.

[Bug target/99405] Rotate with mask not optimized on x86 for QI/HImode rotates

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405 --- Comment #2 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #1) > Created attachment 50306 [details] > gcc11-pr99405.patch > > Untested fix. - (match_operand:SI 2 "register_operand" "c") +

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #31 from Uroš Bizjak --- (In reply to Richard Biener from comment #29) > The simplified variant below works but IMHO matches cases we do not > want to transform. I can't find any example on how to achieve that > though. I think

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #28 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #27) > (In reply to Richard Biener from comment #26) > > but that doesn't seem to match for some unknown reason. > Try this: The latency problem with the original

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #27 from Uroš Bizjak --- (In reply to Richard Biener from comment #26) > but that doesn't seem to match for some unknown reason. Try this: (define_peephole2 [(match_scratch:DI 5 "Yv") (set (match_operand:DI 0

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-05 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #24 from Uroš Bizjak --- (In reply to Richard Biener from comment #22) > That works to avoid the vpinsrq. I guess the case of a mem operand > behaves similar to a gpr (plus the load uop), at least I don't have any > contrary

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #21 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #20) > (In reply to Richard Biener from comment #18) > > Even on Skylake it's 2 (movq) + 3 (vpinsr), so there it's 6 vs. 3. Not > > sure if we should somehow do this

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #20 from Uroš Bizjak --- (In reply to Richard Biener from comment #18) > Even on Skylake it's 2 (movq) + 3 (vpinsr), so there it's 6 vs. 3. Not > sure if we should somehow do this late somehow (peephole or splitter) since > it

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2021-02-25 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 99083, which changed state. Bug 99083 Summary: Big run-time regressions of 519.lbm_r with LTO https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 What|Removed |Added

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-25 Thread ubizjak at gmail dot com via Gcc-bugs
||patch Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Resolution|FIXED |--- --- Comment #13 from Uroš Bizjak --- (In reply to Martin Jambor from comment #12) > For the record, I have benchmarked the patches f

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2021-02-21 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 99083, which changed state. Bug 99083 Summary: Big run-time regressions of 519.lbm_r with LTO https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 What|Removed |Added

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-21 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 Uroš Bizjak changed: What|Removed |Added Target Milestone|--- |11.0

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-21 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/99115] ICE in extract_insn, at recog.c:2309 on alpha (error: unrecognizable insn) with -O2

2021-02-17 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99115 --- Comment #4 from Uroš Bizjak --- Compiles OK with: GNU C++14 (GCC) version 8.4.1 20210216 [releases/gcc-8 revision c6513400d84:39c49bc104d:1f3a07da9b6bcfa4733750826746bd18ac6f20db] (alpha-unknown-openbsd6.8) built as a cross from

[Bug target/99115] ICE in extract_insn, at recog.c:2309 on alpha (error: unrecognizable insn) with -O2

2021-02-16 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99115 Uroš Bizjak changed: What|Removed |Added Known to work||11.0 --- Comment #3 from Uroš Bizjak ---

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #10 from Uroš Bizjak --- (In reply to Richard Biener from comment #7) > There are a lot of targets that define REG_ALLOC_ORDER ^ > HONOR_REG_ALLOC_ORDER and thus are affected by this change... The following patch should solve this

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #8 from Uroš Bizjak --- (In reply to Richard Biener from comment #7) > Btw, for GCC 11 it might be tempting to simply revert the "no-op" change? I agree, this is the safest way at this time. The situation now looks like going into

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #6 from Uroš Bizjak --- As a side note, it is strange that ADJUST_REG_ALLOC_ORDER somehow requires REG_ALLOC_ORDER to be defined (cf. Comment #3), while its documentation says: The macro body should not assume anything about

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #5 from Uroš Bizjak --- Martin, can you please benchmark the patch from Comment #4? The patch is not totally trivial, because it introduces HONOR_REG_ALLOC_ORDER to x86 and this define disables some other code in ira-color.c,

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #4 from Uroš Bizjak --- Created attachment 50185 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50185&action=edit Proposed patch Proposed patch that fixes ira-color.c and introduces HONOR_REG_ALLOC_ORDER.

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-15 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #3 from Uroš Bizjak --- It looks to me another one is in reload1.c, find_reg: if (this_cost < best_cost /* Among registers with equal cost, prefer caller-saved ones, or use REG_ALLOC_ORDER if

[Bug target/99083] Big run-time regressions of 519.lbm_r with LTO

2021-02-13 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083 --- Comment #1 from Uroš Bizjak --- This should be a no-op. According to the documentation: --q-- Macro: REG_ALLOC_ORDER If defined, an initializer for a vector of integers, containing the numbers of hard registers in the order in which

[Bug target/99025] [11 Regression] ICE Segmentation fault since r11-6351-g12ae2bc70846a2be

2021-02-10 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99025 --- Comment #2 from Uroš Bizjak --- Comment on attachment 50154 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50154 gcc11-pr99025.patch >2021-02-09 Jakub Jelinek >+ if (SUBREG_P (operands[1])) >+operands[1] = force_reg

[Bug rtl-optimization/98962] Perform bitops on floats directly with SSE

2021-02-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98962 --- Comment #4 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #3) > Another possibility is add x/v constraints to *andsi_1 and *anddi_1 with the > immediates and disparage that alternative enough to reflect the fact that > the

[Bug rtl-optimization/98961] Failure to optimize successive comparisons with 0 into clz

2021-02-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98961 --- Comment #3 from Uroš Bizjak --- Please note that LZCNT insn has its own set of problems (e.g. TARGET_AVOID_FALSE_DEP_FOR_BMI), so I'm not convinced that even: int z (int i) { return i == 0; } benefits from using LZCNT: 0: 31 c0
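
As background for the comment above, a rough sketch of how a comparison with zero can be expressed through a leading-zero count: a 32-bit count is 32 only for the input 0, so shifting it right by 5 yields the result of `i == 0`. The helper `lzcnt32` is a hypothetical portable stand-in for the instruction (defined for 0, unlike `__builtin_clz`), and `z_lzcnt` is an illustrative name; neither appears in the bug report, and nothing here says this form is faster.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for LZCNT: counts leading zero bits
   of a 32-bit value and returns 32 when the input is 0.  */
unsigned lzcnt32 (uint32_t x)
{
  unsigned n = 0;
  for (uint32_t bit = UINT32_C (1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
    n++;
  return n;
}

/* The function from the comment.  */
int z (int i)
{
  return i == 0;
}

/* Same result via the leading-zero count: the count reaches 32
   (bit 5 set) only when i is 0, so count >> 5 is 1 or 0.  */
int z_lzcnt (int i)
{
  return (int) (lzcnt32 ((uint32_t) i) >> 5);
}
```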

[Bug rtl-optimization/98961] Failure to optimize successive comparisons with 0 into clz

2021-02-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98961 Uroš Bizjak changed: What|Removed |Added CC||ubizjak at gmail dot com --- Comment #2

[Bug target/98737] Atomic operation on x86 no optimized to use flags

2021-01-19 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737 --- Comment #2 from Uroš Bizjak --- This can be optimized with peephole2, we already have similar case in sync.md: ;; This peephole2 and following insn optimize ;; __sync_fetch_and_add (x, -N) == N into just lock {add,sub,inc,dec} ;; followed
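
The source-level idea behind the quoted sync.md comment can be sketched as follows, assuming the GCC `__sync` builtins. Comparing the old value with N is the same as comparing the new value with 0, which is what lets a locked subtract followed by a flags test replace the fetch-and-compare. The function names are hypothetical and chosen only for this illustration; the peephole2 itself works on RTL, not on C.

```c
/* Form named in the comment: fetch the old value, subtract n,
   then compare the old value with n.  */
int dec_and_test_fetch (int *x, int n)
{
  return __sync_fetch_and_add (x, -n) == n;
}

/* Equivalent form: the new value is old - n, so old == n exactly
   when the new value is 0, i.e. when the subtraction sets ZF.  */
int dec_and_test_flags (int *x, int n)
{
  return __sync_sub_and_fetch (x, n) == 0;
}
```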

[Bug target/98612] _mm_comieq_sd has wrong semantics

2021-01-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98612 --- Comment #8 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #7) > I asked my colleagues within intel to revise the descriptions in the > intrinsics guide to make it more explicit about NAN operands. > > I'll fix this issue after

[Bug ada/98724] [11 Regression] gnat build failure on alpha-linux-gnu

2021-01-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98724 --- Comment #1 from Uroš Bizjak --- Sorry, I don't have access to alpha anymore. (And I'm surprised that gnat even builds, because I've never tried.)

[Bug middle-end/98713] Failure to generate branch version of abs if user requested it

2021-01-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713 --- Comment #4 from Uroš Bizjak --- Please see PR 56309 (and PR 85559 meta bug). Quote from Honza: The decision on whether to use cmov or jmp was always tricky on x86 architectures. Cmov increase dependency chains, register pressure (both

[Bug tree-optimization/96674] Failure to optimize combination of comparisons to dec+compare

2021-01-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674 --- Comment #8 from Uroš Bizjak --- Comment on attachment 49969 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49969 Optimize combination of comparisons to dec+compare >+/* y == XXX_MIN || x < y --> x <= y - 1 */ Can we use TYPE_MIN
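The rewrite rule quoted from the patch can be checked exhaustively for a small type. The sketch below assumes an unsigned 8-bit type, where XXX_MIN is 0 and `y - 1` wraps modulo 256; the function names are hypothetical and this is not the code from the attachment. A signed version would need care, since `y - 1` at the minimum value overflows in C.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: y == XXX_MIN || x < y, with XXX_MIN == 0 for uint8_t. */
static int original_form(uint8_t x, uint8_t y) { return y == 0 || x < y; }

/* Folded form: x <= y - 1, with the subtraction wrapping modulo 256. */
static int folded_form(uint8_t x, uint8_t y) { return x <= (uint8_t)(y - 1); }

/* Asserts that both forms agree on every (x, y) pair and returns the
   number of pairs that were checked. */
static int check_identity(void) {
  int checked = 0;
  for (unsigned x = 0; x < 256; x++) {
    for (unsigned y = 0; y < 256; y++) {
      assert(original_form((uint8_t)x, (uint8_t)y) ==
             folded_form((uint8_t)x, (uint8_t)y));
      checked++;
    }
  }
  return checked;
}
```

When y is 0, `y - 1` wraps to 255 and the folded comparison is always true, which matches the `y == 0` arm of the original.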

[Bug target/98671] gcc/config/i386/i386-options.c:787:redundantAssignment

2021-01-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98671 --- Comment #6 from Uroš Bizjak --- (In reply to David Binderman from comment #5) > (In reply to Uroš Bizjak from comment #4) > > I'm not sure if solving this would bring us anything. > > For clarity, at very most a 4% reduction in the size of

[Bug target/98683] Non-canonical compare produced with the VAX backend

2021-01-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98683 --- Comment #1 from Uroš Bizjak --- Maybe TARGET_CANONICALIZE_COMPARISON would help here? x86 had a similar issue with ficom x87 insn where float RTX was always the first operand, but the compare was with the float extend of the second one.

[Bug target/98671] gcc/config/i386/i386-options.c:787:redundantAssignment

2021-01-14 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98671 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/98671] gcc/config/i386/i386-options.c:787:redundantAssignment

2021-01-14 Thread ubizjak at gmail dot com via Gcc-bugs
|1 Last reconfirmed||2021-01-14 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Target Milestone|--- |11.0 --- Comment #2 from Uroš Bizjak --- Let me fix this.

[Bug target/98482] -mfentry creates invalid call for -mcmodel=large

2021-01-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #14 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #10) > If we are emitting for nested functions > pushq %r10 > 1: call __fentry__ > popq %r10 > (is it ok to misalign the stack for __fentry__?

[Bug target/98482] -mfentry creates invalid call for -mcmodel=large

2021-01-08 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #9 from Uroš Bizjak --- (In reply to Topi Miettinen from comment #8) > I'm unfortunately ignorant to GCC internals and usage of %r10, but otherwise > the patch looks good to me. > > For -mcmodel=large -fPIC, the call sequence

[Bug target/98482] -mfentry creates invalid call for -mcmodel=large

2021-01-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #5 from Uroš Bizjak --- (In reply to Topi Miettinen from comment #4) > Sorry, I didn't check the ABI. It seems that %r11 and maybe %r10 should be > usable: %r11 is already used as PROFILE_COUNT_REGISTER for !NO_PROFILE_COUNTERS

[Bug target/98482] -mfentry creates invalid call for -mcmodel=large

2021-01-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #3 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #2) > (In reply to Hongtao.liu from comment #1) > > and by the time of output __fentry__ in gcc, register is already accocated, > > is there any regs supposed to be safe

[Bug target/98482] -mfentry creates invalid call for -mcmodel=large

2021-01-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482 --- Comment #2 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #1) > and by the time of output __fentry__ in gcc, register is already accocated, > is there any regs supposed to be safe in the entry of function? or we need > to spill

[Bug target/98567] Failure to optimize using ZF flag from blsi

2021-01-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98567 --- Comment #2 from Uroš Bizjak --- Comment on attachment 49901 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49901 gcc11-pr98567.patch >+(define_insn "*bmi_blsi__cmp" >+ [(set (reg:CCZ FLAGS_REG) >+ (compare:CCZ >+

[Bug target/98522] _mm_cvttps_pi32 and _mm_cvtps_pi32 raise spurious FP exceptions

2021-01-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98522 Uroš Bizjak changed: What|Removed |Added Target Milestone|--- |10.3 Status|ASSIGNED

[Bug target/98521] [x86] _mm256_cmov_si256 XOP function is missing

2021-01-06 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98521 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug target/98521] [x86] _mm256_cmov_si256 XOP function is missing

2021-01-05 Thread ubizjak at gmail dot com via Gcc-bugs
dot gnu.org |ubizjak at gmail dot com Status|UNCONFIRMED |ASSIGNED --- Comment #2 from Uroš Bizjak --- Created attachment 49882 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49882&action=edit Proposed patch

[Bug target/98522] _mm_cvttps_pi32 and _mm_cvtps_pi32 raise spurious FP exceptions

2021-01-05 Thread ubizjak at gmail dot com via Gcc-bugs
||2021-01-05 Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #1 from Uroš Bizjak --- Created attachment 49881 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49881&action=edit Proposed patch

[Bug target/64243] Passing and returning structures with single member of floating type via SSE registers is wrong on Windows x86-64 ABI

2020-12-30 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64243 Uroš Bizjak changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Known to work|
