at gcc dot gnu.org |ubizjak at gmail dot com
Last reconfirmed||2021-05-17
Ever confirmed|0 |1
--- Comment #1 from Uroš Bizjak ---
Created attachment 50822
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50822&action=edit
Pa
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
Following testcases involving 4 byte vectors, e.g.:
typedef char __v4qi __attribute__ ((__vector_size__ (4)));
__v4qi foo (__v4qi a, __v4qi b, __v4qi c)
{
  return (a & ~b) + c;
}
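A minimal sketch for sanity-checking what the testcase above computes, assuming a GCC-compatible compiler with vector extensions. The helpers foo_lane and check_foo and the input values are illustrative and not part of the report; the inputs are chosen so that no lane overflows.

```c
#include <assert.h>

typedef char __v4qi __attribute__ ((__vector_size__ (4)));

/* The testcase from the report: lane-wise and-not followed by add.  */
__v4qi
foo (__v4qi a, __v4qi b, __v4qi c)
{
  return (a & ~b) + c;
}

/* Scalar reference for a single lane (hypothetical helper).  */
static char
foo_lane (char a, char b, char c)
{
  return (char) ((a & ~b) + c);
}

/* Returns 1 when every vector lane matches the scalar reference.  */
int
check_foo (void)
{
  __v4qi a = { 0x0f, 0x33, 0x55, 0x70 };
  __v4qi b = { 0x05, 0x0f, 0x50, 0x10 };
  __v4qi c = { 1, 2, 3, 4 };
  __v4qi r = foo (a, b, c);

  for (int i = 0; i < 4; i++)
    if (r[i] != foo_lane (a[i], b[i], c[i]))
      return 0;
  return 1;
}
```

Calling check_foo () from a small driver and comparing builds at different optimization levels is one way to confirm that the generated code for the 4-byte vector path agrees with the scalar semantics.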
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100626
--- Comment #3 from Uroš Bizjak ---
*di3_doubleword calls split_double_mode with:
op0: (subreg:DI (reg/v:SI 89 [ li_18 ]) 0)
op1: (reg:DI 90 [ uc_4 ])
op2: (mem/c:DI (plus:SI (reg/f:SI 19 frame)
(const_int -4 [0xfffc]))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
--- Comment #16 from Uroš Bizjak ---
(In reply to David Binderman from comment #15)
> Bug first appears sometime between git hash 21dfb22920ce32fc,
> dated yesterday and git hash 097fde5e7514e909, dated today.
Fixed by PR100581.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581
--- Comment #3 from Uroš Bizjak ---
(In reply to Alex Coplan from comment #1)
> Is it valid to create a vector type with total size less than the element
> size? Shouldn't this be rejected?
No, the generated code is:
vmovq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100581
Uroš Bizjak changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
--- Comment #13 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #12)
> Yeah, this is a non-existent SSE "cmove". I tried to find all paths where
> this should divert to a sequence of logic instructions or PBLENDB, but due
> to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
--- Comment #12 from Uroš Bizjak ---
(In reply to David Binderman from comment #11)
> I might be seeing something similar:
>
> caxcpy.f: In function 'caxcpy':
> caxcpy.f:53:72: error: unrecognizable insn:
> 53 | end subroutine
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
Uroš Bizjak changed:
What|Removed |Added
Assignee|ubizjak at gmail dot com |unassigned at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375
Bug 98375 depends on bug 98218, which changed state.
Bug 98218 Summary: [TARGET_MMX_WITH_SSE] Implement 64bit vector compares (AVX512 masked compares missing)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
Uroš Bizjak changed:
What|Removed |Added
Summary|[TARGET_MMX_WITH_SSE] Miss |[TARGET_MMX_WITH_SSE]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100461
Uroš Bizjak changed:
What|Removed |Added
CC||hjl.tools at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445
--- Comment #10 from Uroš Bizjak ---
Following patch fixes the failures:
--cut here--
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 4dfe7d6c282..61b2f921f41 100644
--- a/gcc/config/i386/i386-expand.c
+++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445
--- Comment #9 from Uroš Bizjak ---
ix86_use_mask_cmp_p should be refined; it has an early return for 64-byte modes
(GET_MODE_SIZE is measured in bytes):
if (GET_MODE_SIZE (mode) == 64)
  return true;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445
--- Comment #6 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #5)
> ix86_expand_sse_movcc has special TARGET_XOP path, so the following patch is
> needed:
Ah, you beat me by the second ;)
Anyway, I have no XOP target, so probably
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100445
--- Comment #5 from Uroš Bizjak ---
ix86_expand_sse_movcc has special TARGET_XOP path, so the following patch is
needed:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 347295afbb5..667dd057e0d 100644
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375
Bug 98375 depends on bug 98218, which changed state.
Bug 98218 Summary: [TARGET_MMX_WITH_SSE] Miss vec_cmpmn/vcondmn expander for 64bit vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98218
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #8 from Uroš Bizjak ---
FYI, this whole analysis was done with Fedora 33 system compiler:
gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #7 from Uroš Bizjak ---
I have traced a bit where (insn 2275) and (insn 2287) come from.
In _.ira, we have:
613: r125:QI=r2067:DI#0
...
659: zero_extract(r2080:DI,0x8,0x8)=r125:QI#0
And in _.reload, a DImode reload is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #5 from Uroš Bizjak ---
The problem can be seen in _.pro_and_epilogue pass:
Starting with:
_.cmpelim
2741: r14:DI=[sp:DI+0x38]
...
368: di:DI=r14:DI
...
613: si:QI=r14:QI
...
2737: bp:DI=r14:DI
...
658:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #4 from Uroš Bizjak ---
The problematic insn is:
401cec: 44 89 f6    mov    %r14d,%esi
This one should be 64 bit wide,
movl    %r14d, %esi    # 613 [c=4 l=3] *movqi_internal/2
but is actually a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100342
--- Comment #3 from Uroš Bizjak ---
For some reason the *input* value at BSWAP insn is truncated to 32bits.
v256u128 v256u128_1 =
SHLV (SHLSV (__builtin_bswap64 (u128_0), (v256u128) (0 < v256u128_0)) <=
0, v256u128_0);
u128_0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100355
--- Comment #3 from Uroš Bizjak ---
(In reply to Christophe Lyon from comment #2)
> Tried that, but it's not taken into account.
>
> ieee.exp uses c-torture-execute, maybe that function does not honor dg
> directives? (none of the tests under
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375
Bug 98375 depends on bug 98060, which changed state.
Bug 98060 Summary: Failure to optimize cmp+setnb+add to cmp+sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98060
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98060
Uroš Bizjak changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100312
Uroš Bizjak changed:
What|Removed |Added
Assignee|rguenth at gcc dot gnu.org |ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735
--- Comment #11 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Richard Biener from comment #4)
> > Indeed as far as I understand an unspec volatile isn't sth clobbering
> > registers (not even memory?!). The insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735
--- Comment #9 from Uroš Bizjak ---
(In reply to Richard Biener from comment #4)
> Indeed as far as I understand an unspec volatile isn't sth clobbering
> registers (not even memory?!). The insn is missing inputs/outputs
> (we might be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735
--- Comment #8 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #7)
> Confirmed, let me fix this.
Please note that the current definition of vzeroupper does not model effects of
the instruction at all. The current definition is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
Uroš Bizjak changed:
What|Removed |Added
Attachment #50649|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #17 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #16)
> (In reply to Jakub Jelinek from comment #15)
> > Yes, but do they preserve all the bits and never modify any bit patterns,
> > including qNaNs and sNaNs? I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #16 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #15)
> Yes, but do they preserve all the bits and never modify any bit patterns,
> including qNaNs and sNaNs? I thought the point of using the fistp was that
> it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #14 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #13)
> DFmode loads and stores *are* atomic, this is what the optimization is based
> on.
Loads and stores to/from x87 and SSE registers, to be clear.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #13 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #12)
> They do. Though, in the combined patch I'm still a little bit worried about
> the first 4 modified peephole2s, the last 4 look good to me.
> The last 4 are
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #11 from Uroš Bizjak ---
Jakub, do these two patches fix your failures?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #10 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Jakub Jelinek from comment #8)
> > I think there are 8 those peephole2s rather than just 4 (I've been looking
> > for
> > rtx_equal_p (XEXP.*, 0) in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182
--- Comment #9 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #8)
> I think there are 8 those peephole2s rather than just 4 (I've been looking
> for
> rtx_equal_p (XEXP.*, 0) in sync.md
No, the others are not problematic.
dot gnu.org|
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #7 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #1)
> In this particular case it is the sync.md:398 peephole2:
> (define_peephole2
> [(set (match_ope
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100119
--- Comment #2 from Uroš Bizjak ---
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index dda08ff67f2..5a7a00c13bd 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -1550,6 +1550,8 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
Uroš Bizjak changed:
What|Removed |Added
Target Milestone|11.0|12.0
--- Comment #20 from Uroš Bizjak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
--- Comment #18 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #17)
> Can we go with #c15 for GCC11 and do #c16 for GCC12?
I'd like to kill the option for GCC11, and the solution is safer than #c15.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
Uroš Bizjak changed:
What|Removed |Added
Target|x86_64-linux-musl |x86_64
Target Milestone|---
at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #16 from Uroš Bizjak ---
Created attachment 50568
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50568&action=edit
Proposed patch
Attached patch disables -m96bit-long-double for 64-bit targets.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
--- Comment #15 from Uroš Bizjak ---
(In reply to Richard Biener from comment #12)
> A possible solution might be to disallow the -m64 -m96bit-long-double
> combination, the documentation suggests -m128bit-long-double was intended
> as an
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100041
--- Comment #13 from Uroš Bizjak ---
See PR79514.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100021
--- Comment #2 from Uroš Bizjak ---
Also, you are passing -march=sandybridge, but the profiler seems to show a
Skylake (SKX) target. The STV pass heavily depends on target costs, and when
-march=skylake is passed, the conversion is avoided.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100021
--- Comment #1 from Uroš Bizjak ---
This is not vectorization; the compiler uses vector registers to perform
scalar operations. This is the STV (scalar-to-vector) pass in action; you can use
-mno-stv to avoid the transformation.
The transformation
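A small sketch of the kind of scalar code the STV pass may act on, assuming a 32-bit x86 target with SSE2 where 64-bit integer logic would otherwise need register pairs. The function name and operands are hypothetical and not taken from the report; whether the conversion actually fires depends on the target costs selected by -march.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: 64-bit and-not on memory operands.  On a
   32-bit target the compiler can either split this into two 32-bit
   operations or, if STV considers it profitable, perform it in a
   vector register.  */
void
and_not_store (uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
  *dst = *a & ~*b;
}
```

Compiling this once with something like gcc -O2 -m32 -msse2 -S and once more with -mno-stv added, then comparing the two assembly outputs, is one way to see whether the pass converted the operation for a given -march setting.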
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930
--- Comment #6 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #4)
> Is there some reason why the patterns are written that way rather than split
> immediately into the AND or XOR? Perhaps it could be done on SUBREGs to
> make it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652
--- Comment #5 from Uroš Bizjak ---
inline long double
foo (void)
{
  return 1.0;
}
gcc -S -O2 -mno-80387 double.c
double.c: In function ‘foo’:
double.c:3:1: error: x87 register return with x87 disabled
    3 | {
      | ^
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99601
--- Comment #3 from Uroš Bizjak ---
(In reply to CVS Commits from comment #1)
> The master branch has been updated by Nathan Sidwell :
>
> https://gcc.gnu.org/g:770d3487ef18a71f65626c182625889eee29f580
There is a typo in the selector:
+// {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #34 from Uroš Bizjak ---
(In reply to rguent...@suse.de from comment #32)
> what about reload_completed? We really only want to do this after RA.
No need for it, this is peephole2 pass that *always* runs after reload.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99405
--- Comment #2 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #1)
> Created attachment 50306 [details]
> gcc11-pr99405.patch
>
> Untested fix.
- (match_operand:SI 2 "register_operand" "c")
+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #31 from Uroš Bizjak ---
(In reply to Richard Biener from comment #29)
> The simplified variant below works but IMHO matches cases we do not
> want to transform. I can't find any example on how to achieve that
> though.
I think
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #28 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Richard Biener from comment #26)
> > but that doesn't seem to match for some unknown reason.
> Try this:
The latency problem with the original
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #27 from Uroš Bizjak ---
(In reply to Richard Biener from comment #26)
> but that doesn't seem to match for some unknown reason.
Try this:
(define_peephole2
[(match_scratch:DI 5 "Yv")
(set (match_operand:DI 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #24 from Uroš Bizjak ---
(In reply to Richard Biener from comment #22)
> That works to avoid the vpinsrq. I guess the case of a mem operand
> behaves similar to a gpr (plus the load uop), at least I don't have any
> contrary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #21 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #20)
> (In reply to Richard Biener from comment #18)
> > Even on Skylake it's 2 (movq) + 3 (vpinsr), so there it's 6 vs. 3. Not
> > sure if we should somehow do this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #20 from Uroš Bizjak ---
(In reply to Richard Biener from comment #18)
> Even on Skylake it's 2 (movq) + 3 (vpinsr), so there it's 6 vs. 3. Not
> sure if we should somehow do this late somehow (peephole or splitter) since
> it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 99083, which changed state.
Bug 99083 Summary: Big run-time regressions of 519.lbm_r with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
What|Removed |Added
||patch
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
Resolution|FIXED |---
--- Comment #13 from Uroš Bizjak ---
(In reply to Martin Jambor from comment #12)
> For the record, I have benchmarked the patches f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 99083, which changed state.
Bug 99083 Summary: Big run-time regressions of 519.lbm_r with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
Uroš Bizjak changed:
What|Removed |Added
Target Milestone|--- |11.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99115
--- Comment #4 from Uroš Bizjak ---
Compiles OK with:
GNU C++14 (GCC) version 8.4.1 20210216 [releases/gcc-8 revision
c6513400d84:39c49bc104d:1f3a07da9b6bcfa4733750826746bd18ac6f20db]
(alpha-unknown-openbsd6.8)
built as a cross from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99115
Uroš Bizjak changed:
What|Removed |Added
Known to work||11.0
--- Comment #3 from Uroš Bizjak ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #10 from Uroš Bizjak ---
(In reply to Richard Biener from comment #7)
> There are a lot of targets that define REG_ALLOC_ORDER ^
> HONOR_REG_ALLOC_ORDER and thus are affected by this change...
The following patch should solve this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #8 from Uroš Bizjak ---
(In reply to Richard Biener from comment #7)
> Btw, for GCC 11 it might be tempting to simply revert the "no-op" change?
I agree, this is the safest way at this time. The situation now looks like
going into
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #6 from Uroš Bizjak ---
As a side note, it is strange that ADJUST_REG_ALLOC_ORDER somehow requires
REG_ALLOC_ORDER to be defined (cf. Comment #3), while its documentation says:
The macro body should not assume anything about
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #5 from Uroš Bizjak ---
Martin, can you please benchmark the patch from Comment #4?
The patch is not totally trivial, because it introduces HONOR_REG_ALLOC_ORDER
to x86 and this define disables some other code in ira-color.c,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #4 from Uroš Bizjak ---
Created attachment 50185
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50185&action=edit
Proposed patch
Proposed patch that fixes ira-color.c and introduces HONOR_REG_ALLOC_ORDER.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #3 from Uroš Bizjak ---
It looks to me like another one is in reload1.c, find_reg:
if (this_cost < best_cost
/* Among registers with equal cost, prefer caller-saved ones, or
use REG_ALLOC_ORDER if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083
--- Comment #1 from Uroš Bizjak ---
This should be a no-op. According to the documentation:
--q--
Macro: REG_ALLOC_ORDER
If defined, an initializer for a vector of integers, containing the numbers
of hard registers in the order in which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99025
--- Comment #2 from Uroš Bizjak ---
Comment on attachment 50154
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50154
gcc11-pr99025.patch
>2021-02-09 Jakub Jelinek
>+ if (SUBREG_P (operands[1]))
>+operands[1] = force_reg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98962
--- Comment #4 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #3)
> Another possibility is add x/v constraints to *andsi_1 and *anddi_1 with the
> immediates and disparage that alternative enough to reflect the fact that
> the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98961
--- Comment #3 from Uroš Bizjak ---
Please note that the LZCNT insn has its own set of problems (e.g.
TARGET_AVOID_FALSE_DEP_FOR_BMI), so I'm not convinced that even:
int z (int i)
{
  return i == 0;
}
benefits from using LZCNT:
0: 31 c0
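For context, a sketch of the comparison being weighed, assuming the LZCNT form in question is the usual "count leading zeros, then shift right by 5" trick for a 32-bit operand. The helpers lzcnt32, z_lzcnt and check_z are hypothetical; lzcnt32 is a portable stand-in for the instruction, which (unlike __builtin_clz) is defined to return 32 for a zero input.

```c
#include <assert.h>
#include <limits.h>

/* Portable stand-in for a 32-bit LZCNT (assumes 32-bit unsigned).  */
static unsigned
lzcnt32 (unsigned x)
{
  return x ? (unsigned) __builtin_clz (x) : 32u;
}

/* The function from the comment.  */
int
z (int i)
{
  return i == 0;
}

/* LZCNT-style formulation: only a count of 32 has bit 5 set.  */
int
z_lzcnt (int i)
{
  return (int) (lzcnt32 ((unsigned) i) >> 5);
}

/* Returns 1 when both formulations agree on a few sample inputs.  */
int
check_z (void)
{
  static const int samples[] = { 0, 1, -1, 2, 255, INT_MAX, INT_MIN };

  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    if (z (samples[k]) != z_lzcnt (samples[k]))
      return 0;
  return 1;
}
```

Both forms compute the same value; the point raised in the comment is whether the LZCNT sequence is actually a win once its own issues are taken into account.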
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98961
Uroš Bizjak changed:
What|Removed |Added
CC||ubizjak at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
--- Comment #2 from Uroš Bizjak ---
This can be optimized with peephole2; we already have a similar case in sync.md:
;; This peephole2 and following insn optimize
;; __sync_fetch_and_add (x, -N) == N into just lock {add,sub,inc,dec}
;; followed
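The source-level idiom that the quoted sync.md comment refers to can be sketched as follows. The counter variable and function names are hypothetical; the second function only illustrates why the comparison can be answered from the flags of a locked add/sub, since the old value equals N exactly when the new value is zero.

```c
#include <assert.h>

/* Hypothetical shared counter, for illustration only.  */
static int counter;

/* The idiom from the comment: compare the fetched old value with N.  */
int
dec_and_test (int n)
{
  return __sync_fetch_and_add (&counter, -n) == n;
}

/* Equivalent form that needs only the new value, hence only the
   zero flag produced by the locked instruction.  */
int
dec_and_test_flags (int n)
{
  return __sync_sub_and_fetch (&counter, n) == 0;
}

/* Returns 1 when both forms agree on a short sequence.  */
int
check_equivalence (void)
{
  counter = 3;
  int a = dec_and_test (1);        /* 3 -> 2, old value 3 != 1 */
  int b = dec_and_test (2);        /* 2 -> 0, old value 2 == 2 */
  counter = 3;
  int c = dec_and_test_flags (1);  /* new value 2 != 0 */
  int d = dec_and_test_flags (2);  /* new value 0 == 0 */
  return a == c && b == d && a == 0 && b == 1;
}
```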
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98612
--- Comment #8 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #7)
> I asked my colleagues within intel to revise the descriptions in the
> intrinsics guide to make it more explicit about NAN operands.
>
> I'll fix this issue after
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98724
--- Comment #1 from Uroš Bizjak ---
Sorry, I don't have access to alpha anymore.
(And I'm surprised that gnat even builds, because I've never tried.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713
--- Comment #4 from Uroš Bizjak ---
Please see PR 56309 (and PR 85559 meta bug).
Quote from Honza:
The decision on whether to use cmov or jmp was always tricky on x86
architectures. Cmov increase dependency chains, register pressure (both
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96674
--- Comment #8 from Uroš Bizjak ---
Comment on attachment 49969
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49969
Optimize combination of comparisons to dec+compare
>+/* y == XXX_MIN || x < y --> x <= y - 1 */
Can we use TYPE_MIN
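The identity in the quoted patch comment can be checked exhaustively on a small type. This sketch assumes an unsigned 8-bit type, so that XXX_MIN is 0 and the wrap-around of y - 1 is well defined in C; the function names are illustrative and not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the pattern, with XXX_MIN == 0 for uint8_t.  */
static int
lhs (uint8_t x, uint8_t y)
{
  return y == 0 || x < y;
}

/* Right-hand side: y - 1 wraps to UINT8_MAX when y == 0, which
   makes the comparison true for every x.  */
static int
rhs (uint8_t x, uint8_t y)
{
  return x <= (uint8_t) (y - 1);
}

/* Counts the (x, y) pairs on which the two sides disagree.  */
int
count_mismatches (void)
{
  int bad = 0;

  for (int x = 0; x <= UINT8_MAX; x++)
    for (int y = 0; y <= UINT8_MAX; y++)
      if (lhs ((uint8_t) x, (uint8_t) y) != rhs ((uint8_t) x, (uint8_t) y))
        bad++;
  return bad;
}
```

The y == XXX_MIN case is what makes the rewrite rely on wrapping arithmetic: the decrement overflows to the maximum value, and the single comparison then covers both halves of the original condition.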
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98671
--- Comment #6 from Uroš Bizjak ---
(In reply to David Binderman from comment #5)
> (In reply to Uroš Bizjak from comment #4)
> > I'm not sure if solving this would bring us anything.
>
> For clarity, at very most a 4% reduction in the size of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98683
--- Comment #1 from Uroš Bizjak ---
Maybe TARGET_CANONICALIZE_COMPARISON would help here? x86 had a similar issue
with ficom x87 insn where float RTX was always the first operand, but the
compare was with the float extend of the second one.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98671
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
|1
Last reconfirmed||2021-01-14
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
Target Milestone|--- |11.0
--- Comment #2 from Uroš Bizjak ---
Let me fix this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482
--- Comment #14 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #10)
> If we are emitting for nested functions
> pushq %r10
> 1: call __fentry__
> popq %r10
> (is it ok to misalign the stack for __fentry__?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482
--- Comment #9 from Uroš Bizjak ---
(In reply to Topi Miettinen from comment #8)
> I'm unfortunately ignorant to GCC internals and usage of %r10, but otherwise
> the patch looks good to me.
>
> For -mcmodel=large -fPIC, the call sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482
--- Comment #5 from Uroš Bizjak ---
(In reply to Topi Miettinen from comment #4)
> Sorry, I didn't check the ABI. It seems that %r11 and maybe %r10 should be
> usable:
%r11 is already used as PROFILE_COUNT_REGISTER for !NO_PROFILE_COUNTERS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482
--- Comment #3 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #2)
> (In reply to Hongtao.liu from comment #1)
> > and by the time of output __fentry__ in gcc, register is already allocated,
> > is there any regs supposed to be safe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98482
--- Comment #2 from Uroš Bizjak ---
(In reply to Hongtao.liu from comment #1)
> and by the time of output __fentry__ in gcc, register is already allocated,
> is there any regs supposed to be safe in the entry of function? or we need
> to spill
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98567
--- Comment #2 from Uroš Bizjak ---
Comment on attachment 49901
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49901
gcc11-pr98567.patch
>+(define_insn "*bmi_blsi__cmp"
>+ [(set (reg:CCZ FLAGS_REG)
>+ (compare:CCZ
>+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98522
Uroš Bizjak changed:
What|Removed |Added
Target Milestone|--- |10.3
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98521
Uroš Bizjak changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
dot gnu.org |ubizjak at gmail dot com
Status|UNCONFIRMED |ASSIGNED
--- Comment #2 from Uroš Bizjak ---
Created attachment 49882
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49882&action=edit
Proposed patch
||2021-01-05
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
--- Comment #1 from Uroš Bizjak ---
Created attachment 49881
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49881&action=edit
Proposed patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64243
Uroš Bizjak changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Known to work|