https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659
--- Comment #15 from Alexander Monakov ---
(In reply to Jakub Jelinek from comment #14)
> (In reply to Alexander Monakov from comment #13)
> > fldt does not convert (otherwise there's no way to spill/reload x87
> > registers).
>
> Doesn't it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533
--- Comment #26 from Alexander Monakov ---
(In reply to Richard Biener from comment #24)
>
> That's because of -fno-vect-cost-model, it wouldn't be vectorized otherwise.
Thanks, I forgot. The testcase in PR 106902 was vectorized at plain -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533
--- Comment #23 from Alexander Monakov ---
I suggest it to close this a dup of PR 106902 if there are no better ideas.
By the way, in both cases SLP introduces vectors in a loop where scalar
computations it's attempting to replace are not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533
--- Comment #22 from Alexander Monakov ---
Similar to the RawTherapee issue, SLP opportunities are created by predcom, so
either -fno-predictive-commoning or -fno-tree-slp-vectorize avoids numerical
runaway on the small testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115533
--- Comment #20 from Alexander Monakov ---
Sam, can you provide more context? It seems there is no downstream bugreport?
How does the alleged miscompilation manifest?
Note that effects of interplay of fp-contract=fast and vectorization can be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115333
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #23 from Alexander Monakov ---
(In reply to Sergei Trofimovich from comment #22)
> Here `pcmpeqd %xmm2,%xmm1` is a problematic instruction. Why does `gcc` use
> `%xmm2` (result of `cvttps2dq`) instead of, say `%xmm0` which contains
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115170
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #20 from Alexander Monakov ---
(In reply to Jakub Jelinek from comment #19)
> If we guarantee that we never constant fold FIX/UNSIGNED_FIX with
> -ftrapping-math (we shouldn't, as the exceptions should be raised), then
> using
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
--- Comment #18 from Alexander Monakov ---
No, allowing value-changing transformations under -ftrapping-math is really not
appropriate. Invoking the intrinsic on a large floating-point value is not UB.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115132
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115014
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
--- Comment #4 from Alexander Monakov ---
Like this:
pandxmm1, XMMWORD PTR .LC0[rip]
movaps XMMWORD PTR [rsp-40], xmm0
xor eax, eax
xor edx, edx
movaps XMMWORD PTR [rsp-24], xmm1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114960
Bug ID: 114960
Summary: [12/13/14/15 Regression] fails to clean up vector
casts
Product: gcc
Version: 12.3.1
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923
--- Comment #4 from Alexander Monakov ---
You can place points of possible access outside of abstract machine in a
fine-grained manner with volatile asms:
asm volatile("" : "=m"(buf));
This cannot be reordered against accesses to volatile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114923
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114765
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
--- Comment #21 from Alexander Monakov ---
It is possible to reduce gcc_qsort workload by improving the presorted-ness of
the array, but of course avoiding quadratic behavior would be much better.
With the following change, we go from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
--- Comment #20 from Alexander Monakov ---
(note that if you uninclude the testcase and compile with -fno-exceptions it's
much faster)
On the smaller testcase from comment 14, prune_unused_phi_nodes invokes
gcc_qsort 53386 times. There are two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114337
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261
--- Comment #10 from Alexander Monakov ---
Indeed, but OTOH according to bug 84402 comment 58 it caused a noticeable hit
on gimple-match.cc compilation:
733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023
sched-deps: do not schedule
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261
--- Comment #8 from Alexander Monakov ---
If we want to get rid of the compilation time regression sooner rather than
later, I can suggest limiting my change only to functions that call setjmp:
diff --git a/gcc/sched-deps.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261
Alexander Monakov changed:
What|Removed |Added
CC||mkuvyrkov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114261
--- Comment #3 from Alexander Monakov ---
The first attachment is empty (perhaps you made a non-recursive archive when
you meant to recursively zip a directory).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487
--- Comment #28 from Alexander Monakov ---
The bug is about the issue of lacking diagnostics, it should be fine to make
note of various approaches to remedy the problem in one bug report.
(in any case, all discussion of the Valgrind-based
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113903
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113890
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113560
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113293
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113159
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113082
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44179
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #9 from Alexander Monakov ---
... as does inserting a nop before the compare ¯\_(ツ)_/¯
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.s 2023-12-01 18:53:04.909438690 +0300
@@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #8 from Alexander Monakov ---
Thanks, I can reproduce it. It is pretty tricky though. For instance, just
swapping the mov and the compare is enough to make it fast:
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112701
Bug ID: 112701
Summary: wrong type inference for ternary operator in
preprocessing context
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699
--- Comment #2 from Alexander Monakov ---
Sorry, even though GCC's limits.h is installed under include-fixed, it is
generated separately, not by the generic fixincludes mechanism. I was confused.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112699
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655
--- Comment #13 from Alexander Monakov ---
> Then there is the MULT_EXPR x * x case
This is PR 111701.
It would be nice to clarify what "nonnegative" means in the contracts of this
family of functions, because it's ambiguous for NaNs and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
Alexander Monakov changed:
What|Removed |Added
CC||uros at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82242
--- Comment #5 from Alexander Monakov ---
The small testcase from comment 3 is now improved on trunk, possibly thanks to
work in PR 110215.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112367
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66487
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884
Bug ID: 111884
Summary: unsigned char no longer aliases anything under
-std=c2x
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Keywords: wrong-code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #11 from Alexander Monakov ---
(In reply to Hongtao.liu from comment #10)
> > indeed (but I believe it did happen with Alder Lake already, by accident,
> > with AVX512 on P-cores but not on E-cores).
>
> AVX512 is physically fused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #9 from Alexander Monakov ---
(In reply to Arsen Arsenović from comment #8)
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).
AFAIK on those Alder Lake CPUs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768
--- Comment #5 from Alexander Monakov ---
I think it's similar to attempting -march=native under distcc, which is already
warned about on Gentoo wiki: https://wiki.gentoo.org/wiki/Distcc
The difference here is that Intel so far decided to make
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694
--- Comment #7 from Alexander Monakov ---
No backport for gcc-13 planned?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736
--- Comment #3 from Alexander Monakov ---
Sorry, the second half of my comment is confusing. To clarify, ASan works fine
for TLS data (the compiler knows that TLS base is at fs:0; libsanitizer uses
some hacks to initialize shadow for TLS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111643
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111701
Bug ID: 111701
Summary: [11/12/13/14 Regression] wrong code for
__builtin_signbit(x*x)
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Keywords: wrong-code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655
--- Comment #11 from Alexander Monakov ---
(In reply to Richard Biener from comment #10)
> And this conservatively has to apply to all FP divisions where we might infer
> "nonnegative" unless we can also infer !zerop?
Yes, I think the logic in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51446
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111655
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210
--- Comment #4 from Alexander Monakov ---
The testcase is small enough to notice the issue by inspection.
Note that you get the "expected" answer with -fno-strict-aliasing, and as
explained in https://gcc.gnu.org/bugs/ it is one of the things
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111210
Alexander Monakov changed:
What|Removed |Added
Resolution|--- |INVALID
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43
--- Comment #6 from Alexander Monakov ---
Thanks.
i5-1335U has two "performance cores" (with HT, four logical CPUs) and eight
"efficiency cores". They have different micro-architecture. Are you binding the
benchmark to some core in particular?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=01
Alexander Monakov changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979
--- Comment #2 from Alexander Monakov ---
Yes, it is wrong-code to full extent. To demonstrate, you can initialize 'sum'
and the array to negative zeroes:
#define FLT double
#define N 20
__attribute__((noipa))
FLT
foo3 (FLT *a)
{
FLT sum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #11 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #8)
> inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
> {
> memcpy(p, , sizeof(x));
> }
>
>
> We deciding to not inline this, while
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #10 from Alexander Monakov ---
Ah, the non-static inlines are intentional, the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #9 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #8 from Alexander Monakov ---
Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a
straightforward wrapper for memcpy:
inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
memcpy(p, ,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
Alexander Monakov changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
--- Comment #16 from Alexander Monakov ---
In C11 and C++11 the issue of compiler-introduced racing loads is discussed as
follows (5.1.2.4 Multi-threaded executions and data races in C11):
28 NOTE 14 Transformations that introduce a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
--- Comment #9 from Alexander Monakov ---
(In reply to Tom de Vries from comment #7)
> Can you elaborate on what you consider a correct approach?
I think this optimization is incorrect and should be active only under -Ofast.
I can offer two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
--- Comment #14 from Alexander Monakov ---
That seems undesirable in light of comment #4, you'd risk creating a situation
when -fno-trapping-math is unpredictably slower when denormals appear in dirty
upper halves.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110611
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438
--- Comment #3 from Alexander Monakov ---
Patch available:
https://inbox.sourceware.org/gcc-patches/8f73371d732237ed54ede44b7bd88...@ispras.ru/T/#u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
--- Comment #9 from Alexander Monakov ---
(In reply to Hongtao.liu from comment #8)
>
> For this one, we can load *a into %zmm0 to avoid false_dependence.
>
> vmovdqau ZMMWORD PTR [rdi], zmm0
> vpternlogq zmm0, zmm0, zmm0, 85
Yes, since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
--- Comment #7 from Alexander Monakov ---
Note that vpxor serves as a dependency-breaking instruction (see PR 110438). So
in negate1 we do the right thing for the wrong reasons, and in negate2 we can
cause a substantial stall if the previous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438
--- Comment #1 from Alexander Monakov ---
We might want to omit PXOR when optimizing for size.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438
Bug ID: 110438
Summary: generating all-ones zmm needs dep-breaking pxor before
ternlog
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #21 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #19)
> But the size argument doesn't have anything to do with TBAA (and
> may_alias is about TBAA). I don't think we have any way to circumvent
> C object
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110431
Bug ID: 110431
Summary: Incorrect disambiguation of wide accesess from
store-merging or SLP
Product: gcc
Version: 12.3.0
Status: UNCONFIRMED
Keywords:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #18 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #17)
> Yes, we do the same to loads. I hope that's not a common technique
> though but I have to admit the vectorizer itself assesses whether it's
> safe to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #16 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #14)
> vectors of T and scalar T interoperate TBAA wise. What we disambiguate is
>
> int a[2];
>
> int foo(int *p)
> {
> a[0] = 1;
> *(v4si *)p =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273
--- Comment #8 from Alexander Monakov ---
(In reply to Sam James from comment #7)
> We keep getting quite a few reports of this downstream.
Of this mingw32 stack realignment issue specifically, i.e. Wine breakage when
AVX512 is enabled via
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #13 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #12)
> As explained in comment#3 the issue is related to the tree alias oracle
> part that gets invoked on the MEM_EXPR for the load where there is
> no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #11 from Alexander Monakov ---
The trapping angle seems valid, but I have a really hard time understanding the
DSE issue, and the preceding issue about disambiguation based on RTL aliasing.
How would DSE optimize out 'd[5] = 1' in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
--- Comment #13 from Alexander Monakov ---
Note to self: check how control_flow_insn_p relates.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110369
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
1 - 100 of 385 matches
Mail list logo