https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
--- Comment #19 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #18)
> True - but does that catch the cases people are interested in and are
> allowed by the FP contraction rules? I'm thinking of
>
> x = a*b + c*d + e + f;
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
For the following testcase
#include
__attribute__((target("avx")))
int f(__m128i a[], long n)
{
for (long i = 0; i < n; i++)
if (!_mm_testz_si128(a[i], a[i]))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107107
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115
--- Comment #8 from Alexander Monakov ---
Just optimizing out the redundant store seems difficult because on some targets
scheduling is invoked from reorg (and it relies on alias sets).
We need a solution that works for combine too — is it poss
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107115
--- Comment #12 from Alexander Monakov ---
For reference, the previous whacked mole appears to be PR 106187 (where
mems_same_for_tbaa_p comes from).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107250
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107250
--- Comment #3 from Alexander Monakov ---
Well, obviously because in one function both 'f' and 'tmp' are live across the
call, and in the other function only 'f' is live across the call. The
difference is literally pushing one register vs. two r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102380
Bug 102380 depends on bug 99619, which changed state.
Bug 99619 Summary: fails to infer local-dynamic TLS model from hidden visibility
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99619
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99619
Alexander Monakov changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #1 from Alexander Monakov ---
Suggested partial fix for the integer-pipe side of the blowup:
https://inbox.sourceware.org/gcc-patches/4549f27b-238a-7d77-f72b-cc77df8ae...@ispras.ru/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
--- Comment #8 from Alexander Monakov ---
(In reply to Arseny Solokha from comment #7)
> I have it on x86_64-pc-linux-gnu…
Thanks for the info (I assume you don't have any special configure arguments),
but that's surprising, I ran bootstrap+reg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
--- Comment #9 from Alexander Monakov ---
Actually, latest results from H.J. Lu's periodic x86_64 tester don't exhibit
such issues either:
https://inbox.sourceware.org/gcc-testresults/20221025065901.6dc0062...@gnu-34.sc.intel.com/T/#u
: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
CC: amonakov at gcc dot gnu.org, asolokha at gmx dot com,
bergner at gcc dot gnu.org, iains at gcc dot gnu.org,
law
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
--- Comment #11 from Alexander Monakov ---
I've broken out the C++ issue from comment #10 as PR 107393, thanks for the
testcase. It's a separate issue from emutls and Fortran ICEs on other targets.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
--- Comment #12 from Alexander Monakov ---
ICE on the emutls-3.c testcase isn't related to emutls. Rather, the frontend
invokes decl_default_tls_model before attributes are processed, so the first
time around we miss the 'common' attribute when
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
--- Comment #13 from Alexander Monakov ---
As for the Fortran testcases, the issue is again caused by the front-end
invoking decl_default_tls_model before assigning DECL_COMMON, this time in
fortran/trans-common.cc:build_common_decl.
So I guess
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
CC: amonakov at gcc dot gnu.org, asolokha at gmx dot com,
bergner at gcc dot gnu.org, iains at gcc dot gnu.org
Keywords: openmp
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
CC: amonakov at gcc dot gnu.org, asolokha at gmx dot com,
bergner at gcc dot gnu.o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107353
Alexander Monakov changed:
What|Removed |Added
Summary|[13 regression] Numerous|frontends sometimes select
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #3 from Alexander Monakov ---
Followup patches have been posted at
https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amona...@ispras.ru/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505
Alexander Monakov changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107621
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
--- Comment #6 from Alexander Monakov ---
Sure, but I was talking specifically about the pattern matching introduced by
that commit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #15 from Alexander Monakov ---
Ah, there will be an mfence after the vmovdqa when necessary for an atomic
store, thanks (I missed that because the testcase doesn't scan for mfence).
||amonakov at gcc dot gnu.org
Resolution|--- |FIXED
--- Comment #8 from Alexander Monakov ---
Fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #6 from Alexander Monakov ---
With these patches on trunk, the current situation is:
nm -CS -t d --defined-only gcc/insn-automata.o | sed 's/^[0-9]* 0*//' | sort -n | tail -40
2496 r slm_base
2527 r bdver3_load_min_issue_delay
2746 r glm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #8 from Alexander Monakov ---
(In reply to Jan Hubicka from comment #7)
> > 53730 r btver2_fp_min_issue_delay
> > 53760 r znver1_fp_transitions
> > 93960 r bdver3_fp_transitions
> > 106102 r lujiazui_core_check
> > 106102 r lujiazui_c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
--- Comment #3 from Alexander Monakov ---
There's a forward dependency over 'c' (read of c[i] vs. write of c[i+1] with
'i' iterating forward), and the vectorized variant takes the hit on each
iteration. How is a slowdown even surprising?
For th
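A minimal sketch of the access pattern named in comment #3 above, assuming a hypothetical loop (this is not the testcase from the PR, and the function and parameter names are made up for illustration). The value stored to c[i + 1] in one iteration is loaded back as c[i] in the next, so each iteration waits on the store from the previous one.

```c
#include <stddef.h>

/* Hypothetical illustration only: reads c[i] and writes c[i + 1] with
   'i' iterating forward, so every load depends on the previous store. */
void shift_accumulate(double *c, const double *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        c[i + 1] = c[i] + a[i];
}
```

With c[0] seeded by the caller, the loop fills c[1..n-1] with running sums of a, which makes the iteration-to-iteration dependency easy to see in a debugger or in the generated assembly.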
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #10 from Alexander Monakov ---
(In reply to Jan Hubicka from comment #9)
> Actually for older cores I think the manufacturers do not care much. I
> still have a working Bulldozer machine and I can do some testing.
> I think in Buldoz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107719
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
--- Comment #15 from Alexander Monakov ---
I'm confused about the first hunk in the attached patch:
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1035,8 +1035,10 @@ complex_mul_pattern::matches (complex_operation_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107879
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #21 from Alexander Monakov ---
(In reply to Michael_S from comment #19)
> > Also note that 'vfnmadd231pd 32(%rdx,%rax), %ymm3, %ymm0' would be
> > 'unlaminated' (turned to 2 uops before renaming), so selecting independent
> > IVs for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107772
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #24 from Alexander Monakov ---
(In reply to Peter Cordes from comment #23)
> But at least on Linux, I don't think there's a way for user-space to even
> ask for a page of WT or WP memory (or UC or WC). Only WB memory is easily
> ava
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #26 from Alexander Monakov ---
Sure, the right course of action seems to be to simply document that atomic
types and built-ins are meant to be used on "common" (writeback) memory, and no
guarantees can be given otherwise, because it
||amonakov at gcc dot gnu.org
--- Comment #3 from Alexander Monakov ---
LLVM does a better job at code layout, and massively wins on the amount of
executed branches (in particular unconditional jumps). With -fdisable-rtl-bbro
gcc achieves a similar performance.
||amonakov at gcc dot gnu.org
Resolution|--- |FIXED
--- Comment #3 from Alexander Monakov ---
Fixed for gcc-13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #5 from Alexander Monakov ---
Not sure what you don't like about the inputs, they appear quite reasonable.
Perhaps GCC's estimation of bb frequencies is off (with profile feedback we
achieve good performance).
Georgi: you'll likely
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #6 from Alexander Monakov ---
Let me add that Clang supports GCC's -fprofile-{generate,use} flags for
compatibility as well.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107879
--- Comment #10 from Alexander Monakov ---
If anyone is confused like I was, the commit actually includes a testcase, but
the addition is not mentioned in the Changelog. I was sure the server-side
receive hook was supposed to reject such incompl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107971
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #11 from Alexander Monakov ---
Factoring out Lujiazui divider shrinks its tables by almost 20x:
3 r lujiazui_decoder_min_issue_delay
20 r lujiazui_decoder_transitions
32 r lujiazui_agu_min_issue_delay
126 r lujiazui_agu_transitions
3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008
--- Comment #9 from Alexander Monakov ---
I think this is tree-ldist placing memset(sameZ, 0, zPlaneCount) after the
loop, overwriting conditional 'sameZ[i] = true' assignments that happen in the
loop.
For the smaller testcase from comment #6,
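To make the failure mode in comment #9 concrete, here is a hedged sketch of the loop shape it describes. Only the names sameZ and zPlaneCount come from the comment; the function, the z array and the condition are assumptions, and this is not the testcase from the PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction: each element is cleared and then conditionally
   set inside the same loop.  Turning the clearing into a memset is only
   valid if the memset runs before the conditional stores; a memset placed
   after the loop would wipe every 'true' written here. */
void mark_same(bool *sameZ, const int *z, size_t zPlaneCount)
{
    for (size_t i = 0; i < zPlaneCount; i++) {
        sameZ[i] = false;
        if (z[i] == z[0])
            sameZ[i] = true;   /* must survive the zeroing */
    }
}
```

A quick check is to compare the output with and without -ftree-loop-distribution on an input where at least one element satisfies the condition.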
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108008
--- Comment #10 from Alexander Monakov ---
Looks similar to PR 107323, but needs explicit -ftree-loop-distribution to
trigger.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108076
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
Alexander Monakov changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|INVA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
--- Comment #9 from Alexander Monakov ---
(In reply to Feng Xue from comment #8)
> In another angle, because gcc already model control flow and SSA web for
> setjmp/longjmp, explicit volatile specification is not really needed.
That covers GIM
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
match.pd has multi-pattern matcher 'nop_atomic_bit_test_and_p'.
It expands to ~38 KLOC in gimple-match.cc and ~350 KB in the compiled binary.
There h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
--- Comment #12 from Alexander Monakov ---
Shouldn't there be another bug for the sched1 issue specifically? In the absence of
abnormal control flow, extending lifetimes of pseudos across calls is still
likely to be a pessimization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
Alexander Monakov changed:
What|Removed |Added
Resolution|DUPLICATE |FIXED
--- Comment #14 from Alexande
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108117
Alexander Monakov changed:
What|Removed |Added
Resolution|FIXED |DUPLICATE
--- Comment #15 from Alex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067
--- Comment #9 from Alexander Monakov ---
*** Bug 108117 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108140
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
||amonakov at gcc dot gnu.org,
||zhroma at gcc dot gnu.org
--- Comment #1 from Alexander Monakov ---
Hi Martin, this is a modulo-scheduling bug; I think you added "Blocks:
sel-sched" by mistake — removing, and Cc'ing Roma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031
--- Comment #7 from Alexander Monakov ---
In comment #2 I touched upon a potentially more practical way to offer
-fno-strict-alignment:
Run early work with ABI alignments: compute __alignof correctly, lay out
composite types as required by ABI,
|UNCONFIRMED
CC||amonakov at gcc dot gnu.org
--- Comment #4 from Alexander Monakov ---
32-bit Linux should also be affected (perhaps with less probability if clock()
is more precise). It is surprising we track time in a 'double'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618
--- Comment #3 from Alexander Monakov ---
Furthermore as discussed in bug 100483 this request appears based on a
misunderstanding of what the 'semantic-' part of the option is about. It does not
affect assembly/linker-level binding mechanism, so th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #3 from Alexander Monakov ---
I understand what you're saying, but it seems we're talking past each other.
I agree that if a library is linked with any -Bsymbolic* flag, the main
executable is at risk of broken address uniqueness un
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #5 from Alexander Monakov ---
Hm, I still don't think I'm misunderstanding what you're saying. I'm familiar
with the ELF standard (and FWIW I have read your blog posts on related
matters). I am responding to this sentiment from the o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #7 from Alexander Monakov ---
Thanks. I agree that inferring address significance on the linker side is
problematic.
Thinking about your original request, I was about to say that it would be very
reasonable to do under -fno-plt flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573
--- Comment #14 from Alexander Monakov ---
I would break in gdb on cuModuleGetFunction and
x/s $rdx
to print the failing symbol (it's the third argument to the function).
It seems the "inner" entrypoint (which your patch attempted to nullif
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573
--- Comment #17 from Alexander Monakov ---
Yes, I'd agree normally it's present in the offload table, but ideally if
you're trying to stub out the call, it should not be present in the offload
table.
I think Tobias is saying that on GIMPLE this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573
--- Comment #19 from Alexander Monakov ---
Ah, does the issue arise because foo._omp_fn.0 is (before the patch) callable
in two contexts, in one it's called from host and should be 'omp target
entrypoint', and in the other it's called from offlo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
--- Comment #10 from Alexander Monakov ---
Is there something wrong or undesirable with making this under -fno-plt (or the
noplt attribute as in your example)?
(after all, it is a kind of PLT-avoidance transformation, just for addressing
rather
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105700
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105700
--- Comment #5 from Alexander Monakov ---
(In reply to Artem S. Tashkinov from comment #4)
> > There should be a note in dmesg when a process segfaults outside of a
> > debugger. If you run wine without gdb, and winedevice.exe crashes, is there
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105688
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
In the following code, 'f' is not SLP-vectorized, but 'g' is. From a brief look
at slp2 dump, looks like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106277
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101347
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101347
Alexander Monakov changed:
What|Removed |Added
Summary|[11/12/13 Regression] ICE |[11/12 Regression] ICE in
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
int main(int argc, char **argv)
{
__label__ loop, end;
void jmp(int c) { goto *(c ? &&loop : &&
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
--- Comment #4 from Alexander Monakov ---
Regarding point 1 above, I should mention that Glibc headers mark both 'vfork'
and 'raise' as leaf.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
--- Comment #7 from Alexander Monakov ---
I think item 2 from comment #3 (jump threading) still needs to be solved
independently of what is decided about item 1 (leaf functions resuming earlier
returns_twice call).
---
The problem with 'leaf'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
--- Comment #8 from Alexander Monakov ---
I mean the minimized testcase, the original attachment does execve/_exit after
vfork.
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
CC: amonakov at gcc dot gnu.org, asolokha at gmx dot com,
dcb314 at hotmail dot com, hubicka at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106437
--- Comment #1 from Alexander Monakov ---
With the exception of '_exit', the exit family of functions (exit, _Exit,
quick_exit) are also marked leaf despite exit and quick_exit invoking
atexit/on_exit/at_quick_exit handlers. Only _Exit is specified
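A small sketch of why 'leaf' is questionable for exit, assuming hypothetical names (handler_runs, bump_counter, register_handler and finish are not from the PR). A handler registered with atexit lives in the current translation unit, and exit calls back into it, which is the kind of re-entry the leaf attribute tells the compiler cannot happen.

```c
#include <stdlib.h>

/* State private to this translation unit. */
static int handler_runs;

/* Runs when exit() is called, i.e. exit() re-enters this unit. */
static void bump_counter(void)
{
    handler_runs++;
}

/* Registers the handler; atexit returns 0 on success. */
int register_handler(void)
{
    return atexit(bump_counter);
}

/* If exit is assumed to be 'leaf', the compiler may reason that nothing
   reads or writes handler_runs between this store and program
   termination, even though bump_counter does. */
void finish(void)
{
    handler_runs = 0;
    exit(0);
}
```

Calling register_handler() and then finish() makes exit run bump_counter, which modifies handler_runs after the store in finish.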
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
--- Comment #10 from Alexander Monakov ---
The leaf issue is now PR 106437.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
--- Comment #11 from Alexander Monakov ---
A cleaner testcase for jump threading (still ICEs despite presence of
ABNORMAL_DISPATCHER):
void vfork() __attribute__((__leaf__));
void semanage_reload_policy(char *arg, void cb(void)) {
if (!arg) {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299
--- Comment #11 from Alexander Monakov ---
Marxin, you've marked this as WAITING, can you please re-evaluate? The nice
testcase from comment #2 is reproducible on trunk as well.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105135
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
On 64-bit x86, straightforward use of SSE 4.2 crc instruction looks like
#include
#include
uint32_t f(uint32_t c, uint64_t *p, size_t n)
{
for (size_t i = 0; i < n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422
Alexander Monakov changed:
What|Removed |Added
CC||aldyh at gcc dot gnu.org
--- Commen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453
--- Comment #1 from Alexander Monakov ---
Any idea if the following is reasonable? It compiles and achieves the desired
result.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bdde577dd..d82656678 100644
--- a/gcc/config/i3