https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 110289, which changed state.
Bug 110289 Summary: Phiprop may be good idea in early opts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289
What|Removed |Added
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
In the following testcase
void test2(int);
void
test(int n)
{
if (n > 5)
__builtin_unreacha
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109689
--- Comment #8 from Jan Hubicka ---
An easy way would be to avoid unlooping if tree_ssa_loop_ch is executed in loop
closed ssa (which happens from ch_vect pass).
I wonder, however, how hard it would be to get this right.
I think this means to take
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110334
--- Comment #6 from Jan Hubicka ---
Comdats are really in conflict with the fact that we have command line options.
I blame C++ standard for that and I don't think there is fully satisfactory
solution to this problem.
I was playing with the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86590
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Seen here
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=474.210.0
between
g:067a8c7cb897b6a1ea5b1d26df8e89ccc7f0659c
and
g
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This happens between
g:9e3607e19bcd34e1fc857ca44ae30a8a1a4f5e20
and
g:57446d1bc9757ee1fb030600d38fa9487231f2a4 (Jun 16 2023)
https://lnt.opensuse.org/db_default/v4/SPEC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
--- Comment #2 from Jan Hubicka ---
With the patch in PR110289 to optimize std::max into MAX_EXPR and the throw
commented out I get:
size_type std::vector >::_M_check_len
(const struct vector * const this, size_type __n, const char * __s)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289
--- Comment #2 from Jan Hubicka ---
This patch fixes the problem
diff --git a/gcc/passes.def b/gcc/passes.def
index c9a8f19747b..faa5208b26b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -88,6 +88,8 @@ along with GCC; see the file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110289
--- Comment #1 from Jan Hubicka ---
This is caused by the way libstdc++ defines max:
constexpr
inline const _Tp&
max(const _Tp& __a, const _Tp& __b)
{
if (__a < __b)
return __b;
return __a;
}
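The effect of returning a reference can be sketched in plain C, with pointers standing in for references. This is an illustrative sketch only: the names `max_via_pointer` and `max_by_value` are hypothetical and come from neither libstdc++ nor the PR. The first form selects an address and then loads through it; the second compares and selects the values directly, which is the shape that is easy to recognize as a maximum.

```c
/* Hypothetical stand-in for a max() that returns a reference:
   pick an address first, then load through the chosen pointer. */
static int max_via_pointer(int a, int b)
{
    const int *ptr = &a;
    if (a < b)
        ptr = &b;
    return *ptr;
}

/* The same result computed on values, with no address selection. */
static int max_by_value(int a, int b)
{
    return a < b ? b : a;
}
```

Both functions return the same value for every pair of inputs; they differ only in whether the selection happens on addresses or on values.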
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
libstdc++ in push_back operation does equivalent of the following:
int max(int a, int b)
{
int *ptr;
if (a > b)
ptr =
e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
--- Comment #1 from Jan Hubicka ---
Another problem is:
D.27747 = _8;
if (__n.3_2 > _8)
goto ; [34.00%]
else
goto ; [66.00%]
[local count: 364926196]:
[local count: 1073312330]:
# _18 = PHI <(4), &__n(5)>
_3 = *_18;
++
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
I am looking into inefficient codegen for loops controlled by a std::vec based
stack (see testcase in PR109849).
The problem is that we fail to inline enough of the implementation of
std
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #14 from Jan Hubicka ---
One interesting situation is:
void std::vector >::push_back (struct
vector * const this, const struct value_type & __x)
{
struct __normal_iterator D.27894;
struct pair * _1;
struct pair * _2;
struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #5 from Jan Hubicka ---
In sharpening the number of iterations depends on sharpen radius. Not sure what
it is for the benchmark, but in normal situations the number of iterations is
indeed not very large.
However clang simply slp
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Seen here:
https://lnt.opensuse.org/db_default/v4/CPP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Jan Hubicka changed:
What|Removed |Added
Status|WAITING |NEW
--- Comment #3 from Jan Hubicka ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #17 from Jan Hubicka ---
I was also thinking of DCE. It looks like a plausible idea. It may lead to a
surprise where you store the same undefined variable to two places and later
compare them for equality, but that is undefined anyway.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
Jan Hubicka changed:
What|Removed |Added
CC||rguenther at suse dot de
See
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Phoronix claims 31% performance difference between gcc13 and clang on sharpen
benchmark of graphicsmagick. On zen3 I reproduce only 4%, but the benchmark
has only single short
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985
--- Comment #6 from Jan Hubicka ---
Created attachment 55180
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55180&action=edit
untested patch
It turns out that as modref was written for memory loads/stores only and later
side effects discovery
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109985
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109914
--- Comment #2 from Jan Hubicka ---
The reason why gcc warns is that it is unable to prove that the function is
always finite. This means that it can not auto-detect pure attribute since
optimizing the call out may turn infinite program to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #1 from Jan Hubicka ---
opj_t1_enc_refpass is not inlined due to large function growth and some others
due to max-inline-insns-auto. With inlining forced I get profile:
87.35% opj_t1_cblk_encode_processor
6.22%
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
I tried to reproduce openjpeg benchmarks from Phoronix
https://www.phoronix.com/review/gcc13-clang16-raptorlake/5
On zen3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #10 from Jan Hubicka ---
This is a benchmarkable version of the simplified testcase:
jan@localhost:/tmp> cat t.c
#define N 1000
struct rgb {unsigned char r,g,b;} rgbs[N];
int *addr;
struct drgb {double r,g,b;
#ifdef OPACITY
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #9 from Jan Hubicka ---
Oddly enough, a simplified version of the loop SLP vectorizes for me:
struct rgb {unsigned char r,g,b;} *rgbs;
int *addr;
double *weights;
struct drgb {double r,g,b;};
struct drgb sum()
{
struct drgb r;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #8 from Jan Hubicka ---
Created attachment 55178
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55178&action=edit
Preprocessed source of VerticalFiller and HorisontalFiller
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
Jan Hubicka changed:
What|Removed |Added
Summary|GraphicsMagick resize is a |GraphicsMagick resize is a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report
it is between g:c5300bf3110b44e2742b36f49c2a380abd08d9c5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #11 from Jan Hubicka ---
I got -fprofile-use builds working and with profile we peel the innermost loop
8 times, which actually gets it off the hottest spot.
We get more selective on what to inline (do not inline cold calls) which may
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #10 from Jan Hubicka ---
Thanks. I tested the patch on jpegxl and it does not help there (I guess
because the redundancy there is partial). But it is cool that we compile at
least the simplified testcase well.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #8 from Jan Hubicka ---
We can only SRA if the address is non-escaping. Clang does not seem to need it
to optimize better:
jan@localhost:~> cat t.c
extern void q(int *);
__attribute__ ((noinline))
void
test()
{
for (int a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #10 from Jan Hubicka ---
Actually vectorization hurts on both compilers, and a bit more with clang.
It seems that all important loops are hand vectorized and since register
pressure is a problem, vectorizing other loops causes enough
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Jan Hubicka changed:
What|Removed |Added
Blocks||109811
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #8 from Jan Hubicka ---
Created attachment 55101
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55101&action=edit
hottest loop
jpegxl build machinery adds -fno-vectorize and -fno-slp-vectorize to clang
flags. Adding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #6 from Jan Hubicka ---
hottest loop in clang's profile is:
for (size_t y = 0; y < opsin.ysize(); y++) {
for (size_t x = 0; x < opsin.xsize(); x++) {
if (is_background_row[y * is_background_stride + x]) continue;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #5 from Jan Hubicka ---
Also forgot to mention, I used a zen3 machine, so Raptor Lake is not necessary.
Note that the build system appends -O2 after any CFLAGS specified, so it really
is an -O2 build:
# Force build with optimizations in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
Ever
-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jan@localhost:/tmp> cat t.C
#include
typedef unsigned int uint32_t;
std::vector> stack;
void
test()
{
while (!stack.empty()) {
std::pa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106943
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109690
--- Comment #7 from Jan Hubicka ---
Thanks a lot! However, there still seems to be a problem with vectorization.
On zen4 I now get:
jh@ryzen4:~/gcc/build/gcc> ./xgcc -B ./ -O2 -march=native slp.c ; perf stat
./a.out
Performance counter stats
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
model name : AMD Ryzen 7 5800X 8-Core Processor
reproduces on my znver1 laptop too.
h@ryzen3:~/gcc-kub/build/gcc> cat tt.c
int a[100];
[[gnu::noipa]]
void l
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
-ftree-vectorize enables -ftree-slp-vectorize and -ftree-loop-vectorize; however,
-fno-tree-vectorize does not disable them. This is quite counter-intuitive
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
The following loop can iterate only 0 times before hitting undefined behaviour.
struct foo {
int a[3];
int b;
} c;
test(int p)
{
for (int i = 3; i < p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137
--- Comment #26 from Jan Hubicka ---
Reverted the znver1-3 change on the gcc-12 branch. We still may want to fix IRA
to avoid the problem on core_avx512 targets.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79416
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109509
--- Comment #5 from Jan Hubicka ---
As a summary:
- PR109491 does not seem to be about integration time. Most of the time is RTL PRE.
- PR108086 has 10% spent in integration and seems to be an operand scan issue
- PR99785 is hard to judge given
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108086
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109509
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109137
--- Comment #21 from Jan Hubicka ---
Zen 1-3 changes were intentional in the original tuning patch (it is also
briefly mentioned in the commit message). By allowing 256 bit AVX moves
instead of 64bit integer moves (or 128bit) we can move
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109341
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109213
--- Comment #8 from Jan Hubicka ---
We have large-stack-frame-growth that is relative, so yes, increasing the stack
size of the caller makes gcc think that it is heavy and that making it even
heavier will not hurt that much.
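Why a relative limit behaves this way can be illustrated with a small C sketch. It is a simplified model and not GCC's actual inliner code: the function `stack_growth_ok`, its parameters, and the numbers in the usage note are all hypothetical. It only shows that a caller with an already large frame tolerates a larger absolute increase.

```c
#include <stdbool.h>

/* Hypothetical model of a relative stack-growth limit: the combined
   frame may exceed the caller's current frame by at most
   growth_percent percent of that frame. */
static bool stack_growth_ok(long caller_frame, long callee_frame,
                            long growth_percent)
{
    long limit = caller_frame + caller_frame * growth_percent / 100;
    return caller_frame + callee_frame <= limit;
}
```

With a made-up limit of 1000 percent, adding a 2000-byte callee frame is rejected for a 100-byte caller (limit 1100 bytes) but accepted for a 1000-byte caller (limit 11000 bytes).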
We originally ran into
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896
--- Comment #11 from Jan Hubicka ---
Originally to_sreal_frquency was intended to work both inter-procedurally and
intra-procedurally. However in such setup there are side cases that can not be
solved without knowing the corresponding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108429
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 108429, which changed state.
Bug 108429 Summary: [13 Regression] FAIL: gcc.target/i386/pr89618.c
scan-tree-dump vect "LOOP VECTORIZED"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108429
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101118
--- Comment #6 from Jan Hubicka ---
I am not really an expert on coroutines. But this seems to be a type (not a
declaration we globalize during LTO) generated internally by the front-end.
The name __D.9984.3.4 looks like it has a global counter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106896
--- Comment #10 from Jan Hubicka ---
The problem the assert is trying to solve is that local counters are all
frequencies relative to the entry block count, while IPA counters are absolute
values within the whole program. So comparing them
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108887
--- Comment #3 from Jan Hubicka ---
We don't really have a way to mark nodes for removal. I am not 100% sure I
understand what the code does, but removing random nodes from cgraph in hook
invoked from mangling seems dangerous, since we invoke
: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
While experimenting with a new gimple pass we noticed that pr70920.c is
sensitive to the order of substitutions made. If 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106258
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585
--- Comment #15 from Jan Hubicka ---
We get 47s runtime with -O2 -flto and 53s with -O2
-fno-inline-functions-called-once.
The call sequence is:
[local count: 109362591]:
_1656 = (unsigned long) _45;
_1655 = _1656 + ivtmp.1182_2540;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108511
--- Comment #6 from Jan Hubicka ---
The function is used to discard early summaries that will lead to external
calls. This saves some memory allocations.
At this stage we have identified prevailing symbols and they are first in the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108565
--- Comment #5 from Jan Hubicka ---
Teaching modref that the THIS parameter of all destructors is nonescape looks
like an interesting idea (and easy to implement).
Memory stores are currently indeed handled as "anything may happen". modref does
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105469
--- Comment #18 from Jan Hubicka ---
It should just make any bug go latent. It surprises me that it makes any
difference given that things not cloned by ipa-cp should be all handled by
ipa-sra.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94360
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #4 from Jan Hubicka ---
I see this is scatter with generic tuning. I actually did not intend to
disable it there without more testing, so I will revert that part of change.
In the meantime I noticed that aocc sometimes seems to use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106075
--- Comment #6 from Jan Hubicka ---
The SRA issue is fixed now, but I am not quite sure what the desirable solution
is here...
This blocks modref from understanding side effects of functions doing EH.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108425
Jan Hubicka changed:
What|Removed |Added
Status|RESOLVED|REOPENED
Last reconfirmed|
: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
With non-call exceptions we misoptimize the following testcase:
struct a{int a,b,c,d,e;};
void
test(struct a * __restrict a, struct a *b)
{
*a = (struct a){0,1,2,3,4};
*a = *b;
}
jan@localhost:/tmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108346
--- Comment #2 from Jan Hubicka ---
Sadly the win/loss cases do not seem to suggest a simple cost scheme.
We currently compute gather/scatter costs as static startup cost + cost per
lane and they are set to approximately match actual
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408
--- Comment #4 from Jan Hubicka ---
On Zen4 it is 20s for gcc and 6.9s for aocc, so still a problem.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376
--- Comment #3 from Jan Hubicka ---
If I make the arrays random then GCC code is indeed faster:
#include
#include
typedef float real_t;
#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56139
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107950
--- Comment #7 from Jan Hubicka ---
Thanks for looking into the incremental link of libbackend. I had it in my tree
for a while but never got around to implementing a correct way to enable it only
during bootstrap since the host compiler may not support
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
x264 benchmark has a loop averaging two unsigned char arrays that is executed
with relatively low trip counts that does not play well with our vectorized
code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411
--- Comment #8 from Jan Hubicka ---
Compared to aocc we also do worse on zen4:
jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast -march=native s311.c
jh@alberti:~/tsvc/bin> time ./a.out
real 0m3.207s
user 0m3.206s
sys
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99634
--- Comment #2 from Jan Hubicka ---
AOCC produced code is:
.LBB0_2:# %vector.body
# Parent Loop BB0_1 Depth=1
# => This Inner
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99412
--- Comment #2 from Jan Hubicka ---
This is also seen with zen4 comparing gcc and aocc (about 2.3 times
difference).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408
--- Comment #3 from Jan Hubicka ---
With zen4 the gcc-built loop takes 19s, while aocc takes 6.6s.
aocc:
.LBB0_1:# %for.cond22.preheader
# =>This Loop Header: Depth=1
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@alberti:~/tsvc/bin> more s1279.c
#include
#include
typedef float real_t;
#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
rea
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
The following two benchmarks test gather/scatter codegen:
s4113.c:
#include
#include
//typedef float real_t;
#define
at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #7 from Jan Hubicka ---
mine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107597
--- Comment #7 from Jan Hubicka ---
So I guess it is asan being confused by our optimization. We intentionally
duplicate the symbol in order to reduce cost of dynamic linking in situations
where we know it does not change semantics, but asan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@alberti:~/tsvc/bin> cat tt5.c
#include
typedef double real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
Jan Hubicka changed:
What|Removed |Added
Summary|TSVC s161 for double runs |TSVC s161 and s277 for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411
--- Comment #7 from Jan Hubicka ---
With znver4, current trunk and clang15, I still see this problem (clang code is
about 60% faster) for s311, s312 and s3111.
Curiously, s3 and s3110 no longer show a regression.
-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This is a stupid benchmark but still...
jh@alberti:~/tsvc/bin> more tt2.c
typedef double real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
rea
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408
--- Comment #2 from Jan Hubicka ---
This also reproduces with zen4 and double.
jh@alberti:~/tsvc/bin> cat tt.c
typedef double real_t;
#define iterations 10
#define LEN_1D 32000
#define LEN_2D 256
real_t
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@alberti:~/tsvc/bin> more test.c
typedef double real_t;
#define iterations 10
#define LEN_1D 32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839
--- Comment #11 from Jan Hubicka ---
Fixed on mainline with r:0f2c7ccd14a29a8af8318f50b8296098fb0ab218
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839
--- Comment #10 from Jan Hubicka ---
Created attachment 53430
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53430&action=edit
Patch I am testing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839
--- Comment #9 from Jan Hubicka ---
Thanks for looking into this.
What happens here is that we start working from a call where we know that
outer_type is BA. We correctly find the BA::type and notice that it is final
and thus we do not need