http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #10 from Yuri Rumyantsev ysrumyan at gmail dot com ---
After fix rev. 202468 the assembly looks slightly better, but we ran into another RA
inefficiency, which can be illustrated with the attached (t1.c) test compiled with
options -march=atom
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30816
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30816&action=edit
test-case to reproduce
t1.c must be compiled on x86 with options:
-O2 -march=atom
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We found out that phase loop distribution is responsible for it, namely wrong
cfg is generated (after ldist) for pdv.f if it was compiled with options
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed a significant performance regression on an important benchmark from the eembc2.0
suite, which can be exhibited with the attached test-case.
Assembly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58459
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30850
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30850&action=edit
test-case to reproduce
Test must be compiled on x86 with options -Ofast -m32 -march
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
It looks like
/* { dg-require-effective-target vect_condition } */
directive is missing from the vect-cond-reduc-1.c test.
I will fix it asap.
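For context, a minimal sketch of how such a directive sits at the top of a vectorizer test. Only the dg-require-effective-target line is taken from the comment above; the loop and the function name cond_sum are illustrative assumptions and not the contents of vect-cond-reduc-1.c.

```c
/* { dg-do compile } */
/* { dg-require-effective-target vect_condition } */

/* Hypothetical conditional reduction: add a[i] to the sum only when
   it is positive.  Not the actual test from the GCC testsuite.  */
int
cond_sum (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      sum += a[i];
  return sum;
}
```

With the directive in place, the testsuite skips the file on targets that do not report support for vectorized conditions.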
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Hi Rainer,
Could you try the attached patch to check if it helps (the test should not be
run for sparc)?
Thanks in advance.
Yuri.
2014-07-16 19:20 GMT+04:00 ro at gcc dot gnu.org gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Any comments will be appreciated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 33235
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33235&action=edit
file to reproduce
Need to be compiled with
-m32 -O3 -Wframe-larger-than=1728 -std=gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Richard,
I put the original file into the attachment. For compilers built on 20140208 and 20140730
I've got:
grep -c redundant test.cc.179r.pre (20140208)
3825
grep -c redundant test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Richard,
I put the original file into the 61672 attachment and added comments for
reproducing.
2014-08-04 15:16 GMT+04:00 rguenth at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org:
https
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com ---
It really fixes the issue. Thanks.
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed that for one important benchmark using the '-lto' option leads to
a performance degradation, which is caused by not vectorizing the hottest loop
after function inlining. I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 33241
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33241&action=edit
test-case to reproduce
Options to compile are:
-Ofast -m64 -march=core-avx2 -fopenmp
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
For the attached simple test-case, if we omit the 'uniform' specification the compiler
produces an ICE:
error: incorrect type of vector CONSTRUCTOR elements
Note that for stmt
_38 = {vect_cst_.62_39, vect_cst_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62021
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 33247
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33247&action=edit
test-case to reproduce
Test should be compiled with
-O2 -fopenmp -march=core-avx2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Any comments will be appreciated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
--- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Richard,
I tested both proposed fixes and it turned out that the first one is preferable
since the performance of the benchmark came back. Note that hoisting the 2nd vrp pass gave
us another
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Please ignore my previous comment - if we insert zeroing of the destination
register before each popcnt (and lzcnt), performance is restored:
original test results:
unsigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
--- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com ---
This is not u32 version but u64. The first loop (u32) version looks like:
.L23:
	leal	1(%rdx), %ecx
	xorq	%rax, %rax
	popcntq	(%rbx,%rax,8), %rax
leal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
--- Comment #10 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Richard,
Do you have any progress?
Thanks.
2014-08-13 12:35 GMT+04:00 rguenth at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Any updates?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I checked that our benchmark is successfully vectorized with function inlining.
So this bug must be closed as fixed/resolved.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com ---
You can close this bug as fixed/resolved (see my comment).
Thanks.
Yuri.
2014-09-08 15:29 GMT+04:00 rguenth at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org:
https://gcc.gnu.org
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed that adding the 'const' qualifier to function arguments marked with the simd
declare pragma leads to an ICE on the attached test-case. The test is compiled
successfully if 'const
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 32585
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32585&action=edit
C++ test-case to reproduce
Need to be compiled with -O1 -m64 test.cpp -c -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I'd like to note that this test uses the 'omp declare simd' pragma and
the issue is rather related to its support in gcc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
A check that stmt-bb belongs to the loop is missing in is_cond_scalar_reduction. If
we add the following lines
if (gimple_code (stmt) != GIMPLE_ASSIGN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61518
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61576
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
There is an issue with the phi-node and the reduction stmt - after r211302 a new hammock
was inserted between the reduction stmt and the bb containing the phi:
bb 6:
d.6_12 = d_lsm.14_17 + 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
It turned out that the wrong PR number was used in the ChangeLog. In fact this bug was
fixed:
URL: http://gcc.gnu.org/viewcvs?rev=211263&root=gcc&view=rev
Log:
gcc/
PR tree-optimization
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
In a real application which is compiled with restrictions on frame size, after
r208113 the number of deleted redundant instructions decreased significantly
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We discovered a significant performance regression on one important benchmark
from the eembc2.0 suite after r211625. It turned out that complete unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 33088
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33088&action=edit
test-case to reproduce
Use '-O3 -funroll-loops -Dbtype=[int,e_u8]' to reproduce.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
This is a duplicate of PR 61576 and it should pass after r212347.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Ok. I will add it.
2014-07-08 14:45 GMT+04:00 jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742
--- Comment #3 from Jakub
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743
--- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Richard,
Did you have a chance to look at this and prepare more general fix?
Thanks.
Yuri.
2014-09-08 15:13 GMT+04:00 rguenther at suse dot de gcc-bugzi...@gcc.gnu.org:
https
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391
--- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Arseny, I am not able to close this bug but you can do it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63941
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
My patch is responsible for the ICE - I did not assume that before the if-convert phase
the cfg may contain redundant degenerate conditional branches:
bb 4:
...
_14 = d[pretmp_51
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63743
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed a huge regression on eembc1.1 and eembc2.0 for the 32-bit target on x86.
It can be reproduced on the attached test-case:
before this fix number
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34345
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34345&action=edit
simple reproducer
Need to compile with -m32 on x86 platform.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I attached two assembly files for the test-case compiled with
-O2 -m32 -S options.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34348
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34348&action=edit
assembly files for test.c
Assembly file for test.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34349
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34349&action=edit
assembly file before r216728
Assembly file.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com ---
H.J.
I put the before/after assembly files into the bug attachment. We saw a slowdown
on SLM and HSW for 32-bit on eembc2.0, e.g. des degraded by 36%
(SLM) and 7% (HSW). But we did not see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com ---
The issue is caused by operand canonicalization, i.e. there is a special
operand ordering for commutative operations to have the same
representation for a + b and b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34363
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34363&action=edit
patch to fix issue
This patch fixed almost all issues related to operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34782
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34782&action=edit
test-case to reproduce
Options -m32 -msse2 -O3 must be used.
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Using attached simple test-case extracted from codec we found out that 4.8.2
compiler generates more compact binaries in comparison
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed a 10% regression on one important benchmark used for testing x86
32-bit platforms. This regression can be reproduced on the attached test-case: one
more fill is present
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34814
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34814&action=edit
test-case to reproduce
Need to compile with -O2 -m32 -fPIE -pie options.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
This patch improves the performance of almost all benchmarks in pic-mode for the 32-bit
target, but we have one huge degradation on a benchmark from the eembc1.1 suite.
I mentioned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34551
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34551&action=edit
proposed patch
Patch to cure vectorization issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64809
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #19 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Andrew,
Could you please try the modified test-case (test1.c), which is attached?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434
--- Comment #20 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34700
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34700&action=edit
another test-case
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
The 5.0 compiler does not vectorize a simple loop extracted from geekbench but the 4.9
compiler does. This is caused by different operand ordering
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65494
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35072
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35072&action=edit
test-case to reproduce
The following options are used to reproduce: -Ofast -funroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35203
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35203&action=edit
test-case to reproduce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35202
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35202&action=edit
test-case to reproduce
Need to compile with -O2 flag only.
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Compiling the attached bad.c with only the -O2 option, we can see that a redundant
compare-with-zero instruction is generated:
	subl	%r9d, %eax
	cmpl	$0, %eax
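A hypothetical C sketch of the kind of source in which a subtraction result is tested against zero. bad.c itself is not shown here, so the function name diff_is_zero and its body are assumptions made only for illustration. On x86, a sub instruction already sets ZF according to its result, so an equality test against zero needs no separate compare.

```c
/* Hypothetical example, not the attached bad.c.
   The subtraction result is tested against zero; on x86 the ZF flag
   set by the subtraction already encodes the outcome of this test.  */
int
diff_is_zero (int a, int b)
{
  int d = a - b;   /* corresponds to a sub instruction */
  if (d == 0)      /* ZF from the subtraction is sufficient here */
    return 1;
  return 0;
}
```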
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Jakub,
Thanks for your comments.
We will try to fix this issue ourselves.
Best regards.
Yuri.
P.S. Note that icc does not produce such redundant cmp with zero.
2015-04-01 16
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
I noticed that the vectorized version of the loop is deleted although the compiler reports
that it was successfully vectorized:
t1.c:7:3: note: LOOP VECTORIZED
but afterwards we can see in the vect dump:
Removing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34867
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34867&action=edit
test-case to reproduce
Test needs to be compiled with -Ofast -m64 -march=core-avx2 options.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34856
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34856&action=edit
possible patch
Add check on selective scheduling to not perform instruction
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Attached simple test-case extracted from important suite is not vectorized even
if 'pragma omp simd' is used since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 34548
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34548&action=edit
test-case to reproduce.
Need to compile this test on x86 with option
-O3 -fopenmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35432
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35432&action=edit
test-case to reproduce
Must be compiled with -Ofast and -fopenmp options.
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Target Milestone: ---
If we compile attached test-case without lto, e.g. using -Ofast and -fopenmp
loop in foo is vectorized but if we add -flto option it won't be vectorized.
The problem is 'exit' statement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
The function containing given loop is marked as:
foo/24 (foo) @0x7f39f4b84620
Type: function definition analyzed
Visibility: prevailing_def_ironly
References:
Referring
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35526
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35526&action=edit
test-case to reproduce and assembly file.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Target Milestone: ---
The attached test-case compiled with -Ofast -fopenmp -march=core-avx2 options
contains loop marked with pragma omp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35541
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541&action=edit
test-case to reproduce
Must be compiled with -Ofast -fopenmp -march=core-avx2 options.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35257
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35257&action=edit
assembly for test.c
Additional option '-march=slm' was used for it but it is non
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
For attached test-case in inner loop we can see the following deficiencies:
1. 2 redundant fills and one spill in comparison part of loop - I assume
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35256
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35256&action=edit
test-case to reproduce
It needs to be compiled with -O3 -m32 options.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
--- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com ---
The original test-case is still not vectorized with Richard's patch for sccvn.
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
Target Milestone: ---
For attached simple test-case we can see strange spills to stack, namely
for (i=0; i<n; i++)
out[j * n + i] = in[j * n + i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67206
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 36180
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36180&action=edit
test-case to reproduce
Must be compiled with -O3 -m32 -march=slm to reproduce.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #34 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 36138
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36138&action=edit
simple reproducer
Use -O3 -std=c++14 options to compile and -fno-tree-loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #33 from Yuri Rumyantsev ysrumyan at gmail dot com ---
With the current compiler there is no performance difference between the by-ref and by-val
test-cases, but if we turn off the if-convert transformation we get a ~2X
speed-up:
on Intel(R) Xeon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66951
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I have a fix in my local area which cures the ICE and performs outer-loop
vectorization:
vect-pr40979.f90:8:0: note: LOOP VECTORIZED
vect-pr40979.f90:8:0: note: OUTER LOOP VECTORIZED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Could somebody provide me with instructions on how to build a fresh trunk
compiler with graphite?
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438
--- Comment #9 from Yuri Rumyantsev ---
It looks like such a transformation is profitable only if the def statements have a
single use, i.e. it looks reasonable for
if (255 - a) > (255 -b) /* a,b have char type. */
but it does not look reasonable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021
--- Comment #3 from Yuri Rumyantsev ---
It looks like the unswitching of outer loops pass simply triggers the issue and
this is a tree-ssa-loop-ivopts issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021
--- Comment #4 from Yuri Rumyantsev ---
Indeed, there is an issue with outer-loop unswitching - it should not be
performed for infinite loops. But if we slightly modify the test to have a finite
outer loop we will get the same error:
char a;
void fn1(char
/bugzilla/show_bug.cgi?id=68021
>
> H.J. Lu changed:
>
>What|Removed |Added
>
> CC| |ysrumyan at gmail dot com
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67947
--- Comment #2 from Yuri Rumyantsev ---
revision 228760 must fix this bug.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909
--- Comment #4 from Yuri Rumyantsev ---
Created attachment 36498
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36498&action=edit
proposed patch
This patch cures run-time error for 416.gamess.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67920
--- Comment #8 from Yuri Rumyantsev ---
Please check that revision 228760 will cure your issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909
--- Comment #3 from Yuri Rumyantsev ---
A check that the guard edge is around the inner loop was missing. After adding it,
416.gamess runs successfully.
I sent the fix for review.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 35947
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35947&action=edit
test-case to reproduce
compile with -Ofast -m32 -march=slm and notice redundant test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
--- Comment #23 from Yuri Rumyantsev ---
Richard,
Do we have any chance to vectorize attached test-case using GCC6 compiler?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142
--- Comment #24 from Yuri Rumyantsev ---
Richard,
Do we have any chance to vectorize attached test-case using GCC6 compiler?