[Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA

2013-09-13 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342 --- Comment #10 from Yuri Rumyantsev ysrumyan at gmail dot com --- After the fix in rev. 202468 the assembly looks slightly better, but we ran into another RA inefficiency, which can be illustrated with the attached test (t1.c) compiled with -march=atom

[Bug rtl-optimization/55342] [4.8/4.9 Regression] [LRA,x86] Non-optimal code for simple loop with LRA

2013-09-13 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342 --- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 30816 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30816&action=edit test-case to reproduce. t1.c must be compiled on x86 with the options -O2 -march=atom

[Bug tree-optimization/58444] New: [4.9 regression] Runfail on spec2006/434.zeusmp after r202516.

2013-09-17 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We found that the loop distribution phase is responsible for it: a wrong CFG is generated (after ldist) for pdv.f when it is compiled with the options

[Bug tree-optimization/58459] New: [4.9 regression] Loop invariant is not hoisted out of loop after r202525.

2013-09-18 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed a significant performance regression on an important benchmark from the eembc2.0 suite, which can be exhibited with the attached test-case. Assembly

[Bug tree-optimization/58459] [4.9 regression] Loop invariant is not hoisted out of loop after r202525.

2013-09-18 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58459 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 30850 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30850&action=edit test-case to reproduce. The test must be compiled on x86 with the options -Ofast -m32 -march

[Bug tree-optimization/61822] gcc.dg/vect/vect-cond-reduc-1.c FAILs

2014-07-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- It looks like the /* { dg-require-effective-target vect_condition } */ directive is missing from the vect-cond-reduc-1.c test. I will fix it ASAP.
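In the GCC testsuite, DejaGnu directives such as this one are written as comments at the top of the test file. The sketch below shows where the directive goes. The conditional-reduction loop is hypothetical and is not the actual contents of vect-cond-reduc-1.c.

```c
/* { dg-require-effective-target vect_condition } */

#include <assert.h>

/* Hypothetical conditional reduction: sum only the elements above a
   threshold.  Vectorizing it relies on vector conditional (select)
   support, which is roughly what the vect_condition target checks. */
int cond_sum (const int *a, int n, int threshold)
{
  int sum = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    sum += a[i] > threshold ? a[i] : 0;
  return sum;
}
```

Targets that lack the capability then report the test as UNSUPPORTED rather than FAIL.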

[Bug tree-optimization/61822] gcc.dg/vect/vect-cond-reduc-1.c FAILs

2014-07-22 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Hi Rainer, could you try the attached patch to check whether it helps (the test should not be run on SPARC)? Thanks in advance. Yuri. 2014-07-16 19:20 GMT+04:00 ro at gcc dot gnu.org gcc

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any comments will be appreciated.

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33235 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33235&action=edit file to reproduce. Needs to be compiled with -m32 -O3 -Wframe-larger-than=1728 -std=gnu

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I attached the original file. For the compilers built on 20140208 and 20140730 I got: grep -c redundant test.cc.179r.pre (20140208) 3825 grep -c redundant test

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I added the original file as an attachment to PR 61672, with comments on how to reproduce. 2014-08-04 15:16 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https

[Bug rtl-optimization/61672] [4.9/4.10 Regression] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com --- It really fixes the issue. Thanks.

[Bug tree-optimization/62012] New: Loop is not vectorized after function inlining (SCEV)

2014-08-04 Thread ysrumyan at gmail dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed that for one important benchmark, using the '-flto' option leads to a performance degradation caused by the hottest loop not being vectorized after function inlining. I

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33241 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33241&action=edit test-case to reproduce. Options to compile are: -Ofast -m64 -march=core-avx2 -fopenmp

[Bug tree-optimization/62021] New: ICE in verify_gimple_assign_single

2014-08-05 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com For the attached simple test-case, if we omit the 'uniform' clause the compiler produces an ICE: error: incorrect type of vector CONSTRUCTOR elements. Note that for stmt _38 = {vect_cst_.62_39, vect_cst_

[Bug tree-optimization/62021] ICE in verify_gimple_assign_single

2014-08-05 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62021 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33247 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33247&action=edit test-case to reproduce. The test should be compiled with -O2 -fopenmp -march=core-avx2

[Bug tree-optimization/61743] [4.10 Regression] Complete unroll is not happened for loops with short upper bound

2014-08-07 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any comments will be appreciated.

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug tree-optimization/61743] [4.10 Regression] Complete unroll is not happened for loops with short upper bound

2014-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I tested both proposed fixes and it turned out that the first one is preferable, since it restores the benchmark's performance. Note that hoisting the 2nd VRP pass gave us another

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 --- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com --- Please ignore my previous comment: if we zero the destination register before each popcnt (and lzcnt), performance is restored. Original test results: unsigned

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 --- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com --- This is not the u32 version but the u64 one. The first loop (u32 version) looks like: .L23: leal 1(%rdx), %ecx; xorq %rax, %rax; popcntq (%rbx,%rax,8), %rax; leal
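The loops discussed in this thread accumulate population counts over an array. The sketch below shows such a u64 loop written in C with GCC's __builtin_popcountll. It is a hedged illustration only: the function name, element type and unroll factor of four are assumptions, not the benchmark source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical u64 popcount loop, unrolled by four with one accumulator.
   With -mpopcnt on x86-64, GCC typically emits one popcnt instruction per
   __builtin_popcountll call; the destination register of each popcnt is
   the register the comment proposes to zero first. */
uint64_t popcount_sum (const uint64_t *buf, size_t n)
{
  uint64_t count = 0;

  assert (n % 4 == 0);  /* the sketch assumes n is a multiple of 4 */
  for (size_t i = 0; i < n; i += 4)
    {
      count += (uint64_t) __builtin_popcountll (buf[i]);
      count += (uint64_t) __builtin_popcountll (buf[i + 1]);
      count += (uint64_t) __builtin_popcountll (buf[i + 2]);
      count += (uint64_t) __builtin_popcountll (buf[i + 3]);
    }
  return count;
}
```

The false dependency itself is a matter of the hardware and of register allocation, so it shows up only in the generated assembly, not in the C source.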

[Bug tree-optimization/61743] [5 Regression] Complete unroll is not happened for loops with short upper bound

2014-09-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #10 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, have you made any progress? Thanks. 2014-08-13 12:35 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any updates? Thanks.

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- I checked that our benchmark is now successfully vectorized with function inlining, so this bug can be closed as fixed/resolved.

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- You can close this bug as fixed/resolved (see my comment). Thanks. Yuri. 2014-09-08 15:29 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org

[Bug tree-optimization/60823] New: ICE in gimple_expand_cfg, at cfgexpand.c:5644

2014-04-11 Thread ysrumyan at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed that adding the 'const' qualifier to the arguments of a function marked with the 'omp declare simd' pragma leads to an ICE on the attached test-case. The test compiles successfully if 'const

[Bug tree-optimization/60823] ICE in gimple_expand_cfg, at cfgexpand.c:5644

2014-04-11 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 32585 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32585&action=edit C++ test-case to reproduce. Needs to be compiled with -O1 -m64 test.cpp -c -fopenmp

[Bug tree-optimization/60823] [4.9/4.10 Regression] ICE in gimple_expand_cfg, at cfgexpand.c:5644

2014-04-15 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60823 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- I'd like to point out that this test uses the 'omp declare simd' pragma, so the issue is more likely related to its support in GCC.
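The attached test-cases are not reproduced here. To illustrate the construct involved, the sketch below declares a SIMD-enabled function with const-qualified arguments and a 'uniform' clause. All names in it are hypothetical. Without -fopenmp (or -fopenmp-simd) GCC ignores the pragmas, so the code is also valid plain C.

```c
#include <assert.h>

/* Hypothetical SIMD-enabled function.  The 'const' qualifiers mirror the
   pattern from the report; 'uniform(scale)' states that scale has the
   same value in every SIMD lane. */
#pragma omp declare simd uniform(scale) notinbranch
float scaled_add (const float x, const float y, const float scale)
{
  return x + y * scale;
}

/* Caller loop whose calls the vectorizer may replace with a SIMD clone
   of scaled_add when compiled with -fopenmp. */
void apply_scaled_add (float *out, const float *a, const float *b,
                       int n, float s)
{
  assert (n >= 0);
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = scaled_add (a[i], b[i], s);
}
```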

[Bug other/61391] [4.10 Regression] ICE in execute_one_pass at -O3 and above

2014-06-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- A check that stmt->bb belongs to the loop is missing in is_cond_scalar_reduction; if we add the following lines: if (gimple_code (stmt) != GIMPLE_ASSIGN

[Bug tree-optimization/61518] [4.10 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu

2014-06-16 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61518 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug tree-optimization/61576] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-06-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61576 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- There is an issue with the phi-node and the reduction stmt: after r211302 a new hammock was inserted between the reduction stmt and the bb containing the phi: <bb 6>: d.6_12 = d_lsm.14_17 + 1

[Bug other/61391] [4.10 Regression] ICE in execute_one_pass at -O3 and above

2014-06-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- It turned out that the wrong PR number was used in the ChangeLog. In fact this bug was fixed: URL: http://gcc.gnu.org/viewcvs?rev=211263&root=gcc&view=rev Log: gcc/ PR tree-optimization

[Bug rtl-optimization/61672] New: Less redundant instructions deleted by pre_delete after r208113.

2014-07-02 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com In a real application compiled with restrictions on frame size, the number of deleted redundant instructions decreased significantly after r208113

[Bug tree-optimization/61743] New: Complete unroll is not happened for loops with short upper bound

2014-07-08 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We discovered a significant performance regression on an important benchmark from the eembc2.0 suite after r211625. It turned out that complete unroll

[Bug tree-optimization/61743] Complete unroll is not happened for loops with short upper bound

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33088 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33088&action=edit test-case to reproduce. Use '-O3 -funroll-loops -Dbtype=[int,e_u8]' to reproduce.
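Complete unrolling replaces a loop whose trip count is a small compile-time constant with straight-line copies of its body. The sketch below is a hypothetical loop of that kind, parameterized by the btype macro that the reproducer is built with. The function, the bound of 4 and the int default are assumptions. e_u8 is not defined in this sketch, so only btype=int compiles as written.

```c
#include <assert.h>

#ifndef btype
#define btype int  /* the reproducer is built with btype=int or btype=e_u8 */
#endif

#define TAPS 4     /* hypothetical short, constant upper bound */

/* The loop runs exactly TAPS times, so it is a natural candidate for
   complete unrolling at -O3. */
int dot_taps (const btype *x, const btype *coef)
{
  int acc = 0;

  for (int k = 0; k < TAPS; k++)
    acc += x[k] * coef[k];
  return acc;
}
```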

[Bug tree-optimization/61742] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- This is a duplicate of PR 61576, and it should pass after r212347.

[Bug tree-optimization/61742] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Ok. I will add it. 2014-07-08 14:45 GMT+04:00 jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #3 from Jakub

[Bug tree-optimization/61743] [5 Regression] Complete unroll is not happened for loops with short upper bound

2014-10-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, did you have a chance to look at this and prepare a more general fix? Thanks. Yuri. 2014-09-08 15:13 GMT+04:00 rguenther at suse dot de gcc-bugzi...@gcc.gnu.org: https

[Bug other/61391] [5 Regression] ICE in execute_one_pass at -O3 and above

2014-11-07 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391 --- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com --- Arseny, I am not able to close this bug, but you can.

[Bug tree-optimization/63941] [5 Regression] ICE on valid code at -O3 and above on x86_64-linux-gnu

2014-11-28 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63941 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- My patch is responsible for the ICE: I did not expect that, before the if-convert phase, the CFG may contain redundant degenerate conditional branches: <bb 4>: ... _14 = d[pretmp_51

[Bug tree-optimization/63743] Thumb1: big regression for float operators by r216728

2014-12-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63743 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug tree-optimization/64434] New: Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed a huge regression on eembc1.1 and eembc2.0 for the 32-bit x86 target. It can be reproduced with the attached test-case: before this fix the number

[Bug tree-optimization/64434] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34345 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34345&action=edit simple reproducer. Needs to be compiled with -m32 on an x86 platform.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand cannibalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- I attached two assembly files for the test-case compiled with the -O2 -m32 -S options.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand cannibalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34348 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34348&action=edit assembly files for test.c. Assembly file for test.c

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand cannibalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34349 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34349&action=edit assembly file before r216728. Assembly file.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand cannibalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com --- H.J., I attached the before/after assembly files to the bug. We saw slowdowns on SLM and HSW for 32-bit on eembc2.0, e.g. des degraded by 36% (SLM) and 7% (HSW). But we did not see

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand cannibalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com --- The issue is caused by operand canonicalization, i.e. a special operand ordering for commutative operations so that a + b and b
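The idea behind operand canonicalization can be shown with a toy expression representation. In this sketch, operands are integer ids, and commutative operations put the smaller id first. The representation and the ordering rule are illustrative assumptions, not GCC code; GCC's own rule (tree_swap_operands_p) uses its own criteria.

```c
#include <assert.h>

typedef enum { OP_PLUS, OP_MINUS } opcode;

/* A binary expression whose operands are identified by integer ids. */
typedef struct { opcode code; int op0, op1; } expr;

static int is_commutative (opcode code)
{
  return code == OP_PLUS;
}

/* Put the operands of a commutative operation in a fixed order (lower id
   first), so that a + b and b + a get the same representation. */
expr canonicalize (expr e)
{
  if (is_commutative (e.code) && e.op0 > e.op1)
    {
      int tmp = e.op0;
      e.op0 = e.op1;
      e.op1 = tmp;
    }
  return e;
}

/* Structural equality after canonicalization. */
int expr_equal (expr x, expr y)
{
  x = canonicalize (x);
  y = canonicalize (y);
  return x.code == y.code && x.op0 == y.op0 && x.op1 == y.op1;
}
```

The toy also shows how such a change can affect later passes: after canonicalization, the operand order written in the source is no longer the order those passes see.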

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-30 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34363 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34363&action=edit patch to fix the issue. This patch fixes almost all issues related to operand

[Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2

2015-02-16 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34782 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34782&action=edit test-case to reproduce. The options -m32 -msse2 -O3 must be used.

[Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2

2015-02-16 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Using the attached simple test-case, extracted from a codec, we found that the 4.8.2 compiler generates more compact binaries in comparison

[Bug rtl-optimization/65135] New: Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed a 10% regression on an important benchmark used for testing 32-bit x86 platforms. This regression can be reproduced with the attached test-case: one more fill is present

[Bug rtl-optimization/65135] Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34814 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34814&action=edit test-case to reproduce. Needs to be compiled with the -O2 -m32 -fPIE -pie options.

[Bug rtl-optimization/65135] [5 Regression] Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- This patch improves the performance of almost all benchmarks in PIC mode for the 32-bit target, but we see one huge degradation, on a benchmark from the eembc1.1 suite. I mentioned

[Bug tree-optimization/64746] Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34551 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34551&action=edit proposed patch. Patch to cure the vectorization issue.

[Bug middle-end/64809] [5 Regression] ICE at -O3 with -g enabled on x86_64-linux-gnu (in 32-bit mode)

2015-01-27 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64809 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2015-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #19 from Yuri Rumyantsev ysrumyan at gmail dot com --- Andrew, could you please try the attached modified test-case (test1.c)? Thanks.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2015-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #20 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34700 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34700&action=edit another test-case

[Bug tree-optimization/65494] New: [5.0 Regression] Loop is not vectorized because of operand canonicalization.

2015-03-20 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com The 5.0 compiler does not vectorize a simple loop extracted from geekbench, but the 4.9 compiler does. This is caused by different operand ordering

[Bug tree-optimization/65494] [5.0 Regression] Loop is not vectorized because of operand canonicalization.

2015-03-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65494 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35072 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35072&action=edit test-case to reproduce. The following options are used to reproduce: -Ofast -funroll

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35203 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35203&action=edit test-case to reproduce

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35202 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35202&action=edit test-case to reproduce. Needs to be compiled with the -O2 flag only.

[Bug rtl-optimization/65651] New: Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com If we compile the attached bad.c with only the -O2 option, we can see that a redundant cmp-with-zero instruction is generated: subl %r9d, %eax; cmpl $0, %eax
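On x86, subl already sets ZF and SF from its result. For a branch that tests only those flags, a following cmpl $0 on the same register is therefore redundant. The sketch below is a hypothetical C loop of the shape that may compile to such a sub/compare pair. bad.c itself is not reproduced here, and whether GCC emits exactly this pair for the sketch is not guaranteed.

```c
#include <assert.h>

/* Hypothetical loop: the difference d is needed as a value and is also
   compared with zero.  Since the subtraction already sets ZF, a separate
   compare against zero is not needed for the d != 0 test. */
int sum_nonzero_diffs (const int *a, const int *b, int n, int *nonzero)
{
  int total = 0, count = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      int d = a[i] - b[i];
      if (d != 0)
        {
          count++;
          total += d;
        }
    }
  *nonzero = count;
  return total;
}
```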

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Jakub, thanks for your comments. We will try to fix this issue ourselves. Best regards, Yuri. P.S. Note that icc does not produce such a redundant cmp with zero. 2015-04-01 16

[Bug tree-optimization/65206] New: Vectorized version of loop is removed.

2015-02-25 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com I noticed that the vectorized version of the loop is deleted, although the compiler reports that it was successfully vectorized: t1.c:7:3: note: LOOP VECTORIZED, but afterwards we can see in the vect dump: Removing

[Bug tree-optimization/65206] Vectorized version of loop is removed.

2015-02-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34867 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34867&action=edit test-case to reproduce. The test needs to be compiled with the -Ofast -m64 -mcore-avx2 options.

[Bug target/65161] ICE: in vec_haifa_insn_data, va_heap, vl_embed::operator[], at vec.h:736 with -O3 -fselective-scheduling2 -mtune=slm

2015-02-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug target/65161] ICE: in vec_haifa_insn_data, va_heap, vl_embed::operator[], at vec.h:736 with -O3 -fselective-scheduling2 -mtune=slm

2015-02-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34856 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34856&action=edit possible patch. Add a check for selective scheduling so as not to perform instruction

[Bug tree-optimization/64746] New: Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com The attached simple test-case, extracted from an important suite, is not vectorized even if 'pragma omp simd' is used, since

[Bug tree-optimization/64746] Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34548 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34548&action=edit test-case to reproduce. This test needs to be compiled on x86 with the options -O3 -fopenmp

[Bug lto/65950] Loop is not vectorized with lto.

2015-04-30 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35432 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35432&action=edit test-case to reproduce. Must be compiled with the -Ofast and -fopenmp options.

[Bug lto/65950] New: Loop is not vectorized with lto.

2015-04-30 Thread ysrumyan at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- If we compile the attached test-case without LTO, e.g. using -Ofast and -fopenmp, the loop in foo is vectorized, but if we add the -flto option it is not vectorized. The problem is the 'exit' statement

[Bug lto/65950] Loop is not vectorized with lto.

2015-05-05 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- The function containing the given loop is marked as: foo/24 (foo) @0x7f39f4b84620 Type: function definition analyzed Visibility: prevailing_def_ironly References: Referring

[Bug target/64691] Suboptimal register allocation for bytes comparison on i386

2015-05-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35526 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35526&action=edit test-case to reproduce and assembly file.

[Bug target/64691] Suboptimal register allocation for bytes comparison on i386

2015-05-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691 Yuri Rumyantsev ysrumyan at gmail dot com changed CC: added ysrumyan

[Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-14 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- The attached test-case, compiled with the -Ofast -fopenmp -march=core-avx2 options, contains a loop marked with pragma omp

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35541 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541&action=edit test-case to reproduce. Must be compiled with the -Ofast -fopenmp -march=core-avx2 options.

[Bug rtl-optimization/65698] Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35257 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35257&action=edit assembly for test.c. The additional option '-march=slm' was used for it, but it is non

[Bug rtl-optimization/65698] New: Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com For the attached test-case we can see the following deficiencies in the inner loop: 1. two redundant fills and one spill in the comparison part of the loop; I assume

[Bug rtl-optimization/65698] Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35256 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35256&action=edit test-case to reproduce. It needs to be compiled with the -O3 -m32 options.

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com --- The original test-case is still not vectorized with Richard's patch for SCCVN.

[Bug rtl-optimization/67206] New: Redundant spills in simple copy loop for 32-bit x86 target

2015-08-13 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- For the attached simple test-case we can see strange spills to the stack, namely for (i=0; i<n; i++) out[j * n + i] = in[j * n + i
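The loop quoted above can be wrapped into a compilable sketch. The outer loop over `j`, the row count `m`, the `int` element type and the name `copy_rows` are assumptions; the real test-case is attachment 36180. Compiling it with -O3 -m32 -march=slm -S and reading the assembly is one way to look for stack spills, though we have not verified that this sketch reproduces them.

```c
#include <assert.h>

/* Hypothetical reconstruction of the quoted copy loop.
   out and in are viewed as m rows of n ints each; row j starts at j * n.
   On 32-bit x86 the pointers, bounds and j * n all compete for a small
   register file, which is the setting the report is about. */
void copy_rows(int *out, const int *in, int m, int n)
{
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            out[j * n + i] = in[j * n + i];
}
```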

[Bug rtl-optimization/67206] Redundant spills in simple copy loop for 32-bit x86 target

2015-08-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67206 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 36180 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36180&action=edit test-case to reproduce. Must be compiled with -O3 -m32 -march=slm to reproduce.

[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code

2015-08-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #34 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 36138 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36138&action=edit simple reproducer. Use the -O3 -std=c++14 options to compile and -fno-tree-loop

[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code

2015-08-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #33 from Yuri Rumyantsev ysrumyan at gmail dot com --- With the current compiler there is no performance difference between the by-ref and by-val test-cases, but if we turn off the if-convert transformation we get a ~2X speed-up: on Intel(R) Xeon
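A generic illustration of the trade-off, not the reproducer from attachment 36138; the names `max_branch` and `max_select` are hypothetical. The first function uses a data-dependent `if`; the second spells out the select that if-conversion may turn the first into, which x86 can implement with `cmov`. A commonly given explanation for this kind of slowdown is that `cmov` makes each iteration wait for the previous result, whereas a well-predicted branch lets the CPU run ahead speculatively. Comparing the assembly with and without GCC's -fno-if-conversion is one way to check which form a given build produces.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the update of best is guarded by a compare and branch. */
int max_branch(const int *a, size_t n)
{
    assert(n >= 1);
    int best = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}

/* Select form: roughly what if-converted code computes. Every iteration
   writes best, so the next comparison depends on this iteration's result. */
int max_select(const int *a, size_t n)
{
    assert(n >= 1);
    int best = a[0];
    for (size_t i = 1; i < n; i++)
        best = a[i] > best ? a[i] : best;
    return best;
}
```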

[Bug tree-optimization/66951] [6 Regression] ICE at -O3 on x86_64-linux-gnu, verify_ssa failed

2015-07-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66951 Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed|Added
CC||ysrumyan

[Bug tree-optimization/66926] [6 regression] FAIL: gfortran.dg/graphite/vect-pr40979.f90 -O (internal compiler error)

2015-07-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- I have a local fix which cures the ICE and performs outer-loop vectorization:
vect-pr40979.f90:8:0: note: LOOP VECTORIZED
vect-pr40979.f90:8:0: note: OUTER LOOP VECTORIZED

[Bug tree-optimization/66926] [6 regression] FAIL: gfortran.dg/graphite/vect-pr40979.f90 -O (internal compiler error)

2015-07-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Could somebody provide me with instructions on how to build a fresh trunk compiler with graphite? Thanks.

[Bug middle-end/67438] [6 Regression] ~X op ~Y pattern relocation causes loop performance degradation on 32bit x86

2015-11-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438 --- Comment #9 from Yuri Rumyantsev --- It looks like such a transformation is profitable only if the def statements have a single use, i.e. it looks reasonable for if (255 - a) > (255 - b) /* a, b have char type. */ but it does not look reasonable
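The comment's example relies on an identity that can be checked exhaustively, assuming `unsigned char` operands: for such a value, 255 - a equals (unsigned char)~a, and (255 - a) > (255 - b) has the same value as b > a, which is the kind of rewrite a ~X op ~Y -> Y op X folding performs. The sketch below (the name `count_mismatches` is hypothetical) only confirms that the rewrite preserves values; it says nothing about the profitability question raised above.

```c
#include <assert.h>

/* Returns how many (a, b) pairs of unsigned char values violate either
   identity; 0 means both hold for all 65536 pairs. */
int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a <= 255; a++) {
        unsigned char ca = (unsigned char)a;
        /* ~ca is computed in int (-ca - 1); truncating to 8 bits gives 255 - ca. */
        if ((unsigned char)~ca != (unsigned char)(255 - ca))
            bad++;
        for (int b = 0; b <= 255; b++) {
            unsigned char cb = (unsigned char)b;
            /* Operands are promoted to int, so 255 - x cannot overflow. */
            int original = (255 - ca) > (255 - cb);
            int folded = cb > ca;
            if (original != folded)
                bad++;
        }
    }
    return bad;
}
```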

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021 --- Comment #3 from Yuri Rumyantsev --- It looks like the outer-loop unswitching pass simply triggers the issue, and this is a tree-ssa-loop-ivopts issue.

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021 --- Comment #4 from Yuri Rumyantsev --- Indeed, there is an issue with outer-loop unswitching: it should not be performed for infinite loops. But if we slightly modify the test to have a finite outer loop, we get the same error: char a; void fn1(char
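For readers unfamiliar with the transformation, here is a generic before/after sketch of unswitching a loop nest at the outer loop on a loop-invariant condition. It is neither GCC's implementation nor the bug's test-case, and the names `sum_before` and `sum_after` are hypothetical. Both loop nests here are finite, which matters because, as noted above, the transformation should not be applied to infinite loops.

```c
#include <assert.h>

/* Before: the invariant test on flag is evaluated in every inner iteration. */
long sum_before(const int *a, int m, int n, int flag)
{
    long s = 0;
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            s += flag ? a[j * n + i] : -a[j * n + i];
    return s;
}

/* After unswitching at the outer loop: the test is done once, and the
   whole loop nest is duplicated, one copy per outcome. */
long sum_after(const int *a, int m, int n, int flag)
{
    long s = 0;
    if (flag) {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                s += a[j * n + i];
    } else {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                s -= a[j * n + i];
    }
    return s;
}
```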

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-21 Thread ysrumyan at gmail dot com
/bugzilla/show_bug.cgi?id=68021
> H.J. Lu changed:
> What|Removed|Added
> CC||ysrumyan at gmail dot com

[Bug tree-optimization/67947] [6 Regression] wrong code at -O3 on x86_64-linux-gnu

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67947 --- Comment #2 from Yuri Rumyantsev --- Revision 228760 should fix this bug.

[Bug tree-optimization/67909] [6 Regression] 416.gamess in SPEC CPU 2006 is miscompiled

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909 --- Comment #4 from Yuri Rumyantsev --- Created attachment 36498 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36498&action=edit proposed patch. This patch cures the run-time error for 416.gamess.

[Bug tree-optimization/67920] [6 Regression] wrong code with -O3

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67920 --- Comment #8 from Yuri Rumyantsev --- Please check whether revision 228760 cures your issue.

[Bug tree-optimization/67909] [6 Regression] 416.gamess in SPEC CPU 2006 is miscompiled

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909 --- Comment #3 from Yuri Rumyantsev --- A check that the guard edge is around the inner loop was missing. After adding it, 416.gamess runs successfully. I sent the fix for review.

[Bug lto/66752] spec2000 255.vortex performance compiled with GCC is ~20% lower than with CLANG

2015-07-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752 Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed|Added
CC||ysrumyan

[Bug lto/66752] spec2000 255.vortex performance compiled with GCC is ~20% lower than with CLANG

2015-07-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35947 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35947&action=edit test-case to reproduce. Compile with -Ofast -m32 -march=slm and notice the redundant test

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #23 from Yuri Rumyantsev --- Richard, do we have any chance to vectorize the attached test-case using the GCC6 compiler?

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #24 from Yuri Rumyantsev --- Richard, do we have any chance to vectorize the attached test-case using the GCC6 compiler?
