[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-01-08 Thread jv244 at cam dot ac dot uk
--- Comment #5 from jv244 at cam dot ac dot uk 2008-01-08 09:52 --- updated the summary after the analysis in comment #4, and CCed Dorit for the vectorization issue. -- jv244 at cam dot ac dot uk changed: What|Removed |Added ---

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2013-03-27 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079 Richard Biener changed: What|Removed |Added Status|NEW |RESOLVED Known to work|

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2012-07-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079 --- Comment #14 from Richard Guenther 2012-07-18 13:28:41 UTC --- Smart again - with stock trunk I get everything optimized away ;)

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread rguenth at gcc dot gnu dot org
--- Comment #6 from rguenth at gcc dot gnu dot org 2008-08-18 15:20 --- The problem for the GCC vectorizer is that there are no loads or stores left in the loop and it doesn't handle vectorizing "registers" only. This is a case where real vectorization of straight-line code would be nec

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread rguenth at gcc dot gnu dot org
--- Comment #7 from rguenth at gcc dot gnu dot org 2008-08-18 15:22 --- That is, GCC's inner loop is
.L6:
        addl $1, %eax
        addsd %xmm12, %xmm11
        cmpl $1, %eax
        addsd %xmm14, %xmm3
        addsd %xmm15, %xmm2
        addsd %xmm13, %xmm1

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread jv244 at cam dot ac dot uk
--- Comment #8 from jv244 at cam dot ac dot uk 2008-08-19 05:43 --- (In reply to comment #7)
> That is, GCC's inner loop is
>
> .L6:
> addl $1, %eax
> addsd %xmm12, %xmm11
> cmpl $1, %eax
> addsd %xmm14, %xmm3
> addsd %xmm15, %x

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread jv244 at cam dot ac dot uk
--- Comment #9 from jv244 at cam dot ac dot uk 2008-08-19 05:44 --- Created an attachment (id=16093) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16093&action=view) comment #0 source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread jv244 at cam dot ac dot uk
--- Comment #10 from jv244 at cam dot ac dot uk 2008-08-19 05:45 --- Created an attachment (id=16094) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16094&action=view) comment #0 intel's assembly (ifort 9.1 at -O2 -xT) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread jv244 at cam dot ac dot uk
--- Comment #11 from jv244 at cam dot ac dot uk 2008-08-19 06:09 --- Created an attachment (id=16095) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16095&action=view) new testcase This (PR31079_11.f90) should be a replacement for comment #4, and illustrates the vectorizer issue.

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-18 Thread jv244 at cam dot ac dot uk
--- Comment #12 from jv244 at cam dot ac dot uk 2008-08-19 06:11 --- Created an attachment (id=16096) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16096&action=view) ifort's asm for PR31079_11.f90 at -O3 -xT -S -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079

[Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization

2008-08-19 Thread jv244 at cam dot ac dot uk
--- Comment #13 from jv244 at cam dot ac dot uk 2008-08-19 13:31 --- (In reply to comment #11)
> This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
> the vectorizer issue.
The patch Richard posted in PR37150 also improves this PR31079_11.f90 testcase a lot: