[Bug target/40648] misaligned store vectorizer patch introduced 10% runtime regression on Polyhedron test_fpu

2009-10-28 Thread ubizjak at gmail dot com
--- Comment #12 from ubizjak at gmail dot com 2009-10-28 10:36 ---
The patch fixed the regression, see test_fpu chart [1] between 2009-10-27 and 2009-10-28.
[1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html

2009-10-28 Thread ubizjak at gmail dot com
--- Comment #11 from ubizjak at gmail dot com 2009-10-28 10:33 ---
Author: revitale
Date: Tue Oct 27 11:46:07 2009
New Revision: 153590
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153590
Log: Fix PR40648 -- Fix misaligned store vectorizer patch
Modified: trunk/gcc/ChangeL

2009-10-25 Thread eres at il dot ibm dot com
--- Comment #10 from eres at il dot ibm dot com 2009-10-25 12:41 ---
(In reply to comment #0)
> Hello!
> The "[patch, vectorizer] misaligned store support" patch [1] resulted in more
> than 10% longer execution time for Polyhedron test_fpu test on Core2.
> The test is compiled with "-mar

2009-07-09 Thread eres at il dot ibm dot com
--- Comment #9 from eres at il dot ibm dot com 2009-07-09 07:32 ---
> Not using unaligned stores for this kind of data dependence or peeling
> for alignment will probably help here.
The decision of how to vectorize can be changed for x86 (or any other target). Instead of first checking
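To make the "peeling for alignment" option concrete, here is a minimal hand-written sketch in C. It is not the vectorizer's actual transformation. It assumes an axpy-style update (`c[i] += temp * a[i]`), which is only a stand-in for the loop under discussion, and a 16-byte SSE2 vector width. The names `peel_count`, `axpy_peeled` and `VEC_BYTES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u   /* assumed vector width (SSE2: two doubles) */

/* Number of leading scalar iterations needed before p is
   VEC_BYTES-aligned, capped at n. */
static size_t peel_count(const double *p, size_t n)
{
    uintptr_t addr = (uintptr_t)p;
    /* Peeling can only reach alignment if p is element-aligned. */
    assert(addr % sizeof(double) == 0);
    size_t mis = addr % VEC_BYTES;
    size_t peel = mis ? (VEC_BYTES - mis) / sizeof(double) : 0;
    return peel < n ? peel : n;
}

/* c[i] += temp * a[i], split into prologue, main loop and epilogue. */
static void axpy_peeled(double *c, const double *a, double temp, size_t n)
{
    size_t i = 0;
    size_t peel = peel_count(c, n);

    /* Prologue: scalar iterations until the store address is aligned. */
    for (; i < peel; i++)
        c[i] += temp * a[i];

    /* Main loop: each pair of stores to c starts on a 16-byte boundary,
       so the store side could use aligned vector moves. The loads
       from a may still be misaligned. */
    for (; i + 2 <= n; i += 2) {
        c[i]     += temp * a[i];
        c[i + 1] += temp * a[i + 1];
    }

    /* Epilogue: leftover element, if any. */
    for (; i < n; i++)
        c[i] += temp * a[i];
}
```

Peeling here is keyed on the store pointer only. A real implementation has to choose which access to align, and whether the extra prologue pays off is a target-dependent cost question.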

2009-07-07 Thread rguenth at gcc dot gnu dot org
--- Comment #8 from rguenth at gcc dot gnu dot org 2009-07-07 15:47 ---
The issue is likely the sequence
  load upper half of cache line 1
  load lower half of cache line 2
  store upper half of cache line 1
  store lower half of cache line 2 <---
  load upper half of cache line 2
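One reading of the sequence above is that each misaligned 16-byte vector access straddles two cache lines, so a store to the lower half of line 2 is followed directly by a load from the same line. The following C sketch only shows how to tell whether a given access straddles a line boundary. The 64-byte line size and the names `CACHE_LINE` and `crosses_cache_line` are assumptions for illustration, not taken from the bug report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed cache line size in bytes */

/* Returns 1 if an access of `size` bytes starting at `addr`
   touches two cache lines, 0 if it stays within one. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```

For example, a 16-byte access at offset 56 covers bytes 56..71 and so spans lines 0 and 1, while the same access at offset 48 stays inside line 0.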

2009-07-05 Thread eres at il dot ibm dot com
--- Comment #7 from eres at il dot ibm dot com 2009-07-05 08:12 --- Testing test_fpu on Power7 with the power7 branch shows no significant difference between the version compiled with the misaligned store support patch and without it. (using -mcpu=power7 -ffast-math -funroll-loops -O3) T

2009-07-04 Thread dominiq at lps dot ens dot fr
--- Comment #6 from dominiq at lps dot ens dot fr 2009-07-04 14:02 ---
I have seen this problem also. From a crude profiling, it seems that the slow routines are dgemm, as pointed out in comment #2, and gauss. This is a regression with respect to 4.4.0 and it started between June 5 and 6.

2009-07-04 Thread ubizjak at gmail dot com
--- Comment #5 from ubizjak at gmail dot com 2009-07-04 13:40 ---
(In reply to comment #4)
> and in regressed case:
... in NON-regressed case. The regressed code is the first dump.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648

2009-07-04 Thread ubizjak at gmail dot com
--- Comment #4 from ubizjak at gmail dot com 2009-07-04 12:43 ---
(In reply to comment #1)
> Can you check numbers with vectorization disabled? I see the regression as
> well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
> as aligned moves (if the data turns out t

2009-07-04 Thread rguenth at gcc dot gnu dot org
--- Comment #3 from rguenth at gcc dot gnu dot org 2009-07-04 12:36 ---
Tuned for Core2 I get for the innermost loop
.L19:
        leal    (%eax,%ebx), %edx
        movsd   (%eax,%ecx), %xmm1
        movsd   (%edx), %xmm7
        movhpd  8(%eax,%ecx), %xmm1
        movhpd  8(%edx), %xmm

2009-07-04 Thread rguenth at gcc dot gnu dot org
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-04 12:33 ---
One loop is
   139  0.0046 :         DO l = 1 , K
   622  0.0208 :            IF ( B(l,j)/=ZERO ) THEN
               :               temp = Alpha*B(l,j)
 21380  0.7146 :               DO i =

2009-07-04 Thread rguenth at gcc dot gnu dot org
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-04 12:05 ---
Can you check numbers with vectorization disabled? I see the regression as well on an AMD Fam 10 machine which supposedly has unaligned moves as fast as aligned moves (if the data turns out to be aligned). Which mea