--- Comment #12 from ubizjak at gmail dot com 2009-10-28 10:36 ---
The patch fixed the regression, see test_fpu chart [1] between
2009-10-27 and 2009-10-28.
[1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
--
ubizjak at gmail dot com changed:
--- Comment #11 from ubizjak at gmail dot com 2009-10-28 10:33 ---
Author: revitale
Date: Tue Oct 27 11:46:07 2009
New Revision: 153590
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=153590
Log:
Fix PR40648 -- Fix misaligned store vectorizer patch
Modified:
trunk/gcc/ChangeL
--- Comment #10 from eres at il dot ibm dot com 2009-10-25 12:41 ---
(In reply to comment #0)
> Hello!
> The "[patch, vectorizer] misaligned store support" patch [1] resulted in more
> than 10% longer execution time for Polyhedron test_fpu test on Core2.
> The test is compiled with "-mar
--- Comment #9 from eres at il dot ibm dot com 2009-07-09 07:32 ---
> Not using unaligned stores for this kind of data dependence or peeling
> for alignment will probably help here.
The decision of how to vectorize can be changed for x86 (or any other target).
Instead of first checking
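
For illustration, "peeling for alignment" as mentioned in the quoted text can be sketched in C as below. This is a minimal sketch and not the vectorizer's actual output. The loop body c[i] += temp * a[i] is an assumption (the inner loop in comment #2 is cut off), the 16-byte boundary assumes SSE-width stores, and axpy_peeled is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes c[i] += temp * a[i] for i in [0, n).
   Leading iterations are peeled until the store address is 16-byte
   aligned (assumed SSE vector width). Returns the number of peeled
   iterations. */
static size_t axpy_peeled(double *c, const double *a, double temp, size_t n)
{
    size_t i = 0;

    /* Prologue: scalar iterations until &c[i] sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(c + i) % 16) != 0) {
        c[i] += temp * a[i];
        i++;
    }
    size_t peeled = i;

    /* Main loop: two doubles per iteration. Stores to c now start on an
       aligned address, so a vectorizer could emit aligned stores here.
       Loads from a may still be misaligned. */
    for (; i + 1 < n; i += 2) {
        c[i]     += temp * a[i];
        c[i + 1] += temp * a[i + 1];
    }

    /* Epilogue: at most one leftover element. */
    for (; i < n; i++)
        c[i] += temp * a[i];

    return peeled;
}
```

With a 16-byte-aligned buffer buf, calling axpy_peeled(buf + 1, a, temp, n) peels one iteration, while axpy_peeled(buf, a, temp, n) peels none.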
--- Comment #8 from rguenth at gcc dot gnu dot org 2009-07-07 15:47 ---
The issue is likely the sequence
load upper half of cache line 1
load lower half of cache line 2
store upper half of cache line 1
store lower half of cache line 2 <---
load upper half of cache line 2
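
To make the cache-line picture above concrete, here is a minimal sketch in C that counts how many vector accesses in a run straddle a cache-line boundary. The sizes are assumptions not stated in the comment: 64-byte cache lines and 16-byte (SSE) accesses. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed sizes, not taken from the thread:
   64-byte cache lines, 16-byte (SSE) vector accesses. */
enum { CACHE_LINE = 64, VEC_BYTES = 16 };

/* Returns 1 if a VEC_BYTES-wide access starting at byte address addr
   touches two different cache lines, 0 otherwise. */
static int crosses_line(size_t addr)
{
    return addr / CACHE_LINE != (addr + VEC_BYTES - 1) / CACHE_LINE;
}

/* Counts line-crossing accesses among n consecutive vector accesses,
   the first of which starts at byte offset start. */
static size_t count_split_accesses(size_t start, size_t n)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        splits += crosses_line(start + i * VEC_BYTES);
    return splits;
}
```

Under these assumptions, a stream that starts 8 bytes past a line boundary has one access in four split across two lines (the one at offset 56 covers bytes 56 to 71), while an aligned stream has none.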
--- Comment #7 from eres at il dot ibm dot com 2009-07-05 08:12 ---
Testing test_fpu on Power7 with the power7 branch shows no significant
difference between the version compiled with the misaligned store support patch
and without it. (using -mcpu=power7 -ffast-math -funroll-loops -O3)
T
--- Comment #6 from dominiq at lps dot ens dot fr 2009-07-04 14:02 ---
I have also seen this problem. From a crude profiling, it seems that the slow
routines are dgemm, as pointed out in comment #2, and gauss. This is a regression
with respect to 4.4.0, and it started between June 5 and 6.
--- Comment #5 from ubizjak at gmail dot com 2009-07-04 13:40 ---
(In reply to comment #4)
> and in regressed case:
... in NON-regressed case. The regressed code is the first dump.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40648
--- Comment #4 from ubizjak at gmail dot com 2009-07-04 12:43 ---
(In reply to comment #1)
> Can you check numbers with vectorization disabled? I see the regression as
> well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
> as aligned moves (if the data turns out t
--- Comment #3 from rguenth at gcc dot gnu dot org 2009-07-04 12:36 ---
Tuned for Core2 I get for the innermost loop
.L19:
leal (%eax,%ebx), %edx
movsd (%eax,%ecx), %xmm1
movsd (%edx), %xmm7
movhpd 8(%eax,%ecx), %xmm1
movhpd 8(%edx), %xmm
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-04 12:33 ---
One loop is
139 0.0046 : DO l = 1 , K
622 0.0208 : IF ( B(l,j)/=ZERO ) THEN
: temp = Alpha*B(l,j)
21380 0.7146 : DO i =
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-04 12:05 ---
Can you check numbers with vectorization disabled? I see the regression as
well on an AMD Fam 10 machine which supposedly has unaligned moves as fast
as aligned moves (if the data turns out to be aligned). Which mea