--- Comment #9 from rguenth at gcc dot gnu dot org 2007-10-28 16:38 ---
The main difference I see is that 4.2 avoids re-use of %eax as index register:
.L34:
movq %r11, %rdi
addq 8(%r10), %rdi
movq 8(%r10), %rsi
movq 8(%r10), %rdx
movq
--- Comment #8 from lucier at math dot purdue dot edu 2007-10-28 16:08 ---
Subject: Re: 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
On Oct 28, 2007, at 8:05 AM, rguenth at gcc dot gnu dot org wrote:
> --- Comment #2 from rguenth at gcc dot gnu dot org 20
--- Comment #7 from lucier at math dot purdue dot edu 2007-10-28 16:05 ---
time with -O2 instead of -O1:
with 4.2.2:
(time (direct-fft-recursive-4 a table))
426 ms real time
426 ms cpu time (425 user, 1 system)
no collections
64 bytes allocated
no minor faults
--- Comment #6 from lucier at math dot purdue dot edu 2007-10-28 15:45 ---
Created an attachment (id=14426)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14426&action=view)
assembly after replacing -O1 with -O2
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #5 from lucier at math dot purdue dot edu 2007-10-28 15:45 ---
Created an attachment (id=14425)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14425&action=view)
assembly after replacing -O1 with -O2
--- Comment #4 from lucier at math dot purdue dot edu 2007-10-28 15:42 ---
Created an attachment (id=14424)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14424&action=view)
assembly from 4.3.0
I had to remove the "static" from the declaration of direct-fft-recursive to
get assemb
--- Comment #3 from lucier at math dot purdue dot edu 2007-10-28 15:41 ---
Created an attachment (id=14423)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14423&action=view)
Assembly from 4.2.2
--- Comment #2 from rguenth at gcc dot gnu dot org 2007-10-28 12:05 ---
Can you attach assembler files? What happens if you use -O2? Why do you need
-fno-strict-aliasing? Does -fno-ivopts help?
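For anyone reproducing the comparison requested in comment #2, the following is a rough sketch of the assembler-generating commands. The source file name fft.c is hypothetical (the report does not name the file), and the reporter's full original command line is not shown in this thread, so only the flags mentioned above (-O1, -O2, -fno-strict-aliasing, -fno-ivopts) plus the standard -S and -o options are assumed. The script only prints the commands; run them once for each compiler version and diff the resulting .s files.

```shell
#!/bin/sh
# Assumptions: fft.c is a placeholder name, and gcc stands for whichever
# compiler version (4.2.2 or 4.3.0) is being tested.
CC=gcc
SRC=fft.c

# Print the command that emits assembler (-S) for one set of flags.
# $1 is a tag used in the output file name; the rest are compiler flags.
asm_cmd() {
    tag=$1
    shift
    echo "$CC -S $* -o ${SRC%.c}-$tag.s $SRC"
}

asm_cmd O1 -O1 -fno-strict-aliasing
asm_cmd O2 -O2 -fno-strict-aliasing
asm_cmd O1-noivopts -O1 -fno-strict-aliasing -fno-ivopts
```

Printing the commands instead of executing them keeps the sketch independent of which compiler versions are installed.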