: nbenoit at tuxfamily dot org
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44303
--- Comment #17 from nbenoit at tuxfamily dot org 2009-12-17 09:32 ---
(In reply to comment #16)
Created an attachment (id=19332)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19332&action=view)
Real fix
Now, before I blow it again, would you be so kind to test
--- Comment #18 from nbenoit at tuxfamily dot org 2009-12-17 09:34 ---
(In reply to comment #17)
(In reply to comment #16)
Created an attachment (id=19332)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19332&action=view)
Real fix
Now, before I blow it again, would
--- Comment #8 from nbenoit at tuxfamily dot org 2009-12-16 10:34 ---
I am confused, a performance regression is still noticeable:
* Intel Xeon E5320 (x86_64 arch but gcc machine is i686-pc-linux-gnu), with -O1 flag
GCC-4.4.2 7364 ms
GCC-trunk-r155286 9515 ms
* Intel Xeon
--- Comment #9 from nbenoit at tuxfamily dot org 2009-12-16 11:06 ---
Here is a unified diff which focuses on the inner-loop exit conditions.
--- 442/convol.s
+++ r155286/convol.s
.L3:
movl (%edx), %ebx
- imull (%esi,%eax,4), %ebx
+ imull H(,%eax,4), %ebx
--- Comment #11 from nbenoit at tuxfamily dot org 2009-12-16 12:53 ---
The fastest is the variant with more jumps (442/convol.s in the diff) generated
by GCC-4.4.2.
In the one-jump variant (r155286/convol.s in the diff), I guess it is the
computing of both conditions before jumping
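To make the two shapes concrete, here is a minimal C sketch. It is not the attached source file: the function names, the parameters and the exact pair of exit conditions (`k < nh` and `k <= i`) are assumptions chosen only for illustration. The first function tests the conditions one after the other with `&&`, so a compiler may emit one jump per condition. The second evaluates both conditions and combines them with `&` before a single test, which mirrors the one-jump shape. Which code a given GCC revision actually emits for either form depends on the version and flags.

```c
#include <stddef.h>

/* Hypothetical inner loop of a convolution at output index i.
   The caller must ensure that i is a valid index into x.
   Short-circuit form: the second condition is only evaluated when
   the first one holds, so one jump per condition is a natural fit. */
long convol_at_short_circuit(const int *x, const int *h, size_t nh, size_t i)
{
    long acc = 0;
    for (size_t k = 0; k < nh && k <= i; k++)
        acc += (long)h[k] * x[i - k];
    return acc;
}

/* Combined form: both comparisons are computed on every iteration
   and merged with a bitwise AND before a single test. */
long convol_at_combined(const int *x, const int *h, size_t nh, size_t i)
{
    long acc = 0;
    for (size_t k = 0; (k < nh) & (k <= i); k++)
        acc += (long)h[k] * x[i - k];
    return acc;
}
```

Both functions return the same value for the same inputs; only the way the exit test is evaluated differs, which is the point under discussion.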
--- Comment #12 from nbenoit at tuxfamily dot org 2009-12-16 12:55 ---
Created an attachment (id=19321)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19321&action=view)
Diff of the RTL expand dump between revisions 151079 and 151080
--
http://gcc.gnu.org/bugzilla/show_bug.cgi
--- Comment #4 from nbenoit at tuxfamily dot org 2009-12-01 10:11 ---
It seems that this regression first appeared with revision 151080
* with -O1
GCC-4.4.2 7.4 s
GCC-trunk-r151078 7.4 s
GCC-trunk-r151079 7.4 s
GCC-trunk-r151080 9.4 s
GCC-trunk-r151081 9.4 s
GCC-trunk
--- Comment #3 from nbenoit at tuxfamily dot org 2009-11-26 15:08 ---
Using integer instead of double, the performance difference is even more
noticeable:
* with -O1
GCC 4.4.2 7475 ms
GCC-trunk-r154672 9390 ms
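For reference, the relative slowdown behind these two timings can be computed as below. The helper name `slowdown_percent` is hypothetical and not part of the test case; with the figures above (7475 ms and 9390 ms) it gives roughly 26%.

```c
/* Relative slowdown of a regressed timing against a baseline, in percent.
   Both arguments must use the same unit (milliseconds here). */
double slowdown_percent(double baseline_ms, double regressed_ms)
{
    return (regressed_ms - baseline_ms) / baseline_ms * 100.0;
}
```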
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: nbenoit at tuxfamily dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42027
--- Comment #1 from nbenoit at tuxfamily dot org 2009-11-13 09:51 ---
Created an attachment (id=19010)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19010&action=view)
Source file with a convolution loop pattern.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42027