--- Comment #24 from whaley at cs dot utsa dot edu 2006-06-27 16:44 ---
Guys,
OK, here is a table summarizing the performance you can see using the
mmbench4s.tar.gz. I believe this covers a strong majority of the x86
architectures in use today (there are some specialty processors such
--- Comment #23 from whaley at cs dot utsa dot edu 2006-06-27 14:20 ---
Uros,
OK, I made the stupid assumption that the P4 would behave like the P4e,
should've known better :)
I got access to a Pentium 4 (family=15, model=2), and indeed I can repeat the
several surprising things you re
--- Comment #22 from uros at kss-loka dot si 2006-06-27 05:49 ---
(In reply to comment #21)
> Note that you are running the opposite of my test case: SSE vs SSE rather than
> x87 vs x87. This whole bug report is about x87 performance. You can get more
> detail on why I want x87 in my
--- Comment #21 from whaley at cs dot utsa dot edu 2006-06-26 15:03 ---
Uros,
Thanks for the reply; I think some confusion has set in (see below) :)
>And the results are a bit surprising (this is the exact output of your test):
Note that you are running the opposite of my test case: SS
--- Comment #20 from uros at kss-loka dot si 2006-06-26 06:31 ---
(In reply to comment #15)
> Can someone tell me if anyone is looking into this problem with the hopes of
> fixing it? I just noticed that despite the posted code demonstrating the
> problem, and verification on: Pentium
--- Comment #19 from whaley at cs dot utsa dot edu 2006-06-26 00:55 ---
Thanks for the info. I'm sorry to hear that no performance regression tests
are done, but I guess it kind of explains why these problems reoccur :)
As to not unrolling, the fully unrolled case is almost always comm
--- Comment #18 from rguenth at gcc dot gnu dot org 2006-06-25 20:05 ---
Unfortunately we don't have infrastructure for performance regression tests.
Btw. did you check what happens if you do not unroll the innermost loop
manually but let -funroll-loops do it? For me the performance i
--- Comment #17 from whaley at cs dot utsa dot edu 2006-06-25 13:17 ---
OK, thanks for the reply. I will assume gcc 4 won't be fixed in the near
future. My guess is this will make icc an easier compiler for users, which I
kind of hate, which is why I worked as much as I did on this rep
--- Comment #16 from rguenth at gcc dot gnu dot org 2006-06-24 19:00 ---
Don't hold your breath.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
--- Comment #15 from whaley at cs dot utsa dot edu 2006-06-24 18:10 ---
Hi,
Can someone tell me if anyone is looking into this problem with the hopes of
fixing it? I just noticed that despite the posted code demonstrating the
problem, and verification on: Pentium Pro, Pentium III, Pent
--- Comment #14 from whaley at cs dot utsa dot edu 2006-06-14 02:40 ---
OK, I got access to some older machines, and it appears that Core is the only
architecture that likes gcc 4's code. More precisely, I have confirmed that
the following architectures run significantly slower using gc
--- Comment #13 from whaley at cs dot utsa dot edu 2006-06-07 22:28 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Guys,
Just got access to a CoreDuo machine, and tested things there. I had to
do some hand-translation of assemblies, as I didn't have access
--- Comment #12 from whaley at cs dot utsa dot edu 2006-06-01 18:43 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Uros,
>gcc version 3.4.6
>vs.
>gcc version 4.2.0 20060601 (experimental)
>
>-fomit-frame-pointer -O -msse2 -mfpmath=sse
>There is a small per
--- Comment #11 from whaley at cs dot utsa dot edu 2006-06-01 16:26 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Uros,
OK, I originally replied a couple of hours ago, but that is not appearing on
bugzilla for some reason, so I'll try again, this time CCin
--- Comment #10 from whaley at cs dot utsa dot edu 2006-06-01 16:02 ---
Created an attachment (id=11571)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11571&action=view)
Same benchmark, but with single precision timing included
Here's the same benchmark, but can time single as wel
--- Comment #9 from uros at kss-loka dot si 2006-06-01 08:43 ---
The benchmark run on a Pentium4 3.2G/800MHz FSB (32bit):
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping: 9
cpu MHz : 319
--- Comment #8 from whaley at cs dot utsa dot edu 2006-05-31 14:12 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Uros,
>IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
>luck.
As far as understanding from first principles, per
--- Comment #7 from uros at kss-loka dot si 2006-05-31 10:56 ---
IMO the fact that gcc 3.x beats 4.x on this code could be attributed to pure
luck.
Looking into 3.x RTL, these things can be observed:
Instruction that multiplies pA0 and rB0 is described as:
__.20.combine:
(insn 75 73
--- Comment #6 from whaley at cs dot utsa dot edu 2006-05-31 01:09 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Yes, I agree it is an x86/x86_64 issue. I have not yet scoped the performance
of any of the other architectures with gcc 4 vs. 3: since 90% of
--- Comment #5 from pinskia at gcc dot gnu dot org 2006-05-31 00:55 ---
(In reply to comment #4)
> and have uploaded it as an attachment. I am not sure what you mean by
> "fully a target issue". Perhaps I have submitted to the wrong area of
> gcc performance bug? Note that it is not l
--- Comment #4 from hiclint at gmail dot com 2006-05-31 00:50 ---
Subject: Re: gcc 4 produces worse x87 code on all platforms than gcc 3
Andrew,
Thanks for the reply. For the small case demonstrating the problem, I
included it in the original message:
http://www.cs.utsa.edu/~whale
--- Comment #3 from pinskia at gcc dot gnu dot org 2006-05-31 00:41 ---
This is fully a target issue.
--
pinskia at gcc dot gnu dot org changed:
What|Removed |Added