I'm seeing a significant runtime performance regression (>15%) with snapshots
following gcc-4.0-20041205. As far as I can see, there are some issues when
register pressure builds up: later versions get the x87 FPU involved where the
former version didn't.

The >15% figure comes from a larger application (a raytracer). Branch prediction
also changed (but I've fixed that), so I'm reasonably sure the problem is what's
demonstrated in the attached testcase.

Switches: -march=k8 -mfpmath=sse -O3 -ffast-math -fomit-frame-pointer

with gcc-4.0-20041205:
[snip]
  4010f4:       movss  (%ecx,%esi,4),%xmm0
  4010f9:       movss  (%eax,%ebx,4),%xmm5
  4010fe:       movss  (%eax,%esi,4),%xmm7
  401103:       mulss  %xmm5,%xmm1
  401107:       movss  (%ecx,%ebx,4),%xmm4
  40110c:       movss  %xmm0,(%esp)
  401111:       mulss  %xmm4,%xmm2
  401115:       movaps %xmm3,%xmm0
  401118:       subss  (%ecx,%edx,4),%xmm6
  40111d:       addss  (%eax,%edx,4),%xmm1
  401122:       mulss  (%esp),%xmm3
  401127:       mulss  %xmm7,%xmm0
  40112b:       subss  %xmm2,%xmm6
  40112f:       xorps  %xmm2,%xmm2
  401132:       addss  %xmm0,%xmm1
  401136:       subss  %xmm3,%xmm6
  40113a:       divss  %xmm1,%xmm6
  40113e:       mulss  %xmm6,%xmm7
  401142:       comiss 0x0(%ebp),%xmm6
  401146:       mulss  %xmm6,%xmm5
  40114a:       addss  (%esp),%xmm7

with gcc-4.0-20050102:
[snip]
  4010ff:       movss  (%ecx,%esi,4),%xmm0
  401104:       movss  (%eax,%ebx,4),%xmm5
  401109:       movss  (%eax,%esi,4),%xmm7
  40110e:       mulss  %xmm5,%xmm1
  401112:       movss  (%ecx,%ebx,4),%xmm4
  401117:       movss  %xmm0,0x4(%esp)
  40111d:       mulss  %xmm4,%xmm2
  401121:       movaps %xmm3,%xmm0
  401124:       flds   (%ecx,%edx,4)
  401127:       addss  (%eax,%edx,4),%xmm1
  40112c:       mulss  0x4(%esp),%xmm3
  401132:       fsubrs 0xc(%edi)
  401135:       mulss  %xmm7,%xmm0
  401139:       addss  %xmm0,%xmm1
  40113d:       fstps  (%esp)
  401140:       movss  (%esp),%xmm6
  401145:       subss  %xmm2,%xmm6
  401149:       xorps  %xmm2,%xmm2
  40114c:       subss  %xmm3,%xmm6
  401150:       divss  %xmm1,%xmm6
  401154:       mulss  %xmm6,%xmm7
  401158:       comiss 0x0(%ebp),%xmm6
  40115c:       mulss  %xmm6,%xmm5
  401160:       addss  0x4(%esp),%xmm7
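
To make the difference between the two listings easy to check on other snapshots, here is a minimal sketch of a script that flags x87 instructions in a disassembly. It is not part of the attached testcase. The function name and the two excerpts (copied from the listings above) are just for illustration, and it assumes the listing has no raw instruction bytes, as above (e.g. objdump -d --no-show-raw-insn), and that any mnemonic starting with "f" is x87, which holds for this code but not in general.

```python
import re

# Assumption: in this kind of code every mnemonic starting with "f" is x87
# (flds, fsubrs, fstps, ...); none of the SSE mnemonics seen here start with "f".
X87_MNEMONIC = re.compile(r"^f[a-z][a-z0-9]*$")

# Expected line shape: "  401124:       flds   (%ecx,%edx,4)"
# (address, colon, mnemonic; no raw instruction bytes in between).
LISTING_LINE = re.compile(r"\s*([0-9a-f]+):\s+(\S+)")


def x87_instructions(listing):
    """Return (address, mnemonic) pairs for x87 instructions in a listing."""
    hits = []
    for line in listing.splitlines():
        match = LISTING_LINE.match(line)
        if match and X87_MNEMONIC.match(match.group(2)):
            hits.append((match.group(1), match.group(2)))
    return hits


# Excerpt from the gcc-4.0-20041205 listing above.
OLD_EXCERPT = """
  401115:       movaps %xmm3,%xmm0
  401118:       subss  (%ecx,%edx,4),%xmm6
  40111d:       addss  (%eax,%edx,4),%xmm1
"""

# Excerpt from the gcc-4.0-20050102 listing above.
NEW_EXCERPT = """
  401121:       movaps %xmm3,%xmm0
  401124:       flds   (%ecx,%edx,4)
  401127:       addss  (%eax,%edx,4),%xmm1
  401132:       fsubrs 0xc(%edi)
  40113d:       fstps  (%esp)
  401140:       movss  (%esp),%xmm6
"""

if __name__ == "__main__":
    for name, text in (("20041205", OLD_EXCERPT), ("20050102", NEW_EXCERPT)):
        print(name, x87_instructions(text))
```

On the excerpts, the older snapshot yields no hits, while the newer one reports the flds/fsubrs/fstps sequence that computes the value on the x87 stack and moves it to %xmm6 via (%esp).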

-- 
           Summary: runtime performance regression in floating point heavy
                    code, x86/SSE
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbptbp at gmail dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19240
