[Bug rtl-optimization/31704] New: x86_64 poor floating point register allocation across function call

ian at airs dot com Wed, 25 Apr 2007 08:09:19 -0700

When I compile this test case with -O2 for x86_64:

extern void g (void);
float
f (float sum, float mult, int *pi)
{
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * mult;
    }
  return sum;
}


I get this result:

f:
.LFB2:
        pushq   %rbp
.LCFI0:
        movaps  %xmm0, %xmm2
        xorl    %ebp, %ebp
        pushq   %rbx
.LCFI1:
        movq    %rdi, %rbx
        subq    $40, %rsp
.LCFI2:
        movss   %xmm1, 28(%rsp)
.L2:
        movss   %xmm2, (%rsp)
        call    g
        cvtsi2ss        (%rbx), %xmm0
        leaq    4(%rbx), %rax
        movl    $1, %edx
        movss   (%rsp), %xmm2
        mulss   28(%rsp), %xmm0
        addss   %xmm0, %xmm2
        .p2align 4,,7
.L3:
        cvtsi2ss        (%rax), %xmm1
        addl    $1, %edx
        addq    $4, %rax
        cmpl    $1000, %edx
        mulss   28(%rsp), %xmm1
        addss   %xmm1, %xmm2
        jne     .L3
        addl    $1, %ebp
        addq    $4000, %rbx
        cmpl    $10, %ebp
        jne     .L2
        addq    $40, %rsp
        movaps  %xmm2, %xmm0
        popq    %rbx
        popq    %rbp
        ret

In the original code, the inner loop is performance critical.  Note that the
inner loop (.L3) contains a mulss that loads mult from its stack slot,
28(%rsp), on every iteration.  It would be more efficient to keep the value in
a register for the duration of the inner loop.  In fact the value arrived in a
register (%xmm1), but we stored it on the stack because its live range crosses
the call to g, and all xmm registers are call-clobbered in this ABI.  We then
load it from the stack once for each inner loop iteration rather than once for
each outer loop iteration.

I don't see a simple approach to fixing this.  Some sort of live range
splitting might work.
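
To make the idea concrete, here is a minimal source-level sketch of what a
split live range for mult would look like.  It is only an illustration, not a
proposed fix and not a description of how the register allocator works
internally.  The names f_split, mult_slot, m and g_calls are hypothetical, and
the body of g is a stand-in so the example is self-contained.  The volatile
local plays the role of the stack slot; whether a given compiler then keeps m
in a register through the inner loop is an assumption, not something the
sketch guarantees.

```c
/* Stand-in for the external g () of the test case (assumption: in a real
   experiment g would live in another translation unit so that it cannot
   be inlined).  The counter exists only so the call count can be checked.  */
static int g_calls = 0;

void
g (void)
{
  ++g_calls;
}

/* Hand-split variant of f.  The name f_split is hypothetical.  */
float
f_split (float sum, float mult, int *pi)
{
  /* Memory home for mult across the calls.  volatile forces a real load
     each time the slot is read.  */
  volatile float mult_slot = mult;
  int i, j;
  for (i = 0; i < 10; ++i)
    {
      g ();
      /* Reload once per outer iteration.  m is not live across g (),
         so nothing forces it onto the stack inside the inner loop.  */
      const float m = mult_slot;
      for (j = 0; j < 1000; ++j)
        sum += *pi++ * m;
    }
  return sum;
}
```

This is the same treatment the generated code above already gives sum: it is
stored before the call and reloaded right after it, then stays in %xmm2 for
the whole inner loop.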


-- 
           Summary: x86_64 poor floating point register allocation across
                    function call
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ian at airs dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31704
