> /usr/local/gcc44/bin/gcc -v
[..]
gcc version 4.4.0 20080503 (experimental) (GCC)
> gcc -O3 -mfpmath=sse -fno-pic -fno-tree-vectorize -S himenoBMTxps.c

With -O2/-O3, the inner loop of jacobi() in this program ends up containing a
lot of this:
        movss   _p-4(%edi,%edx,4), %xmm0
        movl    -96(%ebp), %edi
        subss   _p-4(%edi,%edx,4), %xmm0
        movl    -108(%ebp), %edi
        subss   _p-4(%edi,%edx,4), %xmm0
        movl    -92(%ebp), %edi
        addss   _p-4(%edi,%edx,4), %xmm0
        movl    -124(%ebp), %edi

At -O1 or -Os, it instead produces:
        movss   34056(%eax), %xmm0
        subss   33024(%eax), %xmm0
        subss   -33024(%eax), %xmm0
        addss   -34056(%eax), %xmm0

which is much better. On a Core 2, the benchmark reports being 40% faster at -Os.
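For reference, the constant displacements in the -Os code are consistent with a
single pointer walking along k through a 3-D float array, with each diagonal
neighbour at a fixed byte offset from it. The sketch below is a hypothetical
reduction, not the benchmark source: the dimensions 65x65x129 and the helper
names stencil_offset(), cross_term() and verify_offsets() are assumptions,
chosen only because they reproduce the offsets 34056 and 33024 shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed dimensions (not stated in this report); picked because they
   reproduce the displacements seen in the -Os assembly. */
#define MIMAX 65
#define MJMAX 65
#define MKMAX 129

static float p[MIMAX][MJMAX][MKMAX];

/* Byte offset of p[i+di][j+dj][k] relative to p[i][j][k]. */
static ptrdiff_t stencil_offset(int di, int dj)
{
    ptrdiff_t elems = (ptrdiff_t)di * MJMAX * MKMAX + (ptrdiff_t)dj * MKMAX;
    return elems * (ptrdiff_t)sizeof(float);
}

/* Hypothetical cross term: four diagonal neighbours in the i/j plane.
   With one IV pointing at p[i][j][k], each access is base + constant. */
static float cross_term(int i, int j, int k)
{
    return p[i + 1][j + 1][k] - p[i + 1][j - 1][k]
         - p[i - 1][j + 1][k] + p[i - 1][j - 1][k];
}

/* Check that the assumed layout matches the movss/subss/subss/addss
   displacements in the -Os output. */
static void verify_offsets(void)
{
    assert(stencil_offset( 1,  1) ==  34056);
    assert(stencil_offset( 1, -1) ==  33024);
    assert(stencil_offset(-1,  1) == -33024);
    assert(stencil_offset(-1, -1) == -34056);
}
```

Under these assumptions, the -O2/-O3 code appears to keep a separate base for
each neighbour and reload it from the stack before every access, instead of
using one base register with constant displacements.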

IIRC this isn't a problem on x86-64, but IRA combined with -O3 was much worse
again.


-- 
           Summary: bad choice of loop IVs above -Os on x86
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
