http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064

           Summary: [x86/x64]: broken alias analysis leads vectorizer to
                    emit poor code
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: piotr.wyder...@gmail.com


On an x86 capable of SSE2, or on x64 (which has SSE2 by definition), GCC tries
to vectorize as much integer code as possible, but ends up with code much
worse than without vectorization. The SSE2-based version unnecessarily
re-reads all the m_Data pointers on every loop iteration, as demonstrated by
the following C++ snippet. I guess the reason is unsophisticated alias
analysis, but the actual reason may in fact be different.


#include <emmintrin.h>   // __m128i
#include <cstddef>       // std::size_t

struct X {

    __m128i*    m_Data;
    std::size_t m_Len;

    void xor_all(const X& v1, const X& v2);
    void xor_all2(const X& v1, const X& v2);
};


void X::xor_all(const X& v1, const X& v2) {

    for(std::size_t i = 0; i != m_Len; ++i) {

        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}

void X::xor_all2(const X& v1, const X& v2) {

    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;

    for(std::size_t i = 0; i != m_Len; ++i) {

        p0[i] = p1[i] ^ p2[i];
    }
}

As can be seen, xor_all2 produces nice code and xor_all doesn't:

0000000000447c70 <_ZN1X7xor_allERKS_S1_>:
  447c70:    48 83 7f 08 00           cmpq   $0x0,0x8(%rdi)
  447c75:    74 35                    je     447cac <_ZN1X7xor_allERKS_S1_+0x3c>
  447c77:    31 c0                    xor    %eax,%eax
  447c79:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  447c80:    4c 8b 12                 mov    (%rdx),%r10
  447c83:    48 89 c1                 mov    %rax,%rcx
  447c86:    48 83 c0 01              add    $0x1,%rax
  447c8a:    4c 8b 0e                 mov    (%rsi),%r9
  447c8d:    48 c1 e1 04              shl    $0x4,%rcx
  447c91:    4c 8b 07                 mov    (%rdi),%r8
  447c94:    66 41 0f 6f 04 0a        movdqa (%r10,%rcx,1),%xmm0
  447c9a:    66 41 0f ef 04 09        pxor   (%r9,%rcx,1),%xmm0
  447ca0:    66 41 0f 7f 04 08        movdqa %xmm0,(%r8,%rcx,1)
  447ca6:    48 39 47 08              cmp    %rax,0x8(%rdi)
  447caa:    75 d4                    jne    447c80 <_ZN1X7xor_allERKS_S1_+0x10>
  447cac:    f3 c3                    repz retq 


0000000000447cb0 <_ZN1X8xor_all2ERKS_S1_>:
  447cb0:    48 83 7f 08 00           cmpq   $0x0,0x8(%rdi)
  447cb5:    48 8b 0f                 mov    (%rdi),%rcx
  447cb8:    48 8b 36                 mov    (%rsi),%rsi
  447cbb:    4c 8b 02                 mov    (%rdx),%r8
  447cbe:    74 26                    je     447ce6 <_ZN1X8xor_all2ERKS_S1_+0x36>
  447cc0:    31 c0                    xor    %eax,%eax
  447cc2:    31 d2                    xor    %edx,%edx
  447cc4:    0f 1f 40 00              nopl   0x0(%rax)
  447cc8:    66 41 0f 6f 04 00        movdqa (%r8,%rax,1),%xmm0
  447cce:    48 83 c2 01              add    $0x1,%rdx
  447cd2:    66 0f ef 04 06           pxor   (%rsi,%rax,1),%xmm0
  447cd7:    66 0f 7f 04 01           movdqa %xmm0,(%rcx,%rax,1)
  447cdc:    48 83 c0 10              add    $0x10,%rax
  447ce0:    48 39 57 08              cmp    %rdx,0x8(%rdi)
  447ce4:    75 e2                    jne    447cc8 <_ZN1X8xor_all2ERKS_S1_+0x18>
  447ce6:    f3 c3                    repz retq
