Hello, I'm posting to the mailing list because I could not create a
Bugzilla account: account creation is disabled, and contacting the
overseer as instructed didn't work out. It has been two weeks
already with no response.

Anyway, this is a regression that leads to wrong code generation with
__restrict__. I tracked the culprit down to the -fschedule-insns
option, so I test with -O1 plus that option given explicitly (it is
implicitly enabled at -O2 and higher). It only happens since GCC 6;
version 5.4 is fine.

A simple test case, built around a simplistic memset in asm:


inline void memset_test(void* a, char c, unsigned long n)
{
  // mark the pointer as a huge clobber via a local variable (which works)
  struct { char _[unsigned(~0U)>>1]; } *const m=(typeof(m))(a);
  asm("rep stosb":"+D"(a),"+c"(n),"=m"(*m):"a"(c));
}

void foo(char* __restrict__ a, int c)
{
  memset_test(a, 0, c);
  asm("xor %0, %0"::"q"(a[0]));
}


Compile the above with:  -m32 -O1 -fschedule-insns   (or -m64 which
has the same bug)

I used Godbolt's Compiler Explorer to easily test multiple versions
and then confirmed the result locally. Here's the example i386 output
from GCC 6 and from a GCC 7 snapshot (both are identical):

GCC 6.x or GCC 7:
        push    edi
        mov     eax, DWORD PTR [esp+8]
        mov     edi, eax
        mov     ecx, DWORD PTR [esp+12]
        movzx   edx, BYTE PTR [eax]        # this is WRONG
        mov     eax, 0
        rep stosb
        xor dl, dl
        pop     edi
        ret


GCC 5.4:
        push    edi
        mov     edx, DWORD PTR [esp+8]
        mov     edi, edx
        mov     ecx, DWORD PTR [esp+12]
        mov     eax, 0
        rep stosb
        movzx   eax, BYTE PTR [edx]        # CORRECT, after stosb
        xor al, al
        pop     edi
        ret

I don't have GCC 7 but this bug is confirmed on GCC 6 on my machine,
not just on Godbolt's site where I tested version 7.

Things to note:

This happens on GCC 6 and GCC 7 only; GCC 5.4 generates correct output.
Other compilers, such as Clang and ICC, output correct code. So to me
the code must be correct; even old GCC agrees.

This happens once you turn on the -fschedule-insns option.
If you remove the __restrict__ from the pointer in foo's parameter,
the problem is gone.
Using "asm volatile" instead of "asm" in memset_test generates correct code.
Using "memory" clobber in that asm also generates correct code.


Most of these workarounds are not valid in this context because they
DISABLE the optimizations, so they only prevent the problem from
showing up instead of solving it. A "memory" clobber is obviously the
worst solution by far, as it invalidates any memory values cached in
registers. "asm volatile" is probably the least bad workaround.
__restrict__ itself is definitely useful for pointers of the same type,
which the compiler can't otherwise know won't alias.
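
For reference, here is a rough sketch of how the two asm-level
workarounds from the list above might be written. The function names
(memset_test_volatile, memset_test_clobber, fill_then_read) and the
huge_mem struct name are made up for illustration and are not part of
the original test case. The sketch assumes a GNU-compatible compiler;
on non-x86 targets it falls back to plain loops so that it still
builds. It only shows the shape of the workarounds and is not claimed
to reproduce the bug.

```cpp
// Sketch only: all names below are hypothetical, chosen for illustration.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

// Same "huge clobber" trick as the test case: a very large struct that
// stands for the memory written through the pointer.
struct huge_mem { char _[unsigned(~0U) >> 1]; };

// Workaround A: mark the asm as volatile.
inline void memset_test_volatile(void* a, char c, unsigned long n)
{
  huge_mem* const m = static_cast<huge_mem*>(a);
  __asm__ volatile("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c));
}

// Workaround B: keep the asm non-volatile but add a "memory" clobber.
inline void memset_test_clobber(void* a, char c, unsigned long n)
{
  huge_mem* const m = static_cast<huge_mem*>(a);
  __asm__("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c) : "memory");
}

#else

// Portable stand-ins so the sketch still compiles on other targets.
inline void memset_test_volatile(void* a, char c, unsigned long n)
{
  for (unsigned long i = 0; i < n; ++i) static_cast<char*>(a)[i] = c;
}

inline void memset_test_clobber(void* a, char c, unsigned long n)
{
  for (unsigned long i = 0; i < n; ++i) static_cast<char*>(a)[i] = c;
}

#endif

// Same shape as foo() in the test case, but it returns the byte read
// after the fill so that the ordering can be checked at run time.
inline char fill_then_read(char* __restrict__ a, int c)
{
  memset_test_volatile(a, 0, c);
  return a[0];  // must observe the value stored by the fill above
}
```

A caller would pass a buffer holding non-zero bytes to fill_then_read
and expect 0 back. If the load of a[0] were hoisted above the fill, the
old value would come back instead.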


Please look into it.
