Hello, I'm posting to the mailing list because I could not get a Bugzilla account created (account creation was disabled), and contacting the overseer as instructed didn't work out: it has been two weeks already with no response.
Anyway, this is a regression leading to wrong code generation with __restrict__. I tracked the culprit down to the -fschedule-insns option; since that pass is implicitly enabled at -O2 and higher, I used -O1 plus the explicit flag to isolate it. It only happens since GCC 6; version 5.4 is fine.

Simple test case, a simplistic memset in inline asm:

    inline void memset_test(void* a, char c, unsigned long n) {
        // mark the pointed-to memory as a huge clobber via a local variable (which works)
        struct { char _[unsigned(~0U)>>1]; } *const m = (typeof(m))(a);
        asm("rep stosb" : "+D"(a), "+c"(n), "=m"(*m) : "a"(c));
    }

    void foo(char* __restrict__ a, int c) {
        memset_test(a, 0, c);
        asm("xor %0, %0" :: "q"(a[0]));
    }

Compile the above with: -m32 -O1 -fschedule-insns (-m64 shows the same bug). I used Godbolt's Compiler Explorer to test multiple versions easily and then confirmed the result locally. Here is the example i386 output from GCC 6 and a GCC 7 snapshot (both are identical):

GCC 6.x or GCC 7:

    push    edi
    mov     eax, DWORD PTR [esp+8]
    mov     edi, eax
    mov     ecx, DWORD PTR [esp+12]
    movzx   edx, BYTE PTR [eax]    # this is WRONG: a[0] is loaded before rep stosb
    mov     eax, 0
    rep stosb
    xor     dl, dl
    pop     edi
    ret

GCC 5.4:

    push    edi
    mov     edx, DWORD PTR [esp+8]
    mov     edi, edx
    mov     ecx, DWORD PTR [esp+12]
    mov     eax, 0
    rep stosb
    movzx   eax, BYTE PTR [edx]    # CORRECT: the load happens after rep stosb
    xor     al, al
    pop     edi
    ret

I don't have GCC 7 installed, but the bug is confirmed on GCC 6 on my own machine, not just on Godbolt's site, where I tested version 7.

Things to note:
- This happens on GCC 6 and 7 only; GCC 5.4 generates correct output.
- Other compilers, such as Clang and ICC, output correct code, so to me the test case itself must be valid; even old GCC agrees.
- The problem appears only once the -fschedule-insns option is turned on.
- If you remove __restrict__ from foo's parameter, the problem is gone.
- Using "asm volatile" instead of "asm" in memset_test generates correct code.
- Adding a "memory" clobber to that asm also generates correct code.
Most of these workarounds are not really valid in this context because they DISABLE the optimizations: they prevent the problem from surfacing rather than solving it. The "memory" clobber is clearly the worst option by far, since it discards every memory value cached in registers. "asm volatile" is probably the least bad workaround. __restrict__ itself is genuinely useful for pointers of the same type that the compiler cannot otherwise prove non-aliasing. Please look into it.