https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69875
Bug ID: 69875
Summary: [4.9/5/6 Regression] Wrong barrier placement for 64-bit atomic loads on arm
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: arm*

Consider the code:

#include <stdatomic.h>

atomic_ullong foo;
int glob;

int
main (void)
{
  atomic_load_explicit (&foo, memory_order_acquire);
  return glob;
}

Compiled for arm with flags -O2 -S -march=armv7ve --std=c11 this generates:

        movw    r3, #:lower16:glob
        movt    r3, #:upper16:glob
        dmb     ish
        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldrexd  r0, r1, [r2]
        ldr     r0, [r3]
        bx      lr

This is wrong for the acquire memory order: the DMB barrier should go after the LDREXD. Furthermore, for armv7ve we should be able to relax the LDREXD into an LDRD. The same code is generated for -march=armv7-a, armv7ve and armv8-a, but for armv8-a we should be able to make use of the load-double-word-acquire instruction LDAEXD and avoid the barrier altogether.

So this is both wrong-code and a missed optimization, and it occurs on all currently maintained branches.
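
For reference, a rough sketch of the kind of sequences one would expect instead; register choices and scheduling are illustrative only, not the exact output of any fixed compiler:

        @ armv7-a / armv7ve: acquire needs the barrier *after* the load
        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldrexd  r0, r1, [r2]    @ on armv7ve an LDRD should suffice (LPAE makes aligned 64-bit loads single-copy atomic)
        dmb     ish             @ acquire barrier placed after the load

        @ armv8-a (AArch32): the acquire form of the load makes the barrier unnecessary
        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldaexd  r0, r1, [r2]    @ load-acquire exclusive doubleword, no DMB needed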