https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69875

            Bug ID: 69875
           Summary: [4.9/5/6 Regression] Wrong barrier placement for
                    64-bit atomic loads on arm
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: arm*

Consider the code:
#include <stdatomic.h>

atomic_ullong foo;
int glob;

int
main (void)
{
  /* 64-bit acquire load; the value is discarded but the ordering matters.  */
  atomic_load_explicit (&foo, memory_order_acquire);
  /* This read must not be reordered before the acquire load above.  */
  return glob;
}

Compiled for arm with the flags -O2 -S -march=armv7ve -std=c11, this generates:
        movw    r3, #:lower16:glob
        movt    r3, #:upper16:glob
        dmb     ish
        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldrexd  r0, r1, [r2]
        ldr     r0, [r3]
        bx      lr

This is wrong for the acquire memory order: an acquire load must be ordered
before all later memory accesses, so the DMB barrier should go after the
LDREXD. As emitted, the subsequent LDR of glob is free to be reordered ahead
of the atomic load. Furthermore, for armv7ve we should be able to relax the
LDREXD into an LDRD, since LPAE guarantees that a doubleword-aligned LDRD is
single-copy atomic.
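
For illustration, a correct acquire sequence would look roughly like this
(hand-written sketch, not actual compiler output):

        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldrexd  r0, r1, [r2]    @ atomic 64-bit load
        dmb     ish             @ acquire barrier placed after the load
        movw    r3, #:lower16:glob
        movt    r3, #:upper16:glob
        ldr     r0, [r3]        @ can no longer be hoisted above the atomic load
        bx      lr

On armv7ve the ldrexd above could additionally be relaxed to
"ldrd r0, r1, [r2]".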

The same code is generated for -march=armv7-a, armv7ve and armv8-a.
But for armv8-a we should be able to make use of the Load-Acquire Exclusive
Doubleword instruction LDAEXD and avoid the barrier altogether.
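
On armv8-a the acquire semantics can be folded into the load itself;
something like the following hand-written sketch (again, not compiler
output):

        movw    r2, #:lower16:foo
        movt    r2, #:upper16:foo
        ldaexd  r0, r1, [r2]    @ load-acquire exclusive doubleword, no DMB needed
        movw    r3, #:lower16:glob
        movt    r3, #:upper16:glob
        ldr     r0, [r3]
        bx      lr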

So this is both a wrong-code bug and a missed optimization, and it occurs on
all currently maintained branches.
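
For context, the kind of code the misplaced barrier can break is the
standard message-passing idiom. The following is an illustrative example,
not taken from the original report (the function names are made up):

#include <stdatomic.h>

int data;
atomic_ullong flag;

void
producer (void)
{
  data = 42;
  atomic_store_explicit (&flag, 1, memory_order_release);
}

int
consumer (void)
{
  if (atomic_load_explicit (&flag, memory_order_acquire))
    /* With the barrier emitted before the atomic load, this read of
       data can be reordered ahead of the load of flag and observe a
       stale value.  */
    return data;
  return -1;
}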
