https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

            Bug ID: 103069
           Summary: cmpxchg isn't optimized
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: crazylht at gmail dot com, wwwhhhyyy333 at gmail dot com
            Blocks: 103065
  Target Milestone: ---
            Target: i386,x86-64

>From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap will grab the cache line exclusive and causes
excessive cache line bouncing.

[hjl@gnu-cfl-2 pr102566]$ cat e.c 
int
f3 (int *a)
{
  return __atomic_fetch_or (a, 0x40000000, __ATOMIC_RELAXED);
}
[hjl@gnu-cfl-2 pr102566]$ gcc -S -O2 x.c 
[hjl@gnu-cfl-2 pr102566]$ cat x.s 
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    v(%rip), %eax
.L2:
        movl    %eax, %ecx
        movl    %eax, %edx
        orl     $1, %ecx
        lock cmpxchgl   %ecx, v(%rip)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
GCC should first emit a normal load, check and jump to .L2 if cmpxchgl
may fail.  Before jump to .L2, PAUSE should be inserted to to yield the
CPU to another hyperthread and to save power. It also serves to slightly
limit the rate of accesses on the processor interconnect.
        jne     .L2
        movl    %edx, %eax
        andl    $1, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.2.1 20211019 (Red Hat 11.2.1-6)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pr102566]$


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065
[Bug 103065] [meta] atomic operations aren't optimized

Reply via email to