https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96062

            Bug ID: 96062
           Summary: Partial register stall caused by avoidable use of
                    SETcc, and useless MOVZBL
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: josephcsible at gmail dot com
  Target Milestone: ---
            Target: x86_64

Consider this C code:

long ps4_syscall0(long n) {
    long ret;
    int carry;
    __asm__ __volatile__(
        "syscall"
        : "=a"(ret), "=@ccc"(carry)
        : "a"(n)
        : "rcx", "r8", "r9", "r10", "r11", "memory"
    );
    return carry ? -ret : ret;
}

With "-O3", it results in this assembly:

ps4_syscall0:
        movq    %rdi, %rax
        syscall
        setc    %dl
        movq    %rax, %rdi
        movzbl  %dl, %edx
        negq    %rdi
        testl   %edx, %edx
        cmovne  %rdi, %rax
        ret

On modern Intel CPUs, "setc %dl" writes only the low byte of rdx, so it creates
a false dependency on the previous value of rdx. The later "movzbl %dl, %edx"
doesn't do anything to fix that. Here are some ways that we could improve this
code, without having to fall back to a conditional branch:

1. Get rid of "movzbl %dl, %edx" (since it doesn't help), and then do "testb
%dl, %dl" instead of "testl %edx, %edx".
2. Possibly in addition to #1, use dh instead of dl, since high-byte registers
are still renamed.
3. Instead of #1 and #2, replace the whole sequence between "syscall" and "ret"
with this:

        sbbq    %rcx, %rcx
        xorq    %rcx, %rax
        subq    %rcx, %rax

On Intel (but not AMD), the sbb has a false dependency too, but it's still a
lot less shuffling values around.
4. Instead of #1, #2, and #3, replace the whole sequence between "syscall" and
"ret" with this:

        leaq    -1(%rax), %rcx
        notq    %rcx
        cmovc   %rcx, %rax

I like this one the best. No false dependencies at all, and still way less
shuffling values around.
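
For reference, here is a minimal C sketch of the arithmetic that options 3 and
4 rely on. The function names are made up for illustration, and the "carry"
parameter stands in for the carry flag left by "syscall". It uses unsigned
64-bit values so that negating the most negative value is well defined in C.
It only checks the identities; it says nothing about which instructions GCC
would actually emit for it.

```c
#include <stdint.h>

/* Option 3: "sbbq %rcx, %rcx" leaves rcx = -CF, i.e. 0 or all-ones.
   (x ^ mask) - mask is x when mask == 0 and -x when mask is all-ones,
   which matches the xorq/subq pair. */
static uint64_t negate_if_xor_sub(uint64_t x, int carry)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(carry != 0);
    return (x ^ mask) - mask;
}

/* Option 4: in two's complement, ~(x - 1) == -x. The leaq/notq pair
   computes this without touching the flags, so "cmovc" can still
   select on the carry flag set by "syscall". */
static uint64_t negate_if_lea_not(uint64_t x, int carry)
{
    uint64_t negated = ~(x - 1);   /* leaq -1(%rax), %rcx ; notq %rcx */
    return carry ? negated : x;    /* cmovc %rcx, %rax */
}
```

Both helpers should agree with "carry ? -ret : ret" from the original function
for every input, including 0 and the most negative value.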
