https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96062
Bug ID: 96062
Summary: Partial register stall caused by avoidable use of SETcc, and useless MOVZBL
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: josephcsible at gmail dot com
Target Milestone: ---
Target: x86_64

Consider this C code:

long ps4_syscall0(long n) {
    long ret;
    int carry;
    __asm__ __volatile__(
        "syscall"
        : "=a"(ret), "=@ccc"(carry)
        : "a"(n)
        : "rcx", "r8", "r9", "r10", "r11", "memory"
    );
    return carry ? -ret : ret;
}

With "-O3", it results in this assembly:

ps4_syscall0:
        movq    %rdi, %rax
        syscall
        setc    %dl
        movq    %rax, %rdi
        movzbl  %dl, %edx
        negq    %rdi
        testl   %edx, %edx
        cmovne  %rdi, %rax
        ret

On modern Intel CPUs, "setc %dl" creates a false dependency on rdx, and the
following "movzbl %dl, %edx" does nothing to break it. Here are some ways this
code could be improved, without having to fall back to a conditional branch:

1. Get rid of "movzbl %dl, %edx" (since it doesn't help), and then do
   "testb %dl, %dl" instead of "testl %edx, %edx".

2. Possibly in addition to #1, use dh instead of dl, since high-byte registers
   are still renamed.

3. Instead of #1 and #2, replace the whole sequence between "syscall" and "ret"
   with this:

        sbbq    %rcx, %rcx
        xorq    %rcx, %rax
        subq    %rcx, %rax

   On Intel (but not AMD), the sbb has a false dependency too, but it still
   shuffles values around a lot less.

4. Instead of #1, #2, and #3, replace the whole sequence between "syscall" and
   "ret" with this:

        leaq    -1(%rax), %rcx
        notq    %rcx
        cmovc   %rcx, %rax

   I like this one the best: no false dependencies at all, and still far less
   shuffling of values around.