https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809

Bug ID: 88809
Summary: do not use rep-scasb for inline strlen/memchr
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

The performance difference between libc strlen and x86 rep-scasb has grown too large and seems unlikely to improve anytime soon.

On most x86 cores, the microcode for rep-scasb is not very sophisticated and runs at 0.5 bytes per cycle or worse (according to Agner Fog's research; SkylakeX manages 1 byte per cycle), plus some overhead for entering and leaving the microcode loop (I think on the order of 20 cycles, but I don't have exact info).

In contrast, libc strlen typically has small overhead for short strings and uses register-wide operations on long strings, sustaining on the order of 4-8 bytes per cycle using only integer registers, or in the ballpark of 16-64 bytes per cycle with SSE/AVX (sorry, I don't have exact figures here).

A call to strlen is also shorter by itself: rep-scasb needs extra instructions to set up %rax and fix up %rcx. (To be fair, though, a call to strlen prevents use of the red zone and clobbers more registers.)

Therefore I suggest that we no longer use rep-scasb for inline strlen by default (we currently do at -Os). This is motivated in part by PR 88793 and the Red Hat bug referenced from there.
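
For illustration, here is a minimal C sketch of the two scanning strategies compared above: one byte per step (roughly the work rep-scasb performs) versus register-wide steps using only integer registers. The function names strlen_bytewise and strlen_wordwise are hypothetical, and this is not glibc's actual code; real libc implementations are typically hand-written assembly. The word-wise version assumes that an aligned 8-byte load may read a few bytes past the terminator, which is outside ISO C and relies on such a load never crossing a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time scan: one comparison per byte of the string. */
static size_t strlen_bytewise(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Word-at-a-time scan using only integer registers.
   Hypothetical helper for illustration, not libc's implementation. */
static size_t strlen_wordwise(const char *s)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const char *p = s;

    /* Advance byte by byte until p is 8-byte aligned. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Check 8 bytes per iteration. */
    for (;;) {
        uint64_t w;
        /* Aligned 8-byte load; ASSUMPTION: may read past the terminator,
           but stays within the same aligned word. */
        memcpy(&w, p, sizeof w);
        /* This expression is nonzero iff at least one byte of w is zero. */
        if ((w - ones) & ~w & highs)
            break;
        p += sizeof w;
    }

    /* The terminator is somewhere in the current word; find it. */
    while (*p)
        p++;
    return (size_t)(p - s);
}
```

Both functions return the same value as strlen for any NUL-terminated string; the second one simply retires 8 bytes per loop iteration once the pointer is aligned, which is the kind of register-wide operation the throughput estimate above refers to.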