https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66741
            Bug ID: 66741
           Summary: loops not fused nor vectorized
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aldot at gcc dot gnu.org
            Blocks: 53947
  Target Milestone: ---

I would have hoped that the strcpy loop and the tolower loop (for ASCII,
a conditional |0x20 on uppercase letters) would be fused into a single
pass and vectorized, perhaps using a masked store for the tolower.
Neither the fusion nor the vectorization happens.

$ cat > strcpy.c <<EOF
typedef __SIZE_TYPE__ size_t;
char *tolower_strcpy(char *dest, const char *src)
{
  char *ret = __builtin_strcpy(dest, src);
#if 1
  size_t sz = __builtin_strlen(dest);
  while (sz--)
#else
  while (*ret)
#endif
    {
      int ch = *ret;
      *ret = __builtin_tolower(ch);
      ++ret;
    }
  return dest;
}
#ifdef MAIN
#include <unistd.h>
#include <string.h>
int main(void)
{
  char src[128], dest[128];
  /* Leave room for the terminating NUL written below. */
  int n = read(0, src, sizeof(src) - 1);
  if (n < 1)
    return 1;
  src[n] = 0;
  tolower_strcpy(dest, src);
  write(2, dest, strlen(dest));
  return 0;
}
#endif
EOF
$ gcc-5 -S strcpy.c -o strcpy.s -Ofast -fomit-frame-pointer \
    -minline-all-stringops -mstringop-strategy=unrolled_loop \
    -mtune=ivybridge

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
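
For illustration only, the fusion hoped for above might look roughly like
the following hand-written single-pass version. This is a sketch, not
compiler output and not part of the original report: the name
tolower_strcpy_fused is hypothetical, and the branch-free lowercase step
assumes ASCII and "C"-locale behaviour, whereas __builtin_tolower in the
test case is locale-dependent.

```c
#include <stddef.h>

/* Hypothetical hand-fused variant of tolower_strcpy from the test case.
   Assumption: ASCII input with "C"-locale tolower semantics, so only
   'A'..'Z' are changed. */
char *tolower_strcpy_fused(char *dest, const char *src)
{
  size_t i = 0;
  for (;; ++i)
    {
      unsigned char ch = (unsigned char) src[i];
      /* 1 for 'A'..'Z', 0 otherwise; the unsigned wrap-around folds
         both range checks into a single comparison. */
      unsigned char is_upper = (unsigned char) (ch - 'A') < 26u;
      /* Copy and lowercase in one store: set bit 0x20 only for
         uppercase letters. */
      dest[i] = (char) (ch | (is_upper << 5));
      if (ch == '\0')
        break;
    }
  return dest;
}
```

Each byte is loaded from src once and stored to dest once, instead of
being written by strcpy and then re-read and re-written by the second
loop. The data-dependent exit on the NUL byte remains, which is one
plausible obstacle to vectorizing a loop of this shape.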