http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60537
            Bug ID: 60537
           Summary: Loop optimization code bloat for simple loops
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

I noticed this on SH; it may also apply to other targets (checked on
4.9 r208241).  The following simple loop (a simple strlen implementation):

unsigned int test (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return s1 - s0;
}

With -O2 -m4 gets compiled to:

        mov.b   @r4,r1
        tst     r1,r1
        bt/s    .L4
        mov     r4,r1
        add     #1,r1
        .align 2
.L3:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L3
        add     #1,r1
        rts
        sub     r4,r0
        .align 1
.L4:
        rts
        mov     #0,r0

With -Os -m4 it is basically just the inner loop:

        mov     r4,r1
.L2:
        mov     r1,r0
        mov.b   @r0,r2
        tst     r2,r2
        bf/s    .L2
        add     #1,r1
        rts
        sub     r4,r0

The additional loop test in the loop header of the -O2 version seems a bit
pointless.  If the loop exits on the first iteration, it simply falls
through.  The additional test and the jump around the loop gain nothing in
this case and only increase code size.
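To make the difference easier to see, here is a rough source-level sketch of
the two shapes.  It is only an approximation of the assembly above, not
compiler output.  The names test_plain, test_guarded and check_equivalent are
made up for illustration, and the guess that the -O2 shape comes from copying
the loop header (loop rotation) is an assumption, not something confirmed in
this report.

```c
#include <assert.h>

/* Roughly the -Os shape: a single test, located inside the loop.  */
unsigned int test_plain (const char* s0)
{
  const char* s1 = s0;

  while (*s1)
    s1++;

  return (unsigned int)(s1 - s0);
}

/* Roughly the -O2 shape: the loop test is duplicated in front of the loop
   as a guard, and the loop itself becomes a do-while.  The guard
   corresponds to the extra tst/bt/s pair and the separate .L4 return
   path.  */
unsigned int test_guarded (const char* s0)
{
  const char* s1 = s0;

  if (*s1)
    {
      do
        s1++;
      while (*s1);
    }

  return (unsigned int)(s1 - s0);
}

/* Hypothetical helper: both shapes must return the same length.  */
void check_equivalent (const char* s)
{
  assert (test_plain (s) == test_guarded (s));
}
```

Both functions compute the same result for every NUL-terminated string; the
guarded form only adds one more test and branch before the loop.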