https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164
Bug ID: 70164
Summary: Code/performance regression due to poor register allocation on Cortex-M0
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Created attachment 37920
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37920&action=edit
current ira dump

I briefly investigated the testcase gcc/testsuite/gcc.target/arm/pr45701-1.c for cortex-m0 on trunk. The test fails because register allocation changed after revision r226901. Before that revision, the allocator loaded the global 'hist_verify' (the value of 'old_verify') into r6, a callee-saved register, ahead of the call to pre_process_line. old_verify is used after that call. With the patch, the allocator loads it into r3 instead. Because r3 is caller-saved, the value has to be spilled before the call and reloaded after it.

Before patch:

history_expand_line_internal:
        push    {r3, r4, r5, r6, r7, lr}
        ldr     r3, .L5
        ldr     r5, .L5+4
        ldr     r4, [r3]
        movs    r3, #0
        ldr     r6, [r5]        ; <--- load of 'hist_verify' into r6
        movs    r7, r0
        str     r3, [r5]
        bl      pre_process_line
        adds    r6, r4, r6
        str     r6, [r5]
        movs    r4, r0
        cmp     r7, r0
        bne     .L2
        bl      str_len
        adds    r0, r0, #1
        bl      x_malloc
        movs    r1, r4
        bl      str_cpy
        movs    r4, r0
.L2:
        movs    r0, r4
        @ sp needed
        pop     {r3, r4, r5, r6, r7, pc}

Current:

history_expand_line_internal:
        push    {r0, r1, r2, r4, r5, r6, r7, lr}
        ldr     r3, .L3
        ldr     r5, .L3+4
        ldr     r6, [r3]
        ldr     r3, [r5]        ; <--- load of 'hist_verify' into r3
        movs    r7, r0
        str     r3, [sp, #4]    ; <--- Spill
        movs    r3, #0
        str     r3, [r5]
        bl      pre_process_line
        ldr     r3, [sp, #4]    ; <--- Reload
        movs    r4, r0
        adds    r6, r6, r3
        str     r6, [r5]
        cmp     r7, r0
        bne     .L1
        bl      str_len
        adds    r0, r0, #1
        bl      x_malloc
        movs    r1, r4
        bl      str_cpy
        movs    r4, r0
.L1:
        movs    r0, r4
        @ sp needed
        pop     {r1, r2, r3, r4, r5, r6, r7, pc}

In the current code, the extra registers pushed in the prologue (r0-r2) reserve the stack slot used for the spill at [sp, #4].

I have also attached the IRA and reload dumps for both the pre-patch and current compilers.
In the current reload dump, insn 9 is the load into r3 and insn 62 is the spill. In the pre-patch IRA/reload dumps, the load is insn 10. I am not familiar with register allocation in GCC, so I am not sure which code is to blame for this suboptimal allocation; any comments or pointers would be most welcome.
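To make the liveness argument concrete, here is a hypothetical C reconstruction of the function's shape, inferred only from the two listings above. It is not the actual source of pr45701-1.c, which may differ. 'hist_verify' and the helper names come from the listings. 'other_global' is a made-up stand-in for the second global loaded via .L5/.L3. The stub bodies (including the '!' rule in pre_process_line) are assumptions added only so the file compiles and runs. The point it shows is that old_verify is read before pre_process_line and used after it, so it is live across the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 'hist_verify' is named in the report. 'other_global' is a HYPOTHETICAL
   stand-in for the second global the listings load via .L5 (.L3 in the
   current build); its real name is not given in the report. */
int hist_verify = 1;
int other_global = 10;

/* HYPOTHETICAL stubs for the helpers the listings call with bl. Their
   signatures are inferred from the calling sequences, not taken from
   the testcase. */
static int str_len(const char *s) { return (int)strlen(s); }
static char *x_malloc(int n) { return malloc((size_t)n); }
static char *str_cpy(char *dst, const char *src) { return strcpy(dst, src); }

/* Stub: returns a different string for lines starting with '!',
   otherwise hands back the input pointer unchanged. */
static char *pre_process_line(char *line)
{
    static char expanded[] = "expanded";
    assert(line != NULL);
    return line[0] == '!' ? expanded : line;
}

char *history_expand_line_internal(char *line)
{
    int other = other_global;     /* r4 before the patch, r6 after */
    int old_verify = hist_verify; /* r6 before the patch, r3 after */

    hist_verify = 0;
    char *new_line = pre_process_line(line);

    /* old_verify is still needed here, after the call. Under the AAPCS a
       callee may clobber r0-r3, so a value kept in r3 must be spilled and
       reloaded around the call; a value in a callee-saved register such
       as r6 survives the call with no extra memory traffic. */
    hist_verify = old_verify + other;

    /* Unchanged line: return a fresh copy (str_len, x_malloc, str_cpy). */
    if (new_line == line)
        return str_cpy(x_malloc(str_len(line) + 1), line);
    return new_line;
}
```

Because the stubs here live in the same file, a compiler may inline them and allocate registers differently. To study the allocation question itself, the helpers would need to stay out of line, as they evidently are in the listings (each one is reached via bl).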