http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60890
Bug ID: 60890
Summary: Performance regression in 4.8 for memory postinc
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hariharan.gcc at gmail dot com

Created attachment 32632 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32632&action=edit
The source testcase

The attached testcase is a simplified version of the code where I first saw this problem. At the end of the tree stages, the 4.7 compiler generated two consecutive stores for the innermost loop, followed by an add to update the pointer:

  MEM[base: aptr_54, offset: 0B] = res1_22;
  MEM[base: aptr_54, offset: 4B] = res2_27;
  D.1771_63 = (sizetype) aptr_54;
  D.1772_64 = D.1771_63 + 8;

For the same two stores, the 4.8 compiler generates:

  MEM[base: base_76, offset: 0B] = res1_32;
  _29 = (unsigned long) base_76;
  _83 = _29 + 8;
  base_84 = (int *) _83;
  MEM[base: base_84, offset: -4B] = res2_37;

In 4.8 the pointer increment is placed between the two stores, so the second store addresses memory at a negative offset (-4) from the already-incremented pointer. In our private port, which supports post-increment addressing on load/store operations, 4.7 generated optimal code, whereas the 4.8 code is not very pretty. The problem seems to stem from the fix for Bug 48814, which generates post-increment operations in a different order from 4.7. Should the tree optimization passes have fixed this up?

The problem is also visible on x86 at the end of the tree-optimization passes. Compile the attached code with 4.7.x and 4.8.x to see the difference.
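The attachment itself is not reproduced here. As a rough illustration only, the loop shape described above (two stores through one pointer per iteration, then an 8-byte pointer advance) can be sketched in C as below. The function name `store_pairs` and its arguments are hypothetical and are not taken from the attached testcase; it assumes a 4-byte `int`, matching the 0B/4B offsets in the dumps.

```c
#include <stddef.h>

/* Hypothetical reduction of the loop shape described in the report (NOT the
   attached testcase): each iteration computes two results, stores them at
   consecutive int slots through aptr, and advances aptr by two ints
   (8 bytes, assuming a 4-byte int). */
void store_pairs(int *aptr, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int res1 = a[i] + b[i];   /* first value, stored at offset 0 */
        int res2 = a[i] - b[i];   /* second value, stored at offset 4 */
        *aptr++ = res1;           /* candidate for a post-increment store */
        *aptr++ = res2;           /* candidate for a post-increment store */
    }
}
```

To compare the two compilers on a shape like this, one option is to build it at `-O2` with `-fdump-tree-optimized` under 4.7.x and 4.8.x and diff the resulting `.optimized` dumps. Whether this reduced sketch reproduces the same store/increment ordering as the attachment has not been checked.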