https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784
Bug ID: 97784
Summary: Expressions evaluated as long chain instead of as tree or the like
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: segher at gcc dot gnu.org
Target Milestone: ---

When compiling something like

    #define O +
    long x4(long x, long a, long b, long c, long d)
    { return x O a O b O c O d; }

we end up with machine code like

    add 3,3,4    # 10 [c=4 l=4]  *adddi3/0
    add 3,3,5    # 11 [c=4 l=4]  *adddi3/0
    add 3,3,6    # 12 [c=4 l=4]  *adddi3/0
    add 3,3,7    # 18 [c=4 l=4]  *adddi3/0
    blr          # 30 [c=4 l=4]  simple_return

Each of those "add" insns depends on the result of the previous one,
making this slower than necessary: it has the latency of 4 add insns
in series, while some of them could be done in parallel.

This problem is there on gimple level already:

    _1 = x_4(D) + a_5(D);
    _2 = _1 + b_6(D);
    _3 = _2 + c_7(D);
    _9 = _3 + d_8(D);
    return _9;

A very similar problem also happens as a result of RTL unrolling.
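
To make the difference between the "chain" and "tree" shapes concrete, here is a minimal sketch in C. The function names x4_chain and x4_tree are hypothetical and are not part of the report. The first function spells out the left-to-right association of the original test case. The second writes the same sum so that two of the adds have no dependency on each other, which would shorten the dependency chain from 4 adds to 3 if the compiler kept that shape. Whether it does so is not something this sketch demonstrates.

```c
/* Hypothetical illustration; names are not from the bug report. */

/* Chain shape: every add consumes the result of the previous one,
   so the dependency chain is 4 adds long. */
long x4_chain(long x, long a, long b, long c, long d)
{
    return (((x + a) + b) + c) + d;
}

/* Tree shape: t0 and t1 do not depend on each other, so the
   dependency chain is 3 adds long (t0 or t1, then t0 + t1, then + d). */
long x4_tree(long x, long a, long b, long c, long d)
{
    long t0 = x + a;   /* independent of t1 */
    long t1 = b + c;   /* independent of t0 */
    return (t0 + t1) + d;
}
```

Both functions return the same value whenever no intermediate sum overflows. Because the two forms compute different intermediate sums, they can differ in whether signed overflow occurs along the way, so the hand-written rewrite is only a model of the reassociation the report asks the compiler to do.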