https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

            Bug ID: 97784
           Summary: Expressions evaluated as long chain instead of as tree
                    or the like
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

When compiling something like

#define O +
long x4(long x, long a, long b, long c, long d) { return x O a O b O c O d; }

we end up with machine code like

        add 3,3,4        # 10   [c=4 l=4]  *adddi3/0
        add 3,3,5        # 11   [c=4 l=4]  *adddi3/0
        add 3,3,6        # 12   [c=4 l=4]  *adddi3/0
        add 3,3,7        # 18   [c=4 l=4]  *adddi3/0
        blr              # 30   [c=4 l=4]  simple_return

Each of those "add" insns depends on the result of the previous one,
making this slower than necessary: the critical path is 4 dependent adds
in series, while a tree-shaped evaluation such as ((x + a) + (b + c)) + d
needs only 3, because x + a and b + c can be done in parallel.
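The difference between the two shapes can be sketched at the source level. This is a minimal illustration, not a proposed patch: `x4_chain` and `x4_tree` are hypothetical names, and the tree form assumes no intermediate signed overflow, because reassociating signed additions in C source can introduce an overflow that the chain form did not have (RTL, where additions wrap, does not have that restriction).

```c
/* Same operands as x4 with O = +: the left-leaning chain.
   Critical path: 4 dependent adds. */
long x4_chain(long x, long a, long b, long c, long d)
{
    return (((x + a) + b) + c) + d;
}

/* Hypothetical tree-shaped form.  t0 and t1 do not depend on each other,
   so a superscalar core can issue them together; critical path: 3 adds.
   Assumes no intermediate signed overflow. */
long x4_tree(long x, long a, long b, long c, long d)
{
    long t0 = x + a;   /* independent of t1 */
    long t1 = b + c;   /* independent of t0 */
    return (t0 + t1) + d;
}
```

Both functions return the same value for inputs that do not overflow; only the dependency structure differs.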


This problem is already present at the GIMPLE level:

  _1 = x_4(D) + a_5(D);
  _2 = _1 + b_6(D);
  _3 = _2 + c_7(D);
  _9 = _3 + d_8(D);
  return _9;
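The dump above is a left-to-right chain of n - 1 dependent additions for n operands. For comparison, a balanced shape for an arbitrary number of operands can be sketched as a pairwise (divide-and-conquer) reduction; `sum_tree` is a hypothetical helper used only to illustrate the dependency depth, not anything GCC provides.

```c
#include <stddef.h>

/* Hypothetical helper: sum v[0..n-1] by splitting the range in half and
   adding the two partial sums.  The two recursive calls do not depend on
   each other, so the dependency depth is ceil(log2(n)) additions instead
   of the n - 1 of the left-to-right chain in the GIMPLE dump. */
long sum_tree(const long *v, size_t n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return v[0];
    size_t half = n / 2;
    return sum_tree(v, half) + sum_tree(v + half, n - half);
}
```

For the five operands of x4 this gives a depth of 3 instead of 4.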


A very similar problem also happens as a result of RTL unrolling.
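A source-level sketch of how unrolling produces the same pattern, assuming a simple summation loop unrolled by 2 (both function names are hypothetical). With a single accumulator every add waits for the previous one; splitting the work over two independent accumulators and combining them once at the end is one way to get a tree-like shape.

```c
#include <stddef.h>

/* Unrolled by 2 with one accumulator: every add depends on the previous
   one, the same serial chain as in x4. */
long sum_unrolled_chain(const long *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s += v[i];
        s += v[i + 1];   /* waits for the add above */
    }
    if (i < n)
        s += v[i];       /* odd leftover element */
    return s;
}

/* Same unrolling with two independent accumulators, combined once at the
   end: the adds into s0 and s1 can execute in parallel. */
long sum_unrolled_split(const long *v, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)
        s0 += v[i];      /* odd leftover element */
    return s0 + s1;
}
```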
