[Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement

fxue at os dot amperecomputing.com via Gcc-bugs Wed, 20 Dec 2023 01:54:07 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091


            Bug ID: 113091
           Summary: Over-estimate SLP vector-to-scalar cost for non-live
                    pattern statement
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to vectorize the testcase below on aarch64.

  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];

    array[0] = (a[0] - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array);
  }

The dump shows that the loads from a[i] and b[i] are considered live as scalar
references, which results in an over-estimated vector-to-scalar cost.

*a_50(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
*b_51(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
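
To put a number on the over-estimate, the entries above can be summed. This is
only a sketch of that arithmetic in C: the constants are read off the dump, and
the identifiers are illustrative, not names used by GCC.

```c
#include <assert.h>

/* Values read off the dump above; identifiers are illustrative, not GCC's. */
enum {
    LANES_PER_OPERAND  = 8,  /* a[0..7] and b[0..7] */
    LOAD_OPERANDS      = 2,  /* a and b */
    VEC_TO_SCALAR_COST = 2   /* "costs 2 in epilogue" per entry */
};

/* Total epilogue cost charged for the 16 vec_to_scalar entries. */
int vec_to_scalar_epilogue_cost(void)
{
    return LANES_PER_OPERAND * LOAD_OPERANDS * VEC_TO_SCALAR_COST;
}
```

That is 16 entries at cost 2 each, 32 units in total added to the vector side
of the comparison.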

Subtraction on char type is recognized as a widening subtraction (widen-sub),
which involves two kinds of pattern replacement.

 * Original
 _1 = *a_50(D);
 _2 = (int) _1;
 _3 = *b_51(D);
 _4 = (int) _3;
 _5 = _2 - _4;


 * After pattern replacement
 patt_63 = (unsigned short) _1;  //  _2 = (int) _1;
 patt_64 = (int) patt_63;        //  _2 = (int) _1;

 patt_65 = (unsigned short) _3;  //  _4 = (int) _3;
 patt_66 = (int) patt_65;        //  _4 = (int) _3;

 patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
 patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
 patt_69 = (int) patt_68;              //  _5 = _2 - _4;
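
As a sanity check on the last pattern sequence, the following C sketch models
one lane in scalar code and compares it with the original "_5 = _2 - _4". It
rests on assumptions: char is taken to be unsigned (as the unsigned short
conversions in the patterns suggest), .VEC_WIDEN_MINUS is modeled as a 16-bit
modular subtraction, and the function names are made up for illustration.

```c
#include <assert.h>

/* Scalar model of one lane of:
     patt_67 = .VEC_WIDEN_MINUS (_1, _3);
     patt_68 = (signed short) patt_67;
     patt_69 = (int) patt_68;
   Assumption: the widening subtract wraps modulo 2^16. */
int widen_sub_model(unsigned char a, unsigned char b)
{
    unsigned short wide = (unsigned short)((unsigned short)a - (unsigned short)b);
    /* Out-of-range unsigned-to-signed conversion is implementation-defined
       in C; GCC documents it as reduction modulo 2^16. */
    short as_signed = (short)wide;
    return (int)as_signed;
}

/* The original scalar computation: _5 = (int) _1 - (int) _3. */
int original_sub(unsigned char a, unsigned char b)
{
    return (int)a - (int)b;
}

/* Count the input pairs on which the two computations disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (widen_sub_model((unsigned char)a, (unsigned char)b)
                != original_sub((unsigned char)a, (unsigned char)b))
                mismatches++;
    return mismatches;
}
```

Under these assumptions the two agree on every input pair, because the
difference always lies in [-255, 255] and so fits in a signed short.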

For the statement "_2 = (int) _1", its vectorization representative "patt_64 =
(int) patt_63" is not marked as PURE_SLP, so it is conservatively considered to
have a scalar use and to be live outside of the SLP bb (in the function
vect_bb_slp_mark_live_stmts). However, the pattern definition is actually dead
and should not contribute to the vector-to-scalar cost.

The defs from pattern statements are not part of the function body, so we
cannot track their def/use chains as we do for ordinary SSA names. A quick fix
may be possible for one situation: if the original SSA name "_2" has a single
use, its existence is covered only by the vectorized operation, no matter what
it would look like without pattern replacement.
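
The proposed quick fix can be illustrated with a toy model in C. This is not
GCC code: struct toy_use and the functions below are hypothetical stand-ins for
the vectorizer's real data structures, and the treatment of PURE_SLP
statements is deliberately simplified. It only sketches one reading of the
single-use condition described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a statement that uses a loaded value,
   e.g. "_2 = (int) _1".  Not a GCC data structure. */
struct toy_use {
    bool representative_is_pure_slp; /* false for "patt_64 = (int) patt_63" */
    int  result_use_count;           /* uses of the original result, e.g. _2 */
};

/* Conservative behaviour described in the report: a representative that is
   not PURE_SLP is assumed to be a scalar use of its operand. */
bool toy_is_scalar_use(const struct toy_use *u)
{
    return !u->representative_is_pure_slp;
}

/* Sketch of the quick fix: if the original result has a single use, assume
   that use is the vectorized operation, so no scalar copy is needed. */
bool toy_is_scalar_use_with_fix(const struct toy_use *u)
{
    if (u->representative_is_pure_slp)
        return false; /* simplification: the real code inspects actual uses */
    return u->result_use_count != 1;
}

/* Number of loaded lanes that would be charged a vec_to_scalar extract. */
int toy_count_live(const struct toy_use *uses, int n, bool apply_fix)
{
    int live = 0;
    for (int i = 0; i < n; i++)
        if (apply_fix ? toy_is_scalar_use_with_fix(&uses[i])
                      : toy_is_scalar_use(&uses[i]))
            live++;
    return live;
}
```

For the testcase, all 16 conversions have a non-PURE_SLP representative and a
single-use result, so in this model the conservative rule marks 16 lanes live
while the single-use rule marks none.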
