https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732

            Bug ID: 91732
           Summary: Adding omp simd pragma prevents vectorization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jed at 59A2 dot org
  Target Milestone: ---

omp-simd.c:
void poisson(int Q, const double *restrict gsym, const double *restrict du,
double *restrict dv) {
#pragma omp simd
  for (int i=0; i<Q; i++) {
    const double g[2][2] = {{gsym[Q*0+i], gsym[Q*2+i]},
                            {gsym[Q*2+i], gsym[Q*1+i]}};
    for (int j=0; j<2; j++)
      dv[Q*j+i] = g[j][0] * du[Q*0+i] + g[j][1] * du[Q*1+i];
  }
}

The above fails to vectorize even though the compiler completely unrolls the
inner loop.

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c
omp-simd.c
omp-simd.c:6:5: optimized: loop with 2 iterations completely unrolled (header
execution count 357878152)
omp-simd.c:4:38: missed: couldn't vectorize loop
omp-simd.c:4:18: missed: not vectorized: not suitable for scatter store
D.4095[_37][0][0] = _4;

If I remove the "#pragma omp simd", it vectorizes:

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c
omp-simd.c
omp-simd.c:5:5: optimized: loop with 2 iterations completely unrolled (header
execution count 357878152)
omp-simd.c:2:3: optimized: loop vectorized using 32 byte vectors
omp-simd.c:2:3: optimized:  loop versioned for vectorization because of
possible aliasing
omp-simd.c:2:3: optimized: loop with 2 iterations completely unrolled (header
execution count 18709371)

If I instead replace "#pragma omp simd" with "#pragma GCC ivdep", it
vectorizes without being versioned for possible aliasing.

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c
omp-simd.c
omp-simd.c:6:5: optimized: loop with 2 iterations completely unrolled (header
execution count 357878152)
omp-simd.c:3:3: optimized: loop vectorized using 32 byte vectors
omp-simd.c:3:3: optimized: loop with 2 iterations completely unrolled (header
execution count 24166268)

I think aliasing should not be a concern in any of these cases, since all three
pointers are restrict-qualified.  Also, if I manually unroll the inner loop
(which the compiler is already unrolling for me), the original "omp simd"
version vectorizes nicely.
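
For reference, the manually unrolled variant is not included in this report.
The following is only a sketch of what it might look like, assuming the 2x2
array g is replaced by scalars; the name poisson_unrolled and the temporaries
g00, g01, g11 are hypothetical:

```c
/* Hypothetical reconstruction of the manually unrolled variant.
 * The symmetric 2x2 matrix g is held in three scalars instead of a
 * local array, so each iteration contains only unit-stride loads and
 * stores indexed by i. */
void poisson_unrolled(int Q, const double *restrict gsym,
                      const double *restrict du, double *restrict dv) {
#pragma omp simd
  for (int i = 0; i < Q; i++) {
    const double g00 = gsym[Q*0+i];  /* g[0][0] */
    const double g11 = gsym[Q*1+i];  /* g[1][1] */
    const double g01 = gsym[Q*2+i];  /* g[0][1] == g[1][0] */
    dv[Q*0+i] = g00 * du[Q*0+i] + g01 * du[Q*1+i];  /* j = 0 */
    dv[Q*1+i] = g01 * du[Q*0+i] + g11 * du[Q*1+i];  /* j = 1 */
  }
}
```

This computes the same result as the original loop nest; whether it matches
the code that was actually tested is an assumption.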

Reproduced on trunk: https://gcc.godbolt.org/z/wKdHg0
