https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732
            Bug ID: 91732
           Summary: Adding omp simd pragma prevents vectorization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jed at 59A2 dot org
  Target Milestone: ---

omp-simd.c:

void poisson(int Q, const double *restrict gsym, const double *restrict du, double *restrict dv) {
  #pragma omp simd
  for (int i=0; i<Q; i++) {
    const double g[2][2] = {{gsym[Q*0+i], gsym[Q*2+i]},
                            {gsym[Q*2+i], gsym[Q*1+i]}};
    for (int j=0; j<2; j++)
      dv[Q*j+i] = g[j][0] * du[Q*0+i] + g[j][1] * du[Q*1+i];
  }
}

The above fails to vectorize despite unrolling the inner loop.

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c omp-simd.c
omp-simd.c:6:5: optimized: loop with 2 iterations completely unrolled (header execution count 357878152)
omp-simd.c:4:38: missed: couldn't vectorize loop
omp-simd.c:4:18: missed: not vectorized: not suitable for scatter store D.4095[_37][0][0] = _4;

If I remove the "#pragma omp simd", it vectorizes:

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c omp-simd.c
omp-simd.c:5:5: optimized: loop with 2 iterations completely unrolled (header execution count 357878152)
omp-simd.c:2:3: optimized: loop vectorized using 32 byte vectors
omp-simd.c:2:3: optimized:  loop versioned for vectorization because of possible aliasing
omp-simd.c:2:3: optimized: loop with 2 iterations completely unrolled (header execution count 18709371)

If instead, I replace "#pragma omp simd" with "#pragma GCC ivdep", it
vectorizes without possible aliasing.

$ gcc -Ofast -march=skylake-avx512 -fopenmp -fopt-info -fopt-info-missed -c omp-simd.c
omp-simd.c:6:5: optimized: loop with 2 iterations completely unrolled (header execution count 357878152)
omp-simd.c:3:3: optimized: loop vectorized using 32 byte vectors
omp-simd.c:3:3: optimized: loop with 2 iterations completely unrolled (header execution count 24166268)

I think aliasing should not be a concern due to use of restrict.

Also, if I manually unroll the inner loop (which the compiler is unrolling for
me), the original "omp simd" version vectorizes nicely.

Reproduced on trunk: https://gcc.godbolt.org/z/wKdHg0
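
For reference, the manually unrolled variant mentioned above might look roughly like the sketch below. The report does not include the unrolled source, so the function name poisson_unrolled and the exact form of the two statements are assumptions made for illustration; only the inner j loop is expanded, and the local array g and the "omp simd" pragma are kept as in omp-simd.c.

```c
/* Hypothetical manually unrolled form of poisson(); a sketch only,
   not the reporter's actual source. */
void poisson_unrolled(int Q, const double *restrict gsym,
                      const double *restrict du, double *restrict dv) {
  #pragma omp simd
  for (int i=0; i<Q; i++) {
    /* Same symmetric 2x2 tensor as in the original test case. */
    const double g[2][2] = {{gsym[Q*0+i], gsym[Q*2+i]},
                            {gsym[Q*2+i], gsym[Q*1+i]}};
    /* Inner loop over j written out by hand: j = 0, then j = 1. */
    dv[Q*0+i] = g[0][0] * du[Q*0+i] + g[0][1] * du[Q*1+i];
    dv[Q*1+i] = g[1][0] * du[Q*0+i] + g[1][1] * du[Q*1+i];
  }
}
```

Compiling this with the same command line as above and comparing the -fopt-info output against omp-simd.c should show whether the two forms are treated differently by the vectorizer.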