https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80015
            Bug ID: 80015
           Summary: auto vectorization leave scalar code even if it is unreachable
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vanyacpp at gmail dot com
  Target Milestone: ---

Consider these two versions of dot_product:

#include <cstdlib>

float dot_product(float const* a, float const* b, size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        return 0.;                     // (1)
        // __builtin_unreachable();    // (2)

    float result = 0.f;
    for (size_t i = 0; i != n; ++i)
        result += a[i] * b[i];

    return result;
}

The code should be compiled with the flags -O3 -ffast-math.

In version (1), the function returns 0 when n is not a multiple of 4. In
version (2), __builtin_unreachable() is invoked instead.

Version (2), with __builtin_unreachable(), is optimized to the point where only
packed operations are used. In version (1), with the return, scalar operations
are still emitted even though that code is unreachable.

The expected behavior is that gcc emits no scalar operations in either version.
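
For anyone wanting to compare the two versions side by side, here is a sketch that splits them into two separate functions so both can live in one translation unit. The names dot_product_return and dot_product_unreachable are made up for this illustration and do not appear in the report; the bodies otherwise follow the code above, with C++ casts in place of the C-style ones.

```cpp
#include <cstddef>

// Version (1): ordinary early return when n is not a multiple of 4.
// (Function name is hypothetical, chosen for this illustration.)
float dot_product_return(float const* a, float const* b, std::size_t n)
{
    a = static_cast<float const*>(__builtin_assume_aligned(a, 16));
    b = static_cast<float const*>(__builtin_assume_aligned(b, 16));

    if ((n % 4) != 0)
        return 0.f;

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}

// Version (2): tell the compiler that n % 4 != 0 cannot happen.
// Calling this with such an n is undefined behavior.
// (Function name is hypothetical, chosen for this illustration.)
float dot_product_unreachable(float const* a, float const* b, std::size_t n)
{
    a = static_cast<float const*>(__builtin_assume_aligned(a, 16));
    b = static_cast<float const*>(__builtin_assume_aligned(b, 16));

    if ((n % 4) != 0)
        __builtin_unreachable();

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}
```

Compiling this file with g++ -O3 -ffast-math -S and comparing the assembly of the two functions should show whether scalar multiply/add instructions remain in either one. Both functions require 16-byte-aligned inputs, since that is what __builtin_assume_aligned promises to the compiler.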