https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82147
--- Comment #2 from Richard Biener ---
The vectorizer performs interleaving for this kind of loop; your manual one
isn't really vectorized (you only vectorize the load).
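A hypothetical loop of the general kind under discussion is sketched below. It is not the testcase from this report. It is a scalar loop with stride-2 accesses that splits interleaved pairs into two arrays. Written in this scalar form, GCC's loop vectorizer can typically vectorize both the loads and the stores by loading whole vectors and permuting (deinterleaving) them. A hand-written version might vectorize only the load. Whether this happens depends on the target and the flags used (e.g. -O3). The function name `deinterleave` is illustrative only.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from this bug: split an
   interleaved array of (x, y) pairs into two separate arrays.  The
   stride-2 loads below form an interleaved group that GCC's vectorizer
   can usually handle with vector loads plus permutes, so both the
   loads and the stores get vectorized (target and flags permitting). */
void deinterleave(float *restrict x, float *restrict y,
                  const float *restrict in, int n)
{
    for (int i = 0; i < n; i++) {
        x[i] = in[2 * i];     /* even elements */
        y[i] = in[2 * i + 1]; /* odd elements */
    }
}
```

Building with `-O3 -fopt-info-vec` is one way to check whether GCC reports the loop as vectorized on a given target.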
--- Comment #1 from Andrew Pinski ---
It is even worse for float*4 -> float*2, float*2.
Take (ignore the obvious aliasing issues):
void f(float *restrict a, float * restrict b, float * restrict c, int s)
{
  for (int i = 0; i < s; i++)
    {
      a[i*