I have done more experiments. It seems the movntps instruction gives a performance
gain only when the array is longer than about 200_000 ints (on a Celeron CPU).
For lengths in [25, 200_000] ints, movaps is better (and better than C memset).
For n < 25 the best thing I've found is just an inlined loop.
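
The three size ranges above suggest a dispatch on the array length. Here is a minimal sketch of that idea, not a tuned implementation. The names smallLimit, streamLimit, FillPath, choosePath and fillInts are hypothetical, the thresholds are just the Celeron numbers quoted above, and the two larger paths use a[] = x as a stand-in where real code would use movaps or movntps stores.

```d
// Thresholds from the measurements above (Celeron CPU). They are
// machine-specific and would need re-tuning on other hardware.
enum size_t smallLimit = 25;
enum size_t streamLimit = 200_000;

enum FillPath { inlineLoop, cachedStores, streamingStores }

// Hypothetical helper: picks the fill strategy for an array of n ints.
FillPath choosePath(size_t n) pure nothrow @nogc @safe
{
    if (n < smallLimit) return FillPath.inlineLoop;
    if (n <= streamLimit) return FillPath.cachedStores;
    return FillPath.streamingStores;
}

// Hypothetical helper: fills a with x, dispatching on a.length.
void fillInts(int[] a, int x) pure nothrow @nogc @safe
{
    final switch (choosePath(a.length))
    {
        case FillPath.inlineLoop:
            // Small arrays: a plain loop the compiler can inline.
            foreach (ref e; a) e = x;
            break;
        case FillPath.cachedStores:
            // Stand-in only: a real version would use aligned movaps stores.
            a[] = x;
            break;
        case FillPath.streamingStores:
            // Stand-in only: a real version would use non-temporal movntps stores.
            a[] = x;
            break;
    }
}
```

Keeping the choice in a separate function makes it easy to re-measure the thresholds on another CPU without touching the fill code.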
I'll p
I've found that when the length is small (about 50-250 4-byte items or fewer), the
array filling operation below is quite slow compared to a normal loop:
a[] = x;
So I suggest that the DMD frontend inline a small loop when a.length is small.
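
A comparison like this can be reproduced with a small timing harness. The sketch below assumes Phobos' std.datetime.stopwatch.benchmark is available; the helper name compareFills and the fill value 42 are hypothetical. An optimizing compiler may rewrite either version, so the results depend heavily on the compiler and its flags.

```d
import core.time : Duration;
import std.datetime.stopwatch : benchmark;

// Hypothetical helper: times `reps` fills of an n-element int array,
// once with the slice assignment and once with a plain loop.
// Returns [sliceTime, loopTime].
Duration[2] compareFills(size_t n, uint reps)
{
    auto a = new int[](n);
    auto b = new int[](n);
    int x = 42;

    // The array filling operation discussed above.
    void sliceFill() { a[] = x; }

    // The "normal loop" it is compared against.
    void loopFill() { foreach (ref e; b) e = x; }

    return benchmark!(sliceFill, loopFill)(reps);
}
```

For example, calling compareFills(100, 1_000_000) and printing both durations gives a rough comparison for a length inside the 50-250 range.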
To further improve the a[]=x; I have tried to speed up the larger case to