On 20.07.2022 12:02, Andrew Cooper wrote:
> One observation though.  We do pass -mno-sse but not -mno-mmx.  I still
> can't figure out what makes the compiler think there's any SIMD to be
> done in this function.

So this looks to be an "optimization", one that is done in a few more
places. The pattern is always the same: A 32-bit GPR is known to be zero,
and there's nearby code which wants to store two adjacent zeros. Hence they
take those 32 bits of zero in the GPR, move them to %mm0 (which already
zeros the upper half), unpack to have the 32 bits of zero duplicated into
the upper half, and then use %mm0 to do the store of the pair of zeros.
IOW they "auto-vectorize" these two stores into a single V2SI (using the
common notation) one.
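
For illustration, a minimal sketch of the kind of source that could
trigger this. The struct and function names are hypothetical, not taken
from the function under discussion, and the instruction sequence in the
comment is an assumed rendering of the description above, not actual
compiler output.

```c
#include <stdint.h>

/* Hypothetical structure with two adjacent 32-bit fields. */
struct pair {
    uint32_t lo;
    uint32_t hi;
};

/*
 * Two adjacent stores of zero.  With MMX left enabled, the compiler may
 * merge them into one 64-bit store, roughly (assumed sequence, AT&T
 * syntax, with %eax known to hold zero):
 *
 *     movd      %eax, %mm0     # upper half of %mm0 gets zeroed
 *     punpckldq %mm0, %mm0     # duplicate the low 32 bits into the upper half
 *     movq      %mm0, (%rdi)   # single V2SI store
 */
void clear_pair(struct pair *p)
{
    p->lo = 0;
    p->hi = 0;
}
```

Whether a given compiler version actually does this will depend on the
flags and the surrounding code. Passing -mno-mmx alongside -mno-sse
should rule out any use of %mm<N> for such stores.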

Besides this being quite the opposite of optimization, of course we didn't
tell the compiler anywhere that it might use any of the %mm<N> registers.

Jan

