[Bug c++/70708] Suboptimal code generated when using _mm_set_sd (X64)

glisse at gcc dot gnu.org Sun, 17 Apr 2016 23:22:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708


--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Looks complicated. _mm_set_sd(x) returns {x,0}, and I don't think the calling
convention guarantees anything about the unused part of SSE registers.
_mm_min_sd uses the high part of its first argument. So we need to understand
things about minsd. IIRC minsd is represented internally as a vec_merge of
minpd and the first argument, so a vec_select on that might notice that we are
only using the minpd part of the merge, but wouldn't simplify any further. If we
represented scalar operations as {min(x[0],y[0]),x[1]} (vec_concat), things
would likely simplify, but I believe that approach was rejected by rth (it
doesn't scale to avx, etc). On the other hand, IIRC he was in favor of a
representation as vec_merge(vec_dup(min(x[0],y[0])),x,1), which I guess would
simplify as well (it is quite possible that I am misremembering). In any case, I
probably have an early prototype lying around somewhere, maybe on
gcc-patches@.
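
To make the three representations concrete, here is a small stand-alone model
in plain C++. It is only a sketch and has nothing to do with the actual GCC
internals: V2, min_s, vec_merge and the min_sd_* helpers are names made up for
illustration, min_s is just one possible "variant of min", and the mask
convention (bit 0 set means element 0 comes from the first operand) is my
reading of the RTL vec_merge description.

```cpp
#include <array>
#include <cassert>

using V2 = std::array<double, 2>;  // stand-in for a two-double vector value

// Scalar min with minsd-like operand order: b is returned unless a < b.
inline double min_s(double a, double b) { return a < b ? a : b; }

// Model of vec_merge(a, b, mask): element i comes from a if bit i of mask
// is set, otherwise from b.
inline V2 vec_merge(const V2& a, const V2& b, unsigned mask) {
    return { (mask & 1u) ? a[0] : b[0], (mask & 2u) ? a[1] : b[1] };
}

// Form 1: vec_merge(minpd(x, y), x, 1), a packed min merged with x.
inline V2 min_sd_merge_packed(const V2& x, const V2& y) {
    const V2 packed = { min_s(x[0], y[0]), min_s(x[1], y[1]) };
    return vec_merge(packed, x, 1u);
}

// Form 2: vec_concat(min(x[0], y[0]), x[1]).
inline V2 min_sd_concat(const V2& x, const V2& y) {
    return { min_s(x[0], y[0]), x[1] };
}

// Form 3: vec_merge(vec_dup(min(x[0], y[0])), x, 1).
inline V2 min_sd_merge_dup(const V2& x, const V2& y) {
    const double m = min_s(x[0], y[0]);
    const V2 dup = { m, m };
    return vec_merge(dup, x, 1u);
}

// All three forms describe the same value (checked here for non-NaN inputs,
// since NaN never compares equal to itself).
inline void check_forms_agree(const V2& x, const V2& y) {
    const V2 a = min_sd_merge_packed(x, y);
    const V2 b = min_sd_concat(x, y);
    const V2 c = min_sd_merge_dup(x, y);
    assert(a == b && b == c);
}
```

In forms 2 and 3 the scalar min(x[0],y[0]) appears explicitly, so selecting
element 0 of the result leaves nothing but that scalar min. In form 1 the same
selection still leaves a packed min to look through.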

We could expand _mm_min_sd to {min(x[0],y[0]),x[1]} (well, some variant of min)
early (in the header, or in gimple), but then we would probably fail to detect
that this is a single instruction and regress elsewhere...
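
For illustration, an early expansion in the header could look roughly like the
following. This is a sketch, not what emmintrin.h does: min_sd_expanded, elt0
and elt1 are hypothetical names, the code relies on the GNU vector subscript
extension (accepted by gcc and clang), and the comparison used is only one
possible "variant of min".

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical open-coded _mm_min_sd: {min(x[0], y[0]), x[1]}.
static inline __m128d min_sd_expanded(__m128d x, __m128d y) {
    __m128d r = x;                     // keeps x[1] in the high element
    r[0] = x[0] < y[0] ? x[0] : y[0];  // y[0] is chosen unless x[0] < y[0]
    return r;
}

// Small helpers to read the low and high elements.
static inline double elt0(__m128d v) { return _mm_cvtsd_f64(v); }
static inline double elt1(__m128d v) { return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); }

// Compare the open-coded version with the intrinsic on non-NaN inputs.
static inline void check_matches_intrinsic(__m128d x, __m128d y) {
    const __m128d a = min_sd_expanded(x, y);
    const __m128d b = _mm_min_sd(x, y);
    assert(elt0(a) == elt0(b));
    assert(elt1(a) == elt1(b));
}
```

Whether the compiler would still recognize this as a single minsd is exactly
the open question above.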

(Note that for a simple __m128d f(double x){return _mm_set_sd(x);}, gcc either
goes through the stack or uses movhpd, while clang happily uses
movq %xmm0, %xmm0.)
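
A minimal file to try this on could be the following. f is the function quoted
above; scalar_min is a hypothetical wrapper that I am assuming resembles the
kind of code in the report (it is not copied from it), where only element 0 of
the result is ever used, so the zeros written by _mm_set_sd are dead.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// From the note above: returns {x, 0}.
__m128d f(double x) { return _mm_set_sd(x); }

// Hypothetical scalar wrapper: only the low element of the minsd result is
// read, so the high elements of both operands do not matter.
double scalar_min(double a, double b) {
    return _mm_cvtsd_f64(_mm_min_sd(_mm_set_sd(a), _mm_set_sd(b)));
}

// Sanity check of the semantics described above.
inline void check_set_sd(double x) {
    const __m128d v = f(x);
    assert(_mm_cvtsd_f64(v) == x);                         // low element is x
    assert(_mm_cvtsd_f64(_mm_unpackhi_pd(v, v)) == 0.0);   // high element is 0
}
```

Compiling this with -O2 -S under both compilers should make it easy to compare
the generated code.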
