https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70708
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---

Looks complicated. _mm_set_sd(x) returns {x,0}, and I don't think the calling convention guarantees anything about the unused part of SSE registers. _mm_min_sd uses the high part of its first argument. So we need to understand things about minsd.

IIRC minsd is represented internally as a vec_merge of minpd and the first argument, so a vec_select on that might notice that we are only using the minpd part of the merge, but wouldn't simplify. If we represented scalar operations as {min(x[0],y[0]),x[1]} (vec_concat), things would likely simplify, but I believe that approach was rejected by rth (it doesn't scale to avx, etc). On the other hand, IIRC he was in favor of a representation as vec_merge(vec_dup(min(x[0],y[0])),x,1), which I guess would simplify as well. (It is quite possible that I am misremembering. In any case, I probably have an early prototype lying around somewhere, maybe on gcc-patches@.)

We could expand _mm_min_sd to {min(x[0],y[0]),x[1]} (well, some variant of min) early (in the header, or in gimple), but then we would probably fail to detect that this is a single instruction and regress elsewhere...

(Note that for a simple __m128d f(double x){return _mm_set_sd(x);}, gcc either goes through the stack or uses movhpd, while clang happily uses movq %xmm0, %xmm0.)
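
To make the lane semantics above concrete, here is a rough plain-C model of {min(x[0],y[0]),x[1]}. It is only a sketch: the type v2df_model and all the model_* functions are made-up names for illustration, not anything from the GCC headers or from the RTL patterns, and "min" is taken to be the a < b ? a : b variant that minsd implements (the second operand is returned when either input is NaN).

```c
#include <assert.h>

/* Hypothetical plain-C stand-in for a two-lane __m128d. */
typedef struct { double lo; double hi; } v2df_model;

/* Build a model vector from explicit low and high lanes. */
static v2df_model model_make(double lo, double hi) {
    v2df_model r;
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* Models _mm_set_sd(x): {x, 0}. */
static v2df_model model_set_sd(double x) {
    return model_make(x, 0.0);
}

/* Models the minsd comparison: yields b unless a < b,
   so a NaN in either operand selects b. */
static double model_min(double a, double b) {
    return a < b ? a : b;
}

/* Models _mm_min_sd(x, y) as {min(x[0],y[0]), x[1]}:
   the high lane is copied from the FIRST argument. */
static v2df_model model_min_sd(v2df_model x, v2df_model y) {
    return model_make(model_min(x.lo, y.lo), x.hi);
}

/* Hypothetical caller that reads only the low lane, so the
   high lanes built by model_set_sd are dead. */
static double model_low_min(double a, double b) {
    return model_min_sd(model_set_sd(a), model_set_sd(b)).lo;
}

/* Small self-check of the properties described above. */
static void model_self_check(void) {
    assert(model_set_sd(3.0).hi == 0.0);
    assert(model_min_sd(model_make(5.0, 7.0), model_make(2.0, 9.0)).hi == 7.0);
    assert(model_low_min(1.0, 2.0) == model_min(1.0, 2.0));
}
```

In this model, a caller like model_low_min never looks at the high lane, so it reduces to model_min(a, b). That is the simplification one would hope the compiler could also find once the high part is visibly just x[1].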