[Bug target/121099] GCC doesn't optimize `_mm_set_ps()` very well

2025-07-16 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099

--- Comment #3 from LIU Hao  ---
Yes, INSERTPS requires SSE4.1. However, the code is compiled with AVX, which
implies SSE4.1, so INSERTPS is available and should be preferred.

[Bug target/121099] GCC doesn't optimize `_mm_set_ps()` very well

2025-07-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-07-16

--- Comment #2 from Richard Biener  ---
We expand from

  _1 = -y_2(D);
  _5 = {x_4(D), 0.0, _1, 0.0};
  _6 = {y_2(D), y_2(D), x_4(D), x_4(D)};
  _7 = __builtin_ia32_cmpgtps (_6, _5);
  _8 = __builtin_ia32_movmskps (_7); [tail call]
  return _8; 

{y_2(D), y_2(D), x_4(D), x_4(D)} should be handled by target vec_init.

The quoted clang code needs more than just SSE2.
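
For readers without the original test case at hand, the GIMPLE above corresponds roughly to C source like the sketch below. This is a reconstruction from the dump, not the reporter's code: the function name `cmp_mask` and the helper `self_check` are hypothetical, and the operand order is inferred from the fact that GCC's `_mm_cmpgt_ps` and `_mm_movemask_ps` expand to `__builtin_ia32_cmpgtps` and `__builtin_ia32_movmskps`.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical reconstruction of the function behind the GIMPLE dump.
   _mm_set_ps takes its arguments from the highest lane to the lowest. */
int cmp_mask(float x, float y)
{
    __m128 lhs = _mm_set_ps(x, x, y, y);         /* _6 = {y, y, x, x}      */
    __m128 rhs = _mm_set_ps(0.0f, -y, 0.0f, x);  /* _5 = {x, 0.0, -y, 0.0} */

    /* bit 0: y > x, bit 1: y > 0, bit 2: x > -y, bit 3: x > 0 */
    return _mm_movemask_ps(_mm_cmpgt_ps(lhs, rhs));
}

/* Spot checks worked out by hand from the lane comparisons above. */
static void self_check(void)
{
    assert(cmp_mask(1.0f, 2.0f) == 15);   /* all four comparisons hold */
    assert(cmp_mask(2.0f, 1.0f) == 14);   /* only y > x fails          */
    assert(cmp_mask(-1.0f, -2.0f) == 0);  /* no comparison holds       */
}
```

Compiling this with `-O2` and comparing the assembly against the shuffle-based form is one way to see how the `{y, y, x, x}` constructor gets expanded.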

[Bug target/121099] GCC doesn't optimize `_mm_set_ps()` very well

2025-07-15 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099

--- Comment #1 from LIU Hao  ---
Given `y` in XMM0 and `x` in XMM1, `_mm_set_ps(x, x, y, y)` is clearly just
`vshufps xmm2, xmm0, xmm1, 0` no matter what.
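
A minimal sketch of that equivalence at the intrinsics level, assuming plain SSE: `_mm_shuffle_ps(a, b, 0)` takes its two low lanes from lane 0 of `a` and its two high lanes from lane 0 of `b`, which is what SHUFPS with immediate 0 does. The function names `set_ref`, `set_shuf` and `check_lanes` are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Reference: lanes from low to high are {y, y, x, x}. */
static __m128 set_ref(float x, float y)
{
    return _mm_set_ps(x, x, y, y);
}

/* Same vector built with a single shuffle, immediate 0:
   lanes 0-1 come from lane 0 of vy, lanes 2-3 from lane 0 of vx. */
static __m128 set_shuf(float x, float y)
{
    __m128 vy = _mm_set_ss(y);  /* {y, 0, 0, 0} */
    __m128 vx = _mm_set_ss(x);  /* {x, 0, 0, 0} */
    return _mm_shuffle_ps(vy, vx, 0);
}

/* Assert that both constructions give bitwise identical vectors. */
static void check_lanes(float x, float y)
{
    __m128 a = set_ref(x, y);
    __m128 b = set_shuf(x, y);
    assert(memcmp(&a, &b, sizeof a) == 0);
}
```

Calling `check_lanes(1.5f, -2.0f)` exercises both paths. In the scenario from the comment, `x` and `y` already sit in the low lanes of XMM registers, so the `_mm_set_ss` steps here exist only to model that starting state.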

[Bug target/121099] GCC doesn't optimize `_mm_set_ps()` very well

2025-07-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement