[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 --- Comment #4 from Alexander Monakov --- Like this: pand xmm1, XMMWORD PTR .LC0[rip] movaps XMMWORD PTR [rsp-40], xmm0 xor eax, eax xor edx, edx movaps XMMWORD PTR [rsp-24], xmm1 mov

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
Alexander Monakov changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org
--- Com

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-06 Thread john_platts at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944 --- Comment #2 from John Platts --- Here is more optimal codegen for SSE2ShuffleI8 on x86_64: SSE2ShuffleI8(long long __vector(2), long long __vector(2)): pand xmm1, XMMWORD PTR .LC0[rip] movaps XMMWORD PTR [rsp-24], xmm0

[Bug target/114944] Codegen of __builtin_shuffle for an 16-byte uint8_t vector is suboptimal on SSE2

2024-05-04 Thread john_platts at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114944
John Platts changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
--- Comment #1 from
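
For readers who have not opened the bug, the snippets quoted in the comments above all start the same way: mask the index vector (the pand), spill the operands to the stack (the movaps stores), then look the bytes up one at a time. The following is a minimal scalar sketch in portable C of what a 16-byte shuffle with a variable index vector computes, assuming the single-operand form of __builtin_shuffle, where GCC documents that indices are taken modulo the element count. The name shuffle_u8x16_ref is hypothetical and does not appear in the report, and reading .LC0 as a constant of sixteen 15s is an assumption, since the snippets are truncated.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper, not code from the bug report.
 * Computes out[i] = a[idx[i] mod 16] for each of the 16 byte lanes. */
void shuffle_u8x16_ref(const uint8_t a[16], const uint8_t idx[16],
                       uint8_t out[16])
{
    uint8_t tmp[16];

    /* Copy the source to a temporary, much as the quoted codegen spills
     * xmm0 to the stack; this also lets out alias a safely. */
    memcpy(tmp, a, sizeof tmp);

    for (int i = 0; i < 16; i++) {
        /* "& 15" plays the role of the pand with the mask constant
         * (assumed to be all 15s): indices wrap modulo 16. */
        out[i] = tmp[idx[i] & 15];
    }
}
```

SSE2 has no variable byte shuffle (pshufb arrived with SSSE3), which is why a memory round trip of this shape shows up in the first place; the thread is about how cheaply the compiler can arrange it.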