https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102806
            Bug ID: 102806
           Summary: [x86] Suboptimal codegen for v4hi vector concat under
                    -mavx512bw and -mavx512vl
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

For

typedef short v8hi __attribute__((vector_size (16)));
typedef short v4hi __attribute__((vector_size (8)));

v8hi foov (v4hi a, v4hi b)
{
  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
}

gcc -O2 -mavx512vl -mavx512bw generates:

        vmovq   %xmm0, %xmm2
        vmovq   %xmm1, %xmm1
        vmovdqa .LC0(%rip), %xmm0
        vpermi2w        %xmm1, %xmm2, %xmm0
        ret

While clang with the same options generates:

        vmovlhps        %xmm1, %xmm0, %xmm0     # xmm0 = xmm0[0],xmm1[0]
        retq

It looks like the expand order of the permutation should be adjusted.
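
For reference, the shuffle indices 0-7 in foov select elements 0-3 from a followed by elements 0-3 from b, so the result is a plain concatenation of two 64-bit halves. That is why a single vmovlhps is sufficient. The sketch below only illustrates that equivalence and is not part of the original report: concat_ref and check_concat are hypothetical helper names, and the __builtin_shufflevector comparison is compiled only when the compiler reports support for it (GCC 12 or later, or clang).

```c
#include <assert.h>
#include <string.h>

typedef short v8hi __attribute__((vector_size (16)));
typedef short v4hi __attribute__((vector_size (8)));

/* Hypothetical reference helper (not from the report): build the result
   by copying a into the low 64 bits and b into the high 64 bits.  */
static v8hi
concat_ref (v4hi a, v4hi b)
{
  v8hi r;
  memcpy (&r, &a, sizeof a);
  memcpy ((char *) &r + sizeof a, &b, sizeof b);
  return r;
}

/* Hypothetical self-check: the concatenation must hold a in elements 0-3
   and b in elements 4-7, and must match the shuffle from the report when
   __builtin_shufflevector is available.  */
static void
check_concat (v4hi a, v4hi b)
{
  v8hi r = concat_ref (a, b);
  for (int i = 0; i < 4; i++)
    {
      assert (r[i] == a[i]);
      assert (r[i + 4] == b[i]);
    }
#if defined (__has_builtin)
# if __has_builtin (__builtin_shufflevector)
  v8hi s = __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
  for (int i = 0; i < 8; i++)
    assert (s[i] == r[i]);
# endif
#endif
}
```

Calling check_concat with two arbitrary v4hi values exercises both paths. Since the operation is a pure 64-bit concatenation, no index constant load (.LC0) or vpermi2w should be needed to implement it.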