https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102806

            Bug ID: 102806
           Summary: [x86] Suboptimal codegen for v4hi vector concat under
                    -mavx512bw and -mavx512vl
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

For the following code:

typedef short v8hi __attribute__((vector_size (16)));
typedef short v4hi __attribute__((vector_size (8)));

v8hi foov (v4hi a, v4hi b)
{
  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
}
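Here is a minimal sketch (not part of the original report) of what the shuffle computes. The indices select from the concatenation of `a` (indices 0 to 3) and `b` (indices 4 to 7), so the mask 0..7 makes `foov` a plain concatenation. The reference helper `concat_ref` and the `HAVE_SHUFFLEVECTOR` guard are hypothetical additions. The guard only lets the snippet build with older GCC releases that lack `__builtin_shufflevector`.

```c
#include <assert.h>
#include <string.h>

typedef short v8hi __attribute__((vector_size (16)));
typedef short v4hi __attribute__((vector_size (8)));

/* The concat is only a pure move because a v8hi is exactly two v4hi.  */
static_assert (sizeof (v8hi) == 2 * sizeof (v4hi), "v8hi must hold two v4hi");

/* Hypothetical reference: the 4 shorts of a, then the 4 shorts of b.  */
v8hi concat_ref (v4hi a, v4hi b)
{
  v8hi r;
  memcpy (&r, &a, sizeof a);                     /* bytes 0..7  <- a */
  memcpy ((char *) &r + sizeof a, &b, sizeof b); /* bytes 8..15 <- b */
  return r;
}

#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

/* The function from the report, with a fallback for compilers
   that do not provide __builtin_shufflevector.  */
v8hi foov (v4hi a, v4hi b)
{
#ifdef HAVE_SHUFFLEVECTOR
  return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
#else
  return concat_ref (a, b);
#endif
}
```

With `a = {1, 2, 3, 4}` and `b = {5, 6, 7, 8}`, `foov` returns `{1, 2, 3, 4, 5, 6, 7, 8}`. Each half of the result is one 64-bit chunk of an input, unchanged.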

With gcc -O2 -mavx512vl -mavx512bw, GCC generates:

        vmovq   %xmm0, %xmm2
        vmovq   %xmm1, %xmm1
        vmovdqa .LC0(%rip), %xmm0
        vpermi2w        %xmm1, %xmm2, %xmm0
        ret

Clang with the same options generates:

        vmovlhps        %xmm1, %xmm0, %xmm0             # xmm0 = xmm0[0],xmm1[0]
        retq
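For comparison, `vmovlhps %xmm1, %xmm0, %xmm0` keeps the low 64 bits of `%xmm0` and copies the low 64 bits of `%xmm1` into the upper half. As the assembly above shows, the two `v4hi` arguments arrive in the low halves of `%xmm0` and `%xmm1`, so that one instruction performs the entire concatenation. The sketch below spells out the same data movement with SSE intrinsics. `concat_movlhps` is a hypothetical name. The load and store only convert between GCC vector types and `__m128i`. A non-SSE2 fallback is included so the snippet also builds on other targets.

```c
#include <assert.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in xmmintrin.h for _mm_movelh_ps */
#endif

typedef short v8hi __attribute__((vector_size (16)));
typedef short v4hi __attribute__((vector_size (8)));

/* movlhps moves whole 64-bit halves, so each v4hi must be exactly 64 bits.  */
static_assert (sizeof (v4hi) == 8, "v4hi must be 64 bits");

/* Hypothetical helper mirroring clang's single vmovlhps.  */
v8hi concat_movlhps (v4hi a, v4hi b)
{
  v8hi r;
#ifdef __SSE2__
  __m128i lo = _mm_loadl_epi64 ((const __m128i *) &a); /* a in bits 0..63, upper bits zeroed */
  __m128i hi = _mm_loadl_epi64 ((const __m128i *) &b); /* b in bits 0..63, upper bits zeroed */
  /* movlhps: result bits 0..63 = lo bits 0..63, result bits 64..127 = hi bits 0..63.
     This is a pure bit move, so the float casts do not alter the shorts.  */
  __m128 m = _mm_movelh_ps (_mm_castsi128_ps (lo), _mm_castsi128_ps (hi));
  _mm_storeu_si128 ((__m128i *) &r, _mm_castps_si128 (m));
#else
  /* Portable fallback: the same two 64-bit copies.  */
  memcpy (&r, &a, sizeof a);
  memcpy ((char *) &r + sizeof a, &b, sizeof b);
#endif
  return r;
}
```

The intrinsic form uses no permutation index vector. That is the difference from GCC's output, which must load the `.LC0` index vector for `vpermi2w` and also zero-extend both inputs with `vmovq`.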

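It looks like the order in which the permutation expansion strategies are tried should be adjusted, so that the single vmovlhps-style move is chosen over vpermi2w.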
