https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812

            Bug ID: 102812
           Summary: Unoptimal (and wrong) code for _Float16 insert
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following code:

--cut here--
typedef _Float16 v8hf __attribute__((__vector_size__ (16)));

v8hf t (_Float16 a)
{
  return (v8hf){a, 0, 0, 0, 0, 0, 0, 0};
}
--cut here--

compiles with -msse4 to:

        pxor    %xmm15, %xmm15
        movaps  %xmm15, -56(%rsp)
        pextrw  $0, %xmm0, -56(%rsp)
        vmovdqa64       -56(%rsp), %xmm0

PBLENDW with a cleared %xmm15 would be much better, and wouldn't use
memory.
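
As a rough illustration of the register-only sequence suggested above, here is a
minimal sketch using SSE4.1 intrinsics. It works on the raw 16-bit lanes
rather than on _Float16 values, and the helper names insert_low_word_zeroed
and lane are hypothetical, chosen only for this example. It is not the code
GCC is expected to emit, just the same blend expressed in C.

```c
#include <emmintrin.h>   /* SSE2: _mm_setzero_si128, _mm_storeu_si128 */
#include <smmintrin.h>   /* SSE4.1: _mm_blend_epi16 (PBLENDW) */
#include <stdint.h>

/* Hypothetical helper: keep 16-bit lane 0 of `a` and zero lanes 1..7,
   without going through memory. */
__attribute__((target("sse4.1")))
static __m128i insert_low_word_zeroed(__m128i a)
{
    __m128i zero = _mm_setzero_si128();     /* typically a PXOR */
    /* Mask bit 0 set: lane 0 comes from `a`, all other lanes from `zero`. */
    return _mm_blend_epi16(zero, a, 0x01);  /* PBLENDW $1 */
}

/* Hypothetical helper for inspection: read 16-bit lane `i` of `v`. */
static uint16_t lane(__m128i v, int i)
{
    uint16_t out[8];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

With this sketch, a vector whose lane 0 holds 0x3c00 (the half-precision bit
pattern for 1.0) and whose other lanes hold arbitrary data should come back
with lane 0 unchanged and lanes 1 to 7 cleared.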

Also, VMOVDQA64 is an AVX512F/AVX512VL instruction, not an SSE4 (or even AVX)
instruction, so it should not be emitted with -msse4.
