https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83250

            Bug ID: 83250
           Summary: _mm256_zextsi128_si256 missing for AVX2 zero extension
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zoltan at hidvegi dot com
  Target Milestone: ---
            Target: x86_64-*-*

I would like to zero out the upper 128 bits of a 256-bit vector. The Intel
intrinsic for that is _mm256_zextsi128_si256, but it is missing from gcc-7. The
Intel docs say that it should generate no code, but sometimes it is necessary
to generate vmovdqa %xmm0,%xmm0. Usually this is not necessary, since it is
enough to change the instruction generating the value to use a 128-bit AVX
instruction on an xmm register, which automatically zeroes the upper bits.
_mm256_castsi128_si256 is similar, but leaves the upper bits undefined.

A workaround using inline asm is:

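#include <immintrin.h>

/* Return x with its upper 128 bits zeroed. */
__m256i get_lo(__m256i x)
{
    __m128i r;
    /* A VEX-encoded 128-bit move zeroes bits 255:128 of the destination. */
    __asm__("vmovdqa %1,%0" : "=x" (r) : "x" (_mm256_castsi256_si128(x)));
    return _mm256_castsi128_si256(r);
}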

I would like to write that without inline asm, and it seems that gcc can
sometimes generate vmov for zero-extension, but I do not know how to trigger
that.
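
For comparison, here is a minimal sketch of one intrinsics-only way to get the
same result: insert the low half into an all-zero 256-bit vector. The names
zext_lo128 and zext_lo128_u32 are hypothetical helpers, not part of any header,
and the target attribute is a GCC/Clang extension assumed here so the snippet
builds without -mavx2. This guarantees the value but not the code: whether the
compiler folds it into a single vmovdqa %xmm,%xmm or emits a heavier insert
sequence depends on the compiler version and is not something this sketch can
promise.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: keep the low 128 bits of x and zero the upper
   128 bits, using only intrinsics (no inline asm). */
__attribute__((target("avx2")))
static inline __m256i zext_lo128(__m256i x)
{
    __m128i lo = _mm256_castsi256_si128(x);
    /* Put lo into lane 0 of an all-zero vector; lane 1 stays zero. */
    return _mm256_inserti128_si256(_mm256_setzero_si256(), lo, 0);
}

/* Hypothetical memory-based wrapper, so that callers compiled without
   AVX enabled never pass a __m256i across a function boundary. */
__attribute__((target("avx2")))
void zext_lo128_u32(const uint32_t in[8], uint32_t out[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out, zext_lo128(v));
}
```

Unlike _mm256_castsi128_si256, the insert into a zero vector defines the upper
bits, so the result does not rely on what the upper half of the register
happened to contain.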
