https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83250
Bug ID: 83250
Summary: _mm256_zextsi128_si256 missing for AVX2 zero extension
Product: gcc
Version: 7.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zoltan at hidvegi dot com
Target Milestone: ---
Target: x86_64-*-*

I would like to zero out the upper 128 bits of a 256-bit vector. The Intel intrinsic for that is _mm256_zextsi128_si256, but it is missing from gcc-7.

The Intel documentation says that this intrinsic should generate no code. However, an explicit instruction such as vmovdqa %xmm0,%xmm0 is sometimes necessary. Usually it is not, because it is enough to emit the instruction that produces the value in its 128-bit AVX form on the xmm register: VEX-encoded 128-bit instructions automatically zero the upper 128 bits of the corresponding ymm register. _mm256_castsi128_si256 is similar, but it leaves the upper 128 bits undefined.

The workaround is:

    #include <immintrin.h>

    /* Return x with its upper 128 bits cleared. This works in practice
       because the VEX-encoded 128-bit vmovdqa zeroes bits 255:128 of the
       destination ymm register, although the cast itself leaves those
       bits formally undefined. */
    __m256i get_lo(__m256i x)
    {
        __m128i r;
        __asm__("vmovdqa %1,%0" : "=x" (r) : "x" (_mm256_castsi256_si128(x)));
        return _mm256_castsi128_si256(r);
    }

I would like to write that without inline asm. It seems that gcc can sometimes generate a vmov for zero extension, but I do not know how to trigger that.
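One possible way to express the zero extension without inline asm is sketched below, under stated assumptions: it inserts the low 128-bit lane into an all-zero 256-bit vector using only AVX intrinsics (_mm256_setzero_si256, _mm256_castsi256_si128, _mm256_insertf128_si256). The helper names zext_lo128 and zext_lo128_i32 are hypothetical, and the target("avx") attribute is an assumption chosen so the file compiles without -mavx; callers should check __builtin_cpu_supports("avx") first. This sketch does not guarantee that gcc-7 reduces the sequence to a single vmovdqa %xmm0,%xmm0.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: zero-extend the low 128 bits of x to 256 bits.
 * Unlike _mm256_castsi128_si256, the upper 128 bits are defined as zero
 * at the source level, so the compiler has to honor them.
 * target("avx") is an assumption so this builds without -mavx. */
__attribute__((target("avx")))
static inline __m256i zext_lo128(__m256i x)
{
    __m128i lo = _mm256_castsi256_si128(x);   /* reinterpret, no data movement */
    /* Put lo into lane 0 of an all-zero vector; lane 1 stays zero. */
    return _mm256_insertf128_si256(_mm256_setzero_si256(), lo, 0);
}

/* Hypothetical array-based wrapper, so the helper can be called from
 * code that is not itself compiled for AVX (only pointers cross the call). */
__attribute__((target("avx")))
void zext_lo128_i32(const int32_t in[8], int32_t out[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out, zext_lo128(v));
}
```

Given in = {1, 2, 3, 4, 5, 6, 7, 8}, zext_lo128_i32 stores {1, 2, 3, 4, 0, 0, 0, 0}. Depending on the gcc version and optimization level, the compiler may emit a zeroing instruction plus vinsertf128 here rather than the single 128-bit move the report asks for.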