http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60451
Bug ID: 60451
Summary: X86 vectorization improve: pack instead of pshufb
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: evstupac at gmail dot com

Created attachment 32294
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32294&action=edit
test case

Currently the vectorizer uses 2 "pshufb" and 1 "por" for an even/odd
permutation.

(odd case)
    pshufb  %xmm5, %xmm1
    pshufb  %xmm4, %xmm0
    por     %xmm0, %xmm1
where
    %xmm4    0  2  4  6  8  a  c  e -1 -1 -1 -1 -1 -1 -1 -1
    %xmm5   -1 -1 -1 -1 -1 -1 -1 -1  0  2  4  6  8  a  c  e

gcc/config/i386/i386.c (expand_vec_perm_even_odd_1):
    case V16QImode:
      if (TARGET_SSSE3)
        return expand_vec_perm_pshufb2 (d);

However, in the even/odd case we can use 2 "pand" and 1 "packuswb" instead:
    pand      %xmm6, %xmm0
    pand      %xmm6, %xmm1
    packuswb  %xmm1, %xmm0
where %xmm6 is 0x00ff00ff00ff00ff

This will improve performance on architectures with slow pshufb
instructions and reduce code size by one constant (a single mask instead
of two shuffle controls). For the attached test, performance on Silvermont
improves by 30%.
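
For illustration, the proposed pand/pand/packuswb sequence can be sketched with SSE2 intrinsics. This is only a sketch, not the attached test case and not GCC's implementation: the helper name extract_even_bytes is hypothetical, and the code assumes a little-endian x86 target. It selects the bytes at indices 0, 2, 4, ... (the selection the masks above describe) from 32 input bytes. A scalar fallback is included only so the sketch also builds on non-SSE2 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_and_si128, _mm_packus_epi16 */
#endif

/* Hypothetical helper: copy the bytes at even indices of src[0..31]
   into dst[0..15]. */
static void extract_even_bytes(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
#if defined(__SSE2__)
    /* 0x00ff in every 16-bit word: the single constant held in %xmm6. */
    const __m128i mask = _mm_set1_epi16(0x00ff);
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 16));

    lo = _mm_and_si128(lo, mask);   /* pand: clear the odd byte of each word */
    hi = _mm_and_si128(hi, mask);   /* pand */

    /* packuswb: every word is now in 0..255, so the unsigned saturation
       never alters a value and the pack is a plain narrowing. */
    _mm_storeu_si128((__m128i *)dst, _mm_packus_epi16(lo, hi));
#else
    /* Scalar fallback with the same result, for targets without SSE2. */
    for (size_t i = 0; i < 16; i++)
        dst[i] = src[2 * i];
#endif
}
```

The masking step is what makes packuswb safe here: without it, any 16-bit word above 255 would saturate to 0xff rather than being truncated to its low byte.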