https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79938

            Bug ID: 79938
           Summary: gcc unnecessarily spills xmm register to stack when
                    inserting vector items
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: postmaster at raasu dot org
  Target Milestone: ---

Created attachment 40906
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40906&action=edit
assembler output

When summing elements of one vector and storing the results in another, gcc
uses two xmm registers instead of one and spills the second xmm register to the
stack when it runs out of general purpose registers.

Instead of spilling the second xmm register to the stack, it should use a single
xmm register as the destination, because the additions are already being done
in four general purpose registers.

Compiled using:

  gcc -msse4.1 -O3 -S hadd.c -Wall -Wextra -fno-strict-aliasing -fwrapv -o hadd.s

mika@LENOVO:~$ gcc --version
gcc (Ubuntu 6.2.0-3ubuntu11~14.04) 6.2.0 20160901
---
#include <x86intrin.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>   /* atoi */

typedef uint8_t   v1si __attribute__ ((vector_size (16)));
typedef uint16_t  v2si __attribute__ ((vector_size (16)));
typedef uint32_t  v4si __attribute__ ((vector_size (16)));
typedef uint64_t  v8si __attribute__ ((vector_size (16)));

static __m128i haddd_epu8(__m128i a)
{
  v1si b = (v1si)a;
  v4si ret;
  ret[0]  = (b[ 0] + b[ 1]) + (b[ 2] + b[ 3]);
  ret[1]  = (b[ 4] + b[ 5]) + (b[ 6] + b[ 7]);
  ret[2]  = (b[ 8] + b[ 9]) + (b[10] + b[11]);
  ret[3]  = (b[12] + b[13]) + (b[14] + b[15]);
  return (__m128i)ret;
}

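int main(int argc, char *argv[])
{
  if (argc < 2)   /* argv[1] is required */
    return 1;
  __m128i a = _mm_set1_epi8(atoi(argv[1]));
  __m128i b = haddd_epu8(a);
  v4si c = (v4si)b;
  /* c[] holds uint32_t values, so print them with PRIu32 */
  printf("b[0] = %" PRIu32 ", b[1] = %" PRIu32 ", b[2] = %" PRIu32
         ", b[3] = %" PRIu32 "\n", c[0], c[1], c[2], c[3]);
  return 0;
}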
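
For comparison, here is a minimal sketch of the same computation written so
that it stays in vector form and never extracts individual bytes. It is not a
fix for the code generation issue, and I have not checked what gcc 6.2 emits
for it. The names v16u8, v4u32 and hadd4_u8_to_u32 are hypothetical and not
part of the testcase. It relies on the fact that the sum of the four bytes in
a 32-bit lane does not depend on byte order.

```c
#include <stdint.h>

/* Hypothetical type names: 16-byte vectors via GCC vector extensions. */
typedef uint8_t  v16u8 __attribute__ ((vector_size (16)));
typedef uint32_t v4u32 __attribute__ ((vector_size (16)));

/* Sum each group of four adjacent bytes into one 32-bit lane,
   using only whole-vector mask, shift and add operations. */
static inline v4u32 hadd4_u8_to_u32(v16u8 a)
{
  const v4u32 mask = { 0xff, 0xff, 0xff, 0xff };
  const v4u32 s8   = {  8,  8,  8,  8 };
  const v4u32 s16  = { 16, 16, 16, 16 };
  const v4u32 s24  = { 24, 24, 24, 24 };
  v4u32 w = (v4u32)a;   /* reinterpret the 16 bytes as four 32-bit lanes */

  /* Isolate each of the four bytes of every lane and add them up. */
  return (w & mask) + ((w >> s8) & mask) + ((w >> s16) & mask) + (w >> s24);
}
```

With the testcase above, this would be called as
hadd4_u8_to_u32((v16u8)a) in place of haddd_epu8(a), with the result cast back
to __m128i if needed.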
