https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

            Bug ID: 107548
           Summary: STV doesn't consider vec_select
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

typedef unsigned int v4si __attribute__((vector_size(16)));

unsigned f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

gets optimized to

f:
.LFB0:
        .cfi_startproc
        vpextrd $1, %xmm0, %edx
        vmovd   %xmm0, %eax
        addl    %edx, %eax
        vmovd   %xmm1, %edx
        addl    %edx, %eax
        ret

with znver2 arch, and the result is similar with others.  It seems it would be
beneficial to shuffle a[1] to a'[0] and perform the adds on the vector side,
eliding two xmm->gpr moves.  STV2 sees

   19: r94:V4SI=xmm0:V4SI
      REG_DEAD xmm0:V4SI
    2: r87:V4SI=r94:V4SI
      REG_DEAD r94:V4SI
   20: r95:V4SI=xmm1:V4SI
      REG_DEAD xmm1:V4SI
    3: NOTE_INSN_DELETED
    4: NOTE_INSN_FUNCTION_BEG
    7: r90:SI=vec_select(r87:V4SI,parallel)
    8: r91:SI=vec_select(r87:V4SI,parallel)
      REG_DEAD r87:V4SI
    9: {r92:SI=r90:SI+r91:SI;clobber flags:CC;}
      REG_DEAD r91:SI
      REG_DEAD r90:SI
      REG_UNUSED flags:CC
   10: r93:SI=vec_select(r95:V4SI,parallel)
      REG_DEAD r95:V4SI
   11: {r89:SI=r92:SI+r93:SI;clobber flags:CC;}
      REG_DEAD r93:SI
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
   16: ax:SI=r89:SI
      REG_DEAD r89:SI
   17: use ax:SI

but it lacks vec_select support, so the vec_select defs are not convertible:

Created a new instruction chain #1
Building chain #1...
  Adding insn 9 to chain #1
  Adding insn 11 into chain's #1 queue
  r90 def in insn 7 isn't convertible
  Mark r90 def in insn 7 as requiring both modes in chain #1
  r91 def in insn 8 isn't convertible
  Mark r91 def in insn 8 as requiring both modes in chain #1
  Adding insn 11 to chain #1
  r89 use in insn 16 isn't convertible
  Mark r89 def in insn 11 as requiring both modes in chain #1
  r93 def in insn 10 isn't convertible
  Mark r93 def in insn 10 as requiring both modes in chain #1
Collected chain #1...
  insns: 9, 11
  defs to convert: r89, r90, r91, r93
Computing gain for chain #1...
  Instruction conversion gain: 0
  Registers conversion cost: 24
  Total gain: -24
Chain #1 conversion is not profitable
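
For illustration, the shuffle-then-add form suggested above can be written by
hand at the source level.  This is only a sketch of the intended data flow, not
what STV would emit: the names f_scalar, f_vector and check_equivalence are
made up for this example, and it assumes a compiler that provides
__builtin_shuffle (GCC) or __builtin_shufflevector (clang).

```c
#include <assert.h>

typedef unsigned int v4si __attribute__((vector_size(16)));

/* Scalar form from the report: lanes are extracted before the adds.  */
unsigned f_scalar (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Hypothetical vector form: move a[1] into lane 0 with a shuffle, do both
   adds on the vector side, and extract lane 0 only once at the end.  */
unsigned f_vector (v4si a, v4si b)
{
#if defined(__clang__)
  v4si a1 = __builtin_shufflevector (a, a, 1, 1, 1, 1);
#else
  v4si a1 = __builtin_shuffle (a, (v4si) { 1, 1, 1, 1 });
#endif
  v4si sum = a + b + a1;   /* lane 0 holds a[0] + b[0] + a[1] */
  return sum[0];
}

/* Both forms should agree, including on unsigned wraparound.  */
void check_equivalence (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 10, 20, 30, 40 };
  assert (f_scalar (a, b) == f_vector (a, b));

  v4si c = { 0xffffffffu, 1, 0, 0 };
  v4si d = { 1, 0, 0, 0 };
  assert (f_scalar (c, d) == f_vector (c, d));
}
```

Only lane 0 of the vector sum is meaningful here; the other lanes hold values
that are never read.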
