https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548
Bug ID: 107548
Summary: STV doesn't consider vec_select
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

typedef unsigned int v4si __attribute__((vector_size(16)));
unsigned f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

gets optimized to

f:
.LFB0:
        .cfi_startproc
        vpextrd $1, %xmm0, %edx
        vmovd   %xmm0, %eax
        addl    %edx, %eax
        vmovd   %xmm1, %edx
        addl    %edx, %eax
        ret

with -march=znver2 (similar with other archs), while it seems it would be
beneficial to shuffle a[1] to a'[0] and perform the adds on the vector side,
eliding two xmm->gpr moves.

STV2 sees

   19: r94:V4SI=xmm0:V4SI
      REG_DEAD xmm0:V4SI
    2: r87:V4SI=r94:V4SI
      REG_DEAD r94:V4SI
   20: r95:V4SI=xmm1:V4SI
      REG_DEAD xmm1:V4SI
    3: NOTE_INSN_DELETED
    4: NOTE_INSN_FUNCTION_BEG
    7: r90:SI=vec_select(r87:V4SI,parallel)
    8: r91:SI=vec_select(r87:V4SI,parallel)
      REG_DEAD r87:V4SI
    9: {r92:SI=r90:SI+r91:SI;clobber flags:CC;}
      REG_DEAD r91:SI
      REG_DEAD r90:SI
      REG_UNUSED flags:CC
   10: r93:SI=vec_select(r95:V4SI,parallel)
      REG_DEAD r95:V4SI
   11: {r89:SI=r92:SI+r93:SI;clobber flags:CC;}
      REG_DEAD r93:SI
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
   16: ax:SI=r89:SI
      REG_DEAD r89:SI
   17: use ax:SI

but STV lacks vec_select support, so the defs coming from vec_select are not
convertible:

Created a new instruction chain #1
Building chain #1...
  Adding insn 9 to chain #1
  Adding insn 11 into chain's #1 queue
  r90 def in insn 7 isn't convertible
  Mark r90 def in insn 7 as requiring both modes in chain #1
  r91 def in insn 8 isn't convertible
  Mark r91 def in insn 8 as requiring both modes in chain #1
  Adding insn 11 to chain #1
  r89 use in insn 16 isn't convertible
  Mark r89 def in insn 11 as requiring both modes in chain #1
  r93 def in insn 10 isn't convertible
  Mark r93 def in insn 10 as requiring both modes in chain #1
Collected chain #1...
  insns: 9, 11
  defs to convert: r89, r90, r91, r93
Computing gain for chain #1...
  Instruction conversion gain: 0
  Registers conversion cost: 24
  Total gain: -24
Chain #1 conversion is not profitable