https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110400
            Bug ID: 110400
           Summary: Reuse vector register for both scalar and vector value.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

From PR109812 #c18

Uroš Bizjak 2023-06-21 09:46:43 UTC

One interesting observation: clang is able to do this:

  0.09 │ │  vmovddup    -0x8(%rdx,%rsi,1),%xmm3
  ...
  0.11 │ │  vfmadd231sd %xmm2,%xmm3,%xmm1
  ...
  0.74 │ │  vfmadd231pd %xmm2,%xmm3,%xmm0

It figures out that duplicated V2DFmode value in %xmm3 can also be
accessed in the same register as DFmode value.

OTOH, current gcc does:

        vmovsd  (%rsi,%rax,8), %xmm1
        ...
        vmovddup        %xmm1, %xmm4
        ...
        vfmadd231pd     %xmm4, %xmm0, %xmm2
        ...
        vfmadd231sd     %xmm1, %xmm0, %xmm3

The above code needs two registers.

----------------------------------------------------

The same problem shows up with the testcase below:

typedef double v2df __attribute__((vector_size(16)));
v2df c;
double d;

void
foo (double* __restrict a)
{
  c = __extension__(v2df) {*a, *a};
  d = *a;
}

With options -O2 -mavx2, GCC generates:

foo(double*):
        vmovsd  (%rdi), %xmm0
        vmovddup        %xmm0, %xmm1
        vmovsd  %xmm0, d(%rip)
        vmovapd %xmm1, c(%rip)

Clang generates:

foo(double*):                              # @foo(double*)
        vmovddup        (%rdi), %xmm0      # xmm0 = mem[0,0]
        vmovaps %xmm0, c(%rip)
        vmovlps %xmm0, d(%rip)
        retq
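
For anyone reproducing this, here is a minimal sketch that pairs the testcase with a check of the property the requested optimization relies on: after foo runs, lane 0 of c holds the same value as d, so one register holding the duplicated value could serve both stores. The helper check_foo is hypothetical and not part of the report; it only confirms the semantics and says nothing about which instructions the compiler picks.

```c
#include <assert.h>

typedef double v2df __attribute__((vector_size(16)));

v2df c;
double d;

/* Testcase from the report: store {*a, *a} to c and *a to d. */
void foo(double *__restrict a)
{
    c = __extension__(v2df){*a, *a};
    d = *a;
}

/* Hypothetical helper (not in the report): checks that the scalar
   stored to d equals both lanes of the vector stored to c, i.e. the
   scalar is just lane 0 of the duplicated vector. */
void check_foo(double x)
{
    foo(&x);
    assert(c[0] == x);
    assert(c[1] == x);
    assert(d == c[0]);
}
```

To compare code generation, the same file can be compiled with the options from the report plus -S (for example -O2 -mavx2 -S) and the assembly for foo inspected by hand.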