https://issues.dlang.org/show_bug.cgi?id=17484
Issue ID: 17484
Summary: high penalty for vbroadcastsd with -mcpu=avx
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P3
Component: dmd
Assignee: nob...@puremagic.com
Reporter: c...@dawg.eu

With -mcpu=avx, the compiler emits

    vbroadcastsd ymm2, qword ptr [rsp]

even when initializing only 128-bit wide double2 variables. This causes a high penalty of 50-80 cycles when a legacy SSE instruction later operates on such a register value (or on a value derived from it). The CPU does not know that the upper 128 bits are zero, so it apparently preserves them in an internal register buffer.

https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx

We should
(A) not write to 256-bit wide YMM registers when only 128-bit wide XMM registers are used, and
(B) avoid mixing legacy-encoded SSE instructions (movsd) with VEX-encoded AVX-128 instructions, i.e. use vmovsd instead of movsd.
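To illustrate why (A) and (B) each avoid the penalty, here is a minimal sketch in C of the upper-state machine that Intel documents for its older (pre-Skylake) cores. It is a simplified model, not dmd code: the names UpperState, InsnKind, step and count_penalties are hypothetical, and using vmovddup as the 128-bit replacement for vbroadcastsd is an assumption (the report does not name one). A penalty is counted when legacy SSE runs while the upper halves are dirty (save) and when VEX code runs after that (restore).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the AVX upper-state machine (older Intel cores only;
 * newer microarchitectures behave differently). Hypothetical names. */
typedef enum {
    UPPER_CLEAN, /* upper 128 bits of all YMM registers known to be zero */
    UPPER_DIRTY, /* some YMM upper half may be non-zero */
    UPPER_SAVED  /* upper halves parked in an internal buffer */
} UpperState;

typedef enum {
    INSN_SSE,        /* legacy-encoded SSE, e.g. movsd */
    INSN_VEX128,     /* VEX-encoded 128-bit AVX, e.g. vmovsd, vmovddup xmm (assumed) */
    INSN_VEX256,     /* VEX-encoded 256-bit AVX, e.g. vbroadcastsd ymm */
    INSN_VZEROUPPER  /* zeroes the upper halves of all YMM registers */
} InsnKind;

/* Applies one instruction to the state; returns 1 if it incurs a transition penalty. */
static int step(UpperState *s, InsnKind k) {
    switch (k) {
    case INSN_VZEROUPPER:
        *s = UPPER_CLEAN;
        return 0;
    case INSN_SSE:
        if (*s == UPPER_DIRTY) { *s = UPPER_SAVED; return 1; } /* save upper halves */
        return 0;
    case INSN_VEX128:
        if (*s == UPPER_SAVED) { *s = UPPER_DIRTY; return 1; } /* restore upper halves */
        /* zeroes only its destination's upper half; other registers may
         * still be dirty, so the state is otherwise unchanged */
        return 0;
    case INSN_VEX256:
        if (*s == UPPER_SAVED) { *s = UPPER_DIRTY; return 1; } /* restore upper halves */
        *s = UPPER_DIRTY; /* any 256-bit write marks the upper state dirty */
        return 0;
    }
    assert(!"unknown instruction kind");
    return 0;
}

/* Counts transition penalties for a sequence, starting in the clean state. */
static int count_penalties(const InsnKind *seq, size_t n) {
    UpperState s = UPPER_CLEAN;
    int penalties = 0;
    for (size_t i = 0; i < n; ++i)
        penalties += step(&s, seq[i]);
    return penalties;
}
```

Under this model, the current code (vbroadcastsd ymm2 followed by movsd, i.e. {INSN_VEX256, INSN_SSE}) costs one penalty, and repeatedly alternating SSE and VEX code after that costs one per switch. Fix (A) ({INSN_VEX128, INSN_SSE}) and fix (B) ({INSN_VEX256, INSN_VEX128}) each bring the count to zero, as does a vzeroupper before the SSE code.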