https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

            Bug ID: 110105
           Summary: ARM GCC: underoptimization: expected vfma.f16, actual
                    vcvtb-vfma.f32-vcvtb
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.morozkin at gmail dot com
  Target Milestone: ---

This code:

    __fp16 mul(__fp16 x, __fp16 y, __fp16 z)
    {
        return x * y + z;
    }

compiled as:

    gcc -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16

produces the following assembler code:

    mul:
            vcvtb.f32.f16   s0, s0
            vcvtb.f32.f16   s1, s1
            vcvtb.f32.f16   s2, s2
            vfma.f32        s2, s0, s1
            vcvtb.f16.f32   s0, s2
            bx              lr

Here we see vcvtb-vfma.f32-vcvtb while a single vfma.f16 is expected.
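
For comparison, here is a sketch of the same function written with _Float16 instead of __fp16. As I read the GCC manual, __fp16 values are promoted to float for arithmetic, whereas _Float16 is an arithmetic type in its own right, so this variant may be the one that maps to a half-precision instruction. That is an assumption and has not been checked against the flags above. The names half_t and mul_f16 are hypothetical and exist only for this illustration; the float fallback is there only so the sketch compiles on toolchains without _Float16.

```c
/* Sketch only: half_t and mul_f16 are hypothetical names, not from the report. */
#ifdef __FLT16_MANT_DIG__
/* The compiler advertises _Float16 support. */
typedef _Float16 half_t;
#else
/* Fallback so the sketch still compiles; this is NOT half precision. */
typedef float half_t;
#endif

/* Same expression as mul() in the report, with the arithmetic type swapped. */
half_t mul_f16(half_t x, half_t y, half_t z)
{
    return x * y + z;
}
```

Comparing the assembler output of mul() and mul_f16() under the same command line would show whether the conversions come from the __fp16 promotion rule or from a missed optimization in the ARM backend.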