Hi all,

Turns out __builtin_convertvector is not as good a fit for the widening and
narrowing intrinsics as I had hoped. During the veclower phase we lower most of
these conversions to bitfield operations and hope DCE cleans them back up into
vector pack/unpack and extend operations. I received reports that in more
complex cases GCC fails to do that, and we're left with many vector extract
operations that clutter the output.
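For context, a widening and a narrowing move can be written with GCC's generic
vector extensions and __builtin_convertvector roughly as below. This is a
minimal sketch, assuming GCC 9 or later (or Clang). The type and function
names are hypothetical stand-ins, not the arm_neon.h definitions. Conversions
of this shape are what veclower may split into per-lane operations when they
are not recognised as a single extend or narrow.

```c
#include <stdint.h>

/* Hypothetical generic-vector stand-ins for int8x8_t, int16x8_t,
   uint16x8_t and uint8x8_t; not the actual arm_neon.h typedefs.  */
typedef int8_t   v8s8  __attribute__ ((vector_size (8)));
typedef int16_t  v8s16 __attribute__ ((vector_size (16)));
typedef uint16_t v8u16 __attribute__ ((vector_size (16)));
typedef uint8_t  v8u8  __attribute__ ((vector_size (8)));

/* Widening, in the spirit of vmovl_s8: sign-extend each of the
   eight 8-bit lanes to 16 bits.  */
static inline v8s16
widen_s8 (v8s8 a)
{
  return __builtin_convertvector (a, v8s16);
}

/* Narrowing, in the spirit of vmovn_u16: keep the low 8 bits of each
   of the eight 16-bit lanes (truncation, no saturation).  */
static inline v8u8
narrow_u16 (v8u16 a)
{
  return __builtin_convertvector (a, v8u8);
}
```

The unsigned type is used for the narrowing example so that the truncation is
well defined modulo 2^8 on any target.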

I think veclower can be improved on that front, but for GCC 10 I'd like to
just implement these intrinsics with a good old RTL builtin rather than
inline asm.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/
        * config/aarch64/aarch64-simd.md (aarch64_<su>xtl<mode>): Define.
        (aarch64_xtn<mode>): Likewise.
        * config/aarch64/aarch64-simd-builtins.def (sxtl, uxtl, xtn): Define
        builtins.
        * config/aarch64/arm_neon.h (vmovl_s8): Reimplement using
        builtin.
        (vmovl_s16): Likewise.
        (vmovl_s32): Likewise.
        (vmovl_u8): Likewise.
        (vmovl_u16): Likewise.
        (vmovl_u32): Likewise.
        (vmovn_s16): Likewise.
        (vmovn_s32): Likewise.
        (vmovn_s64): Likewise.
        (vmovn_u16): Likewise.
        (vmovn_u32): Likewise.
        (vmovn_u64): Likewise.
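Reimplementing the intrinsics above should not change what they compute. A
scalar model such as the following can act as an oracle when checking the
builtin-based definitions. It is a sketch only: the function names are
hypothetical, it does not use arm_neon.h, and it covers just vmovl_u16 (UXTL)
and vmovn_u32 (XTN).

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model for vmovl_u16 (UXTL): zero-extend each of the
   four 16-bit lanes to 32 bits.  */
static void
ref_vmovl_u16 (const uint16_t in[4], uint32_t out[4])
{
  for (size_t i = 0; i < 4; i++)
    out[i] = (uint32_t) in[i];
}

/* Scalar reference model for vmovn_u32 (XTN): keep the low 16 bits of
   each of the four 32-bit lanes; the narrowing truncates rather than
   saturates.  */
static void
ref_vmovn_u32 (const uint32_t in[4], uint16_t out[4])
{
  for (size_t i = 0; i < 4; i++)
    out[i] = (uint16_t) (in[i] & 0xffffu);
}
```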

Attachment: vmovnl.patch
Description: vmovnl.patch
