Hi,
Please see attached an attempt to optimise the 8-bit input to v210enc to
reduce the number of shuffles.
This comes at the cost of having to extract the middle element, perform
a DWORD shift on it, and then reinsert it.
I have added a few comments but any other ideas are welcome.
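
For anyone following along without the patch open, here is a rough scalar
sketch of the result the SIMD path has to reproduce. It assumes the usual
v210 layout (three 10-bit samples per little-endian 32-bit word, at bits 0-9,
10-19 and 20-29, top two bits unused) and that 8-bit input is widened to
10 bits with a left shift by 2. The function name is made up for
illustration and is not taken from the patch; any range clipping the real
encoder may apply is left out.

```c
#include <stdint.h>

/* Hypothetical helper: pack three 8-bit samples into one v210 word.
 * Each sample is widened to 10 bits (<< 2) and placed in its 10-bit slot:
 *   a -> bits  0-9   (shift 2 + 0)
 *   b -> bits 10-19  (shift 2 + 10), the "middle element"
 *   c -> bits 20-29  (shift 2 + 20)
 */
static uint32_t pack_v210_8bit(uint8_t a, uint8_t b, uint8_t c)
{
    return ((uint32_t)a << 2) |
           ((uint32_t)b << 12) |
           ((uint32_t)c << 22);
}
```

The middle sample is the one that straddles the 16-bit boundary inside the
word, which is presumably why it is the awkward one to place with word-sized
operations.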
Crude be
On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya wrote:
>
> Hi,
>
> Please see attached an attempt to optimise the 8-bit input to v210enc to
> reduce the number of shuffles.
> This comes at the cost of having to extract the middle element, perform
> a DWORD shift on it, and then reinsert it.
> I
I guess it could also be scaled to ymm if you're a big Skylake fan :P
(in which case you'd probably want to reorder the shuffle indices so