> Does this not render the type parameter of bilin_load useless (always h)?
> (Not a blocker for this patch.)
Yes, this was needed in the initial version, but it is no longer required.
I just sent a patch.
> Not sure if I already asked this but is this really faster than slide1?
> Normally we wan
On Thursday, 30 May 2024, 18:26:53 EEST, u...@foxmail.com wrote:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontal
ping
wrote on Thu, 30 May 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontally.
>
> old
Well, because scalar registers are limited, the direct unrolling will stay
like this for now. We can handle different lengths separately in the future.
flow gg wrote on Thu, 30 May 2024 at 23:36:
> I copied the VP9 modifications over directly. Since len <= 16, it seems
> like this can be improved a bit more.
>
I copied the VP9 modifications over directly. Since len <= 16, it seems
like this can be improved a bit more.
wrote on Thu, 30 May 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it redu
From: sunyuechi
Since len < 64, the registers are sufficient, so it can be
directly unrolled (a4 is even).
Another benefit of unrolling is that it reduces one load operation
vertically compared to horizontally.
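
For readers following the load-count argument, here is a minimal scalar C sketch of one possible reading of it. It is not the patch's assembly. The function names, the `row_loads` counter, and the `MAX_LEN` buffer size are hypothetical, the blend is a generic 4-bit bilinear weight assumed for illustration, and `h` is assumed even (which is how the "a4 is even" remark is read here, assuming a4 holds the height). Each `load_row` call stands in for one whole-row vector load, which is plausible only because a row of len < 64 fits in registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 64          /* hypothetical bound; rows are assumed to fit in registers */

static int row_loads;       /* counts simulated whole-row loads */

/* Stand-in for a single vector load of one row. */
static void load_row(uint8_t *reg, const uint8_t *src, int len)
{
    memcpy(reg, src, (size_t)len);
    row_loads++;
}

/* Generic bilinear blend of two rows with a 4-bit weight my (0..15). */
static void blend_row(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      int len, int my)
{
    for (int x = 0; x < len; x++)
        dst[x] = (uint8_t)(((16 - my) * a[x] + my * b[x] + 8) >> 4);
}

/* Naive loop: one output row per iteration, both input rows loaded each time. */
static void bilin_v_rolled(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           int len, int h, int my)
{
    uint8_t r0[MAX_LEN], r1[MAX_LEN];

    for (int y = 0; y < h; y++) {
        load_row(r0, src, len);
        load_row(r1, src + src_stride, len);
        blend_row(dst, r0, r1, len, my);
        src += src_stride;
        dst += dst_stride;
    }
}

/* Unrolled by two: the middle row is loaded once and shared by both outputs,
 * and the last row loaded is carried into the next iteration. h must be even. */
static void bilin_v_unrolled2(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int len, int h, int my)
{
    uint8_t r0[MAX_LEN], r1[MAX_LEN];

    load_row(r0, src, len);                      /* row 0, loaded once up front */
    for (int y = 0; y < h; y += 2) {
        load_row(r1, src + src_stride, len);     /* row y + 1 */
        blend_row(dst, r0, r1, len, my);
        load_row(r0, src + 2 * src_stride, len); /* row y + 2, reused next time */
        blend_row(dst + dst_stride, r1, r0, len, my);
        src += 2 * src_stride;
        dst += 2 * dst_stride;
    }
}
```

In this sketch, for h output rows the naive loop performs 2*h row loads while the unrolled one performs h + 1, with identical output. A horizontal filter has no equivalent sharing between rows, since each output row depends only on its own input row.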
old new