Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
> Does this not render the type parameter of bilin_load useless (always h)?
> (Not a blocker for this patch.)

Yes, this was needed in the initial version, but it is no longer required. I just sent a patch.

> Not sure if I already asked this but is this really faster than slide1?
> Normally we wan

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 18:26:53 EEST, u...@foxmail.com wrote:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontal

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-06-12 Thread flow gg
ping

Written on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it reduces one load operation
> vertically compared to horizontally.
>
> old

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
Well, because scalar registers are limited, the direct unrolling will stay like this for now. We can handle different lengths separately in the future.

flow gg wrote on Thu, May 30, 2024 at 23:36:
> I directly copied the VP9 modifications over... Since len <= 16, it seems
> like it can be improved a bit more
>

Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
I directly copied the VP9 modifications over... Since len <= 16, it seems like it can be improved a bit more.

Written on Thu, May 30, 2024 at 23:27:
> From: sunyuechi
>
> Since len < 64, the registers are sufficient, so it can be
> directly unrolled (a4 is even).
>
> Another benefit of unrolling is that it redu

[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread uk7b
From: sunyuechi

Since len < 64, the registers are sufficient, so it can be
directly unrolled (a4 is even).

Another benefit of unrolling is that it reduces one load operation
vertically compared to horizontally.

old new
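
The load-count remark in the commit message can be illustrated in plain C. This is only a sketch under stated assumptions, not the RISC-V assembly from the patch: the names bilin_v_ref and bilin_v_unroll2 and the explicit width parameter w are hypothetical, the arithmetic uses the usual VP8 bilinear form with weights 8 - my and my, and "a4 is even" is read here as "the row count h is even".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference vertical bilinear filter: each output row reads two source
 * rows (the row itself and the one below it). */
static void bilin_v_ref(uint8_t *dst, ptrdiff_t dstride,
                        const uint8_t *src, ptrdiff_t sstride,
                        int w, int h, int my)
{
    const int c = 8 - my, d = my;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (c * src[x] + d * src[x + sstride] + 4) >> 3;
        dst += dstride;
        src += sstride;
    }
}

/* Same filter unrolled by two rows (hypothetical helper; assumes h is even).
 * The row pointers r0, r1, r2 stand in for vector registers: r1 feeds both
 * output rows of a pair, so a pair touches three source rows, not four. */
static void bilin_v_unroll2(uint8_t *dst, ptrdiff_t dstride,
                            const uint8_t *src, ptrdiff_t sstride,
                            int w, int h, int my)
{
    const int c = 8 - my, d = my;
    const uint8_t *r0 = src;

    assert(h > 0 && h % 2 == 0);

    for (int y = 0; y < h; y += 2) {
        const uint8_t *r1 = r0 + sstride;
        const uint8_t *r2 = r1 + sstride;

        for (int x = 0; x < w; x++) {
            dst[x]           = (c * r0[x] + d * r1[x] + 4) >> 3;
            dst[x + dstride] = (c * r1[x] + d * r2[x] + 4) >> 3;
        }
        dst += 2 * dstride;
        r0   = r2; /* bottom row of this pair is the top row of the next */
    }
}
```

In the horizontal direction the two taps are src[x] and src[x + 1] within the same row, so two unrolled rows share no input and no such saving appears.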