Re: [FFmpeg-devel] [PATCH v4 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

2024-05-21 Thread Stone Chen
On Mon, May 20, 2024 at 7:23 AM Ronald S. Bultje wrote:
> Hi,
>
> This is mostly good, the following are tiny nitpicks.
>
> On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
>> +%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
>
> The macro is only used once, so you could inline it

Re: [FFmpeg-devel] [PATCH v4 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

2024-05-20 Thread Ronald S. Bultje
Hi,

one more, I forgot.

On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
> +pw_1: dw 1
> [..]
> +vpbroadcastw m4, [pw_1]

We typically suggest using vpbroadcastd, not w (and then pw_1: times 2 dw 1). Agner shows that on e.g. Haswell, the former (d) is 1 uop with 5 cycles

Re: [FFmpeg-devel] [PATCH v4 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

2024-05-20 Thread Ronald S. Bultje
Hi,

This is mostly good, the following are tiny nitpicks.

On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
> +%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2

The macro is only used once, so you could inline it in the calling function.

> +imul %5, 128
> +imul

[FFmpeg-devel] [PATCH v4 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC

2024-05-19 Thread Stone Chen
Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always
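For readers following the thread, the eligibility rule and the even-row sampling described above can be sketched in scalar C. This is an illustrative reference only, not the code from the patch: the function names `dmvr_sad_applicable` and `sad_even_rows`, the `int16_t` sample type, and strides counted in elements are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Eligibility rule as stated in the patch description:
 * w >= 8, h >= 8, and w * h > 128. (Hypothetical helper name.) */
static int dmvr_sad_applicable(int w, int h)
{
    return w >= 8 && h >= 8 && w * h > 128;
}

/* Scalar SAD that visits only even rows (0, 2, 4, ...).
 * Assumptions for this sketch: samples are int16_t and the
 * strides are given in elements, not bytes. */
static int sad_even_rows(const int16_t *src0, ptrdiff_t stride0,
                         const int16_t *src1, ptrdiff_t stride1,
                         int w, int h)
{
    int sad = 0;

    assert(w > 0 && h > 0);

    for (int y = 0; y < h; y += 2) {          /* skip odd rows */
        const int16_t *row0 = src0 + y * stride0;
        const int16_t *row1 = src1 + y * stride1;

        for (int x = 0; x < w; x++)
            sad += abs(row0[x] - row1[x]);    /* sum of absolute differences */
    }
    return sad;
}
```

A scalar version like this is mainly useful as a baseline to compare a SIMD implementation against; skipping odd rows halves the number of rows read per block.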