On Mon, May 20, 2024 at 7:23 AM Ronald S. Bultje wrote:
> Hi,
>
> This is mostly good, the following is tiny nitpicks.
>
> On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
>
>> +%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
>>
>
> The macro is only used once, so you could inline it in the calling function.
Hi,
One more, I forgot.
On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
> +pw_1: dw 1
>
[..]
> +vpbroadcastw m4, [pw_1]
>
We typically suggest using vpbroadcastd, not w (and then pw_1: times 2 dw
1). Agner Fog's instruction tables show that on e.g. Haswell, the former (d)
is 1 uop with 5 cycles
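vpbroadcastd reads 4 bytes from memory, which is why the constant has to grow to `times 2 dw 1`. The scalar C model below is only an illustration, not AVX2 code, and the helper names `broadcast` and `word_and_dword_broadcast_match` are hypothetical. It shows that replicating that 4-byte constant across a 32-byte ymm register yields the same bytes as replicating a single word of 1, so every 16-bit lane still holds 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A ymm register modelled as 32 bytes: 16 words or 8 dwords. */
#define YMM_BYTES 32

/* pw_1 as suggested in the review: "times 2 dw 1", two 16-bit words of 1. */
static const uint16_t pw_1[2] = { 1, 1 };

/* Replicate the first elem_size bytes at src across the whole register,
 * the way vpbroadcastw (2 bytes) or vpbroadcastd (4 bytes) does. */
void broadcast(uint8_t reg[YMM_BYTES], const void *src, size_t elem_size)
{
    assert(elem_size > 0 && YMM_BYTES % elem_size == 0);
    for (size_t off = 0; off < YMM_BYTES; off += elem_size)
        memcpy(reg + off, src, elem_size);
}

/* Returns 1 if a word broadcast and a dword broadcast of pw_1 produce the
 * same register contents and every 16-bit lane equals 1, else 0. */
int word_and_dword_broadcast_match(void)
{
    uint8_t reg_w[YMM_BYTES], reg_d[YMM_BYTES];
    uint16_t lanes[YMM_BYTES / 2];

    broadcast(reg_w, pw_1, sizeof(pw_1[0])); /* models vpbroadcastw m4, [pw_1] */
    broadcast(reg_d, pw_1, sizeof(pw_1));    /* models vpbroadcastd m4, [pw_1] */

    if (memcmp(reg_w, reg_d, YMM_BYTES) != 0)
        return 0;
    memcpy(lanes, reg_d, sizeof(lanes));
    for (size_t i = 0; i < YMM_BYTES / 2; i++)
        if (lanes[i] != 1)
            return 0;
    return 1;
}
```

Because both broadcasts copy raw memory bytes, the comparison does not depend on host endianness.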
Hi,
This is mostly good, the following is tiny nitpicks.
On Sun, May 19, 2024 at 8:46 PM Stone Chen wrote:
> +%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
>
The macro is only used once, so you could inline it in the calling function.
>
> +imul %5, 128
> +imul
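The body of INIT_OFFSET is cut off above, so its exact semantics are not shown here. The following is a hedged sketch that rests on two assumptions. First, `128` is the row stride, in elements, of the intermediate buffers. Second, the two sources are probed at mirrored positions, as in DMVR's symmetric refinement: (+dx, +dy) in one reference and (-dx, -dy) in the other. The helper `dmvr_offsets` and the constant `ROW_STRIDE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed row stride (in elements) of the intermediate buffers, inferred
 * from the "imul %5, 128" line in the quoted patch. */
#define ROW_STRIDE 128

/* Hypothetical helper: element offsets for one mirrored search position.
 * The first source moves by (dx, dy), the second by (-dx, -dy). */
void dmvr_offsets(int dx, int dy, ptrdiff_t *off0, ptrdiff_t *off1)
{
    assert(off0 != NULL && off1 != NULL);
    *off0 = (ptrdiff_t)dy * ROW_STRIDE + dx;
    *off1 = -(ptrdiff_t)dy * ROW_STRIDE - dx;
}
```

For example, (dx, dy) = (1, -2) gives offsets -255 and +255.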
Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions.
DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce
complexity, SAD is only calculated on even rows. This is calculated for all
video bit depths, but the values passed to the function are always
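A scalar C model of the behaviour described in the commit message may be useful for checking the AVX2 output. It is a sketch under assumptions. The sample type (`int16_t`), the `stride` parameter and the function names `dmvr_sad_applies` and `dmvr_sad_even_rows` are hypothetical and are not taken from the patch. Only the size conditions and the even-row subsampling come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size conditions from the description: w >= 8, h >= 8 and w * h > 128. */
int dmvr_sad_applies(int w, int h)
{
    return w >= 8 && h >= 8 && w * h > 128;
}

/* Hypothetical scalar reference: sum of absolute differences between two
 * blocks, visiting only even rows (0, 2, 4, ...). `stride` is the row
 * pitch of both buffers in elements (an assumption). */
int dmvr_sad_even_rows(const int16_t *src0, const int16_t *src1,
                       ptrdiff_t stride, int w, int h)
{
    int sad = 0;

    assert(dmvr_sad_applies(w, h));
    for (int y = 0; y < h; y += 2) {
        const int16_t *row0 = src0 + y * stride;
        const int16_t *row1 = src1 + y * stride;
        for (int x = 0; x < w; x++)
            sad += abs(row0[x] - row1[x]); /* operands promote to int */
    }
    return sad;
}
```

Indexing each row from the base pointer avoids stepping past the end of the buffer when `h` is odd.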