To prepare for adding AVX2 functions for different block widths, change
VVCInterDSPContext to contain (*sad[6]) instead of (*sad). This also
default-initializes the pointer array with the scalar function and updates the
call sites to jump to the correct function based on block width. There's no chang
ull
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,193 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the G
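The pointer-table change described in the commit message above can be sketched in plain C. This is only an illustration, not FFmpeg's code: the struct name InterDSPSketch, the helpers sad_scalar, sad_index, dsp_init and block_sad, the function signature, and the mapping index = log2(width) - 2 (widths 4 through 128) are all assumptions made for the example. Only the idea of a six-entry sad array that defaults to the scalar function and is selected by block width comes from the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical signature; the real FFmpeg prototype may differ. */
typedef int (*sad_fn)(const int16_t *src0, const int16_t *src1,
                      int stride, int w, int h);

/* Hypothetical stand-in for a DSP context holding six SAD pointers. */
typedef struct InterDSPSketch {
    sad_fn sad[6];
} InterDSPSketch;

/* Plain scalar sum of absolute differences over a w x h block. */
static int sad_scalar(const int16_t *src0, const int16_t *src1,
                      int stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(src0[x] - src1[x]);
        src0 += stride;
        src1 += stride;
    }
    return sum;
}

/* Assumed mapping: width 4 -> 0, 8 -> 1, ..., 128 -> 5. */
static int sad_index(int w)
{
    int idx = 0;
    assert(w >= 4 && w <= 128 && (w & (w - 1)) == 0);
    while ((4 << idx) < w)
        idx++;
    return idx;
}

/* Default-initialize every slot with the scalar function;
 * a SIMD init routine could later overwrite individual slots. */
static void dsp_init(InterDSPSketch *c)
{
    for (int i = 0; i < 6; i++)
        c->sad[i] = sad_scalar;
}

/* Call site: pick the function by block width. */
static int block_sad(const InterDSPSketch *c, const int16_t *a,
                     const int16_t *b, int stride, int w, int h)
{
    return c->sad[sad_index(w)](a, b, stride, w, h);
}
```

With this layout, adding a width-specific implementation only means overwriting one slot after dsp_init(); the call sites stay unchanged.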
Adds checkasm for DMVR SAD AVX2 implementation.
Benchmarks ( AMD 7940HS )
vvc_sad_8_16bpc_c: 112.5
vvc_sad_8_16bpc_avx2: 2.5
vvc_sad_16_16bpc_c: 232.5
vvc_sad_16_16bpc_avx2: 22.5
vvc_sad_32_16bpc_c: 912.5
vvc_sad_32_16bpc_avx2: 82.5
vvc_sad_64_16bpc_c: 3582.5
vvc_sad_64_16bpc_avx2: 392.5
vvc_sad_1
On Wed, May 1, 2024 at 6:59 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:
> Stone Chen:
> > To prepare for adding AVX2 functions for different block widths, change
> VVCInterDSPContext to contain (*sad[6]) instead of (*sad). This also
> default initializes t
1184c731c
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,155 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it
Adds checkasm for DMVR SAD AVX2 implementation.
Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 63.0
vvc_sad_8x8_avx2: 3.0
vvc_sad_16x16_c: 263.0
vvc_sad_16x16_avx2: 23.0
vvc_sad_32x32_c: 1003.0
vvc_sad_32x32_avx2: 83.0
vvc_sad_64x64_c: 3923.0
vvc_sad_64x64_avx2: 373.0
vvc_sad_128x128_c: 17533.0
vvc_sad_
c/x86/vvc/vvc_sad.asm
new file mode 100644
index 00..530142ad35
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,157 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpe
On Sat, May 18, 2024 at 9:04 AM Ronald S. Bultje wrote:
> Hi,
>
> On Tue, May 14, 2024 at 4:40 PM Stone Chen
> wrote:
>
>> Implements AVX2 DMVR (decoder-side motion vector refinement) SAD
>> functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h >
On Sat, May 18, 2024 at 11:33 AM Ronald S. Bultje
wrote:
> Hi,
>
> On Tue, May 14, 2024 at 4:40 PM Stone Chen
> wrote:
>
>> +vvc_sad_8:
>> +.loop_height:
>> +movu xm0, [src1q]
>> +movu xm1, [src2q]
>
codec/x86/vvc/vvc_sad.asm
new file mode 100644
index 00..58a24635d2
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,138 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpe
Adds checkasm for DMVR SAD AVX2 implementation.
Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 70.0
vvc_sad_8x8_avx2: 10.0
vvc_sad_16x16_c: 280.0
vvc_sad_16x16_avx2: 20.0
vvc_sad_32x32_c: 1020.0
vvc_sad_32x32_avx2: 70.0
vvc_sad_64x64_c: 3560.0
vvc_sad_64x64_avx2: 270.0
vvc_sad_128x128_c: 13760.0
vvc_sad
codec/x86/vvc/vvc_sad.asm
new file mode 100644
index 00..9766446b11
--- /dev/null
+++ b/libavcodec/x86/vvc/vvc_sad.asm
@@ -0,0 +1,130 @@
+; /*
+; * Provide SIMD DMVR SAD functions for VVC decoding
+; *
+; * Copyright (c) 2024 Stone Chen
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is
Adds checkasm for DMVR SAD AVX2 implementation.
Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 50.3
vvc_sad_8x8_avx2: 0.3
vvc_sad_16x16_c: 250.3
vvc_sad_16x16_avx2: 10.3
vvc_sad_32x32_c: 1020.3
vvc_sad_32x32_avx2: 60.3
vvc_sad_64x64_c: 3850.3
vvc_sad_64x64_avx2: 220.3
vvc_sad_128x128_c: 14100.3
vvc_sad_
On Mon, May 20, 2024 at 7:23 AM Ronald S. Bultje wrote:
> Hi,
>
> This is mostly good, the following is tiny nitpicks.
>
> On Sun, May 19, 2024 at 8:46 PM Stone Chen
> wrote:
>
>> +%macro INIT_OFFSET 6 ; src1, src2, dxq, dyq, off1, off2
>>
>
> The macro is
On Thu, May 23, 2024 at 9:18 AM Nuo Mi wrote:
> On Thu, May 23, 2024 at 7:38 AM James Almer wrote:
>
> > On 5/21/2024 10:01 PM, Ronald S. Bultje wrote:
> > > Hi,
> > >
> > > On Tue, May 21, 2024 at 8:01 PM Stone Chen
> > wrote:
> > >
According to the VVC specification (section 8.5.1), the maximum width/height of
a subblock passed for DMVR SAD is 16. This, along with the previous constraint
requiring width * height >= 128, means that 8x16, 16x8, and 16x16 are the only
allowed sizes. This re-labels vvc_sad_16_128 to vvc_sad_16 to r
According to the VVC specification (section 8.5.1), the maximum width/height of
a subblock passed for DMVR SAD is 16. This, along with the previous constraint
requiring width * height >= 128, means that 8x16, 16x8, and 16x16 are the only
allowed sizes.
This changes check_vvc_sad() to only test and b
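The size argument in the two commit messages above can be checked mechanically. The following is a small sketch, not code from the patch: the helper names dmvr_sad_size_allowed and count_allowed_sizes are hypothetical, and the sketch assumes block dimensions are powers of two between 8 and 128. It combines the constraints quoted in this thread (w >= 8, h >= 8, w * h >= 128, and a maximum dimension of 16) and enumerates what remains.

```c
#include <assert.h>
#include <stdbool.h>

/* True if v is a positive power of two. */
static bool is_pow2(int v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: applies the constraints stated in the thread.
 * Assumes dimensions are powers of two. */
static bool dmvr_sad_size_allowed(int w, int h)
{
    assert(w > 0 && h > 0);
    return is_pow2(w) && is_pow2(h) &&
           w >= 8 && h >= 8 &&      /* minimum dimensions */
           w <= 16 && h <= 16 &&    /* subblock maximum of 16 */
           w * h >= 128;            /* minimum area */
}

/* Count the allowed sizes among power-of-two dimensions 8..128. */
static int count_allowed_sizes(void)
{
    int n = 0;
    for (int w = 8; w <= 128; w <<= 1)
        for (int h = 8; h <= 128; h <<= 1)
            n += dmvr_sad_size_allowed(w, h);
    return n;
}
```

Under these assumptions 8x8 is rejected by the area rule (64 < 128) and anything with a dimension of 32 or more is rejected by the subblock maximum, which leaves exactly 8x16, 16x8, and 16x16.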
In commit 6c45d34, a line was added that always sets rdiv to 0, overriding any
user input. This removes that line, allowing user-set values for 0rdiv, 1rdiv,
2rdiv, and 3rdiv to apply as expected. This fixes ticket #10294.
Signed-off-by: Stone Chen
---
libavfilter/vf_convolution.c | 1 -
1 file
Sorry, I think I didn't correctly attach the patch the first time.
On Sun, Feb 18, 2024 at 2:21 PM Stone Chen wrote:
> In commit 6c45d34, a line was added that always sets rdiv to 0, overriding
> any user input. This removes that line allowing user set values for 0rdiv,
> 1rdiv, 2
Hi Marton,
Thanks for the feedback!
I'm not sure what dynamic reconfiguration is; from some searching, I think
it might be related to commands?
On Sun, Feb 18, 2024 at 7:08 PM Marton Balint wrote:
>
>
> On Sun, 18 Feb 2024, Stone Chen wrote:
>
> > In commit 6c45d34
Previously, to support dynamic reconfigurations of the matrix string (e.g. 0m),
the rdiv values would always be cleared to 0.f, causing the rdiv to be
recalculated based on the new filter. This, however, had the side effect of
always ignoring user-specified rdiv values.
Instead float user_rdiv[0]
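The behavior described above (a user-specified rdiv should win, while an unset value of 0 triggers recalculation from the matrix) can be sketched as follows. This is an illustration under stated assumptions, not the vf_convolution code: the helper name effective_rdiv and its signature are hypothetical, the 1/sum default is taken from the documentation fix discussed elsewhere in this listing, and the fallback to 1 for a zero-sum matrix is an assumption added so the sketch never divides by zero.

```c
#include <assert.h>

/* Hypothetical helper: returns the multiplier to apply for one plane.
 * user_rdiv == 0 is treated as "not set by the user". */
static float effective_rdiv(float user_rdiv, const int *matrix, int n)
{
    float sum = 0.f;

    assert(matrix && n > 0);

    /* A user-specified value always takes precedence. */
    if (user_rdiv != 0.f)
        return user_rdiv;

    /* Otherwise derive the default from the current matrix. */
    for (int i = 0; i < n; i++)
        sum += (float)matrix[i];

    /* Assumption: avoid dividing by zero for zero-sum kernels. */
    if (sum == 0.f)
        sum = 1.f;

    return 1.f / sum;
}
```

Keeping the user's value separate from the computed one means a matrix reconfiguration can recompute the default without discarding what the user asked for.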
Sorry, I just realized I messed up my git commit (I'm new to git). I've attached
a patch file with that correction.
On Sat, Feb 24, 2024 at 10:49 AM Stone Chen
wrote:
> Previously to support dynamic reconfigurations of the matrix string (e.g.
> 0m), the rdiv values would always be cle
On Sat, Feb 24, 2024 at 3:56 PM Marton Balint wrote:
>
>
> On Sat, 24 Feb 2024, Stone Chen wrote:
>
> > Previously to support dynamic reconfigurations of the matrix string
> (e.g. 0m), the rdiv values would always be cleared to 0.f, causing the rdiv
> to be recalculated
On Sat, Feb 24, 2024 at 6:34 PM Marton Balint wrote:
>
>
> On Sat, 24 Feb 2024, Stone Chen wrote:
>
> > On Sat, Feb 24, 2024 at 3:56 PM Marton Balint wrote:
> >
> >>
> >>
> >> On Sat, 24 Feb 2024, Stone Chen wrote:
> >>
> >>
The documentation correctly states that the rdiv is a multiplier but
incorrectly states the default behavior is to multiply by the sum of all matrix
elements; the code actually multiplies by 1/sum.
This changes the documentation to match the code.
---
doc/filters.texi | 2 +-
1 file changed, 1 insertion(+),
On Wed, Mar 13, 2024 at 4:26 AM Marton Balint wrote:
>
>
> On Tue, 12 Mar 2024, Stone Chen wrote:
>
> > The documentation correctly states that the rdiv is a multiplier but
> incorrectly states the default behavior is to multiply by the sum of all
> matrix elements
The documentation correctly states that the rdiv is a multiplier but
incorrectly states the default behavior is to multiply by the sum of all matrix
elements; the code actually multiplies by 1/sum.
This changes the documentation to match the code.
Addresses trac #10889
---
doc/filters.texi | 2 +-
1 file chan