Re: [FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits functions

2022-10-04 Grzegorz Bernacki
Great!! Thanks a lot for your help and your review.

thanks,
greg

On Tue, 4 Oct 2022 at 12:57, Martin Storsjö wrote:
> On Mon, 3 Oct 2022, Grzegorz Bernacki wrote:
>
> > Changes since v1:
> >
> > - changed tabs to spaces
> > - modified branch instruction in vsse

[FFmpeg-devel] [PATCH v2 7/7] aarch64: me_cmp: Improve scheduling in vsse_intra8

2022-10-03 Grzegorz Bernacki
From: Martin Storsjö

Before:      Cortex A53    A72    A73
vsse_5_neon:       74.7   31.5   26.0
After:
vsse_5_neon:       62.7   32.5   25.7
---
 libavcodec/aarch64/me_cmp_neon.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aa

[FFmpeg-devel] [PATCH v2 6/7] lavc/aarch64: Add neon implementation for vsse_intra8

2022-10-03 Grzegorz Bernacki
Provide optimized implementation for vsse_intra8 for arm64.

Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++
 libavcodec/aarch64/me_cmp_neon.S |
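The intra variant applies a vertical-gradient measure to a single block: it sums the squared differences between each pixel and the one directly below it. A scalar sketch, modeled on the C reference in libavcodec/me_cmp.c as I recall it; the real function also takes a context pointer and an unused second block pointer, both dropped in the hypothetical `vsse_intra8_ref` below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of vsse_intra8: sum of squared vertical
 * differences inside one 8-pixel-wide block of h rows. */
static int vsse_intra8_ref(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    /* Only h - 1 pairs of adjacent rows exist. */
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = s[x] - s[x + stride]; /* pixel minus pixel below */
            score += d * d;
        }
        s += stride;
    }
    return score;
}
```

With h rows only h - 1 row pairs contribute, so a one-row block scores 0.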

[FFmpeg-devel] [PATCH v2 5/7] lavc/aarch64: Provide optimized implementation of vsse8 for arm64.

2022-10-03 Grzegorz Bernacki
Provide optimized implementation of vsse8 for arm64.

Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 5
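vsse8 measures the squared error of vertical gradients rather than of pixels. A scalar sketch, modeled on the C reference in libavcodec/me_cmp.c as I recall it (`vsse8_ref` is a hypothetical name, and the real function's context argument is dropped): for each pair of adjacent rows it squares the change in the pixel difference `s1 - s2` from one row to the next, so a constant offset between the two blocks costs nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of vsse8 for an 8-pixel-wide block. */
static int vsse8_ref(const uint8_t *s1, const uint8_t *s2,
                     ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            /* (s1 - s2) in this row minus (s1 - s2) in the row below */
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            score += d * d;
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}
```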

[FFmpeg-devel] [PATCH v2 4/7] lavc/aarch64: Provide neon implementation of nsse8

2022-10-03 Grzegorz Bernacki
Add vectorized implementation of nsse8 function.

Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 15
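nsse8 is less self-explanatory than a plain SSE. As I understand the C reference in libavcodec/me_cmp.c, it adds to the squared error a penalty for how much the 2x2 second-order "texture" of the two blocks differs, scaled by the codec context's nsse_weight. The sketch below follows that reading; `nsse8_ref` and the explicit `weight` parameter (standing in for the context field) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of nsse8 ("noise preserving" SSE).
 * weight stands in for the codec context's nsse_weight. */
static int nsse8_ref(const uint8_t *s1, const uint8_t *s2,
                     ptrdiff_t stride, int h, int weight)
{
    int score1 = 0; /* ordinary sum of squared errors */
    int score2 = 0; /* difference in 2x2 texture energy */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            score1 += (s1[x] - s2[x]) * (s1[x] - s2[x]);
        if (y + 1 < h) { /* the 2x2 term needs the row below */
            for (int x = 0; x < 7; x++)
                score2 += abs(s1[x] - s1[x + stride] - s1[x + 1] + s1[x + stride + 1]) -
                          abs(s2[x] - s2[x + stride] - s2[x + 1] + s2[x + stride + 1]);
        }
        s1 += stride;
        s2 += stride;
    }
    return score1 + abs(score2) * weight;
}
```

Two flat blocks carry no texture, so only the squared error remains; a checkerboard compared against a flat block is penalized heavily even though its per-pixel error is small.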

[FFmpeg-devel] [PATCH v2 3/7] aarch64: me_cmp: Fix up the prologue of ff_pix_abs8_xy2_neon

2022-10-03 Grzegorz Bernacki
From: Martin Storsjö

This initializes things properly if this were to be called with h < 4.
---
 libavcodec/aarch64/me_cmp_neon.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 3662419edf..cfba3eb33
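The commit message only says the new prologue initializes things properly for h < 4. A scalar sketch of the `xy2` computation, modeled on the C reference in libavcodec/me_cmp.c as I recall it (`pix_abs8_xy2_ref` is a hypothetical name; the context argument is dropped), shows why any h >= 1 is meaningful: each output row is independent and uses a rounded four-pixel average of a 2x2 neighborhood in `pix2`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of pix_abs8_xy2: SAD against pix2
 * interpolated half a pixel right and half a pixel down.
 * Reads 9 columns and h + 1 rows of pix2. */
static int pix_abs8_xy2_ref(const uint8_t *pix1, const uint8_t *pix2,
                            ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride; /* row below pix2 */
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            /* rounded average of the 2x2 neighborhood */
            int interp = (pix2[x] + pix2[x + 1] + pix3[x] + pix3[x + 1] + 2) >> 2;
            sum += abs(pix1[x] - interp);
        }
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```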

[FFmpeg-devel] [PATCH v2 2/7] aarch64: me_cmp: Improve scheduling in ff_pix_abs8_y2_neon

2022-10-03 Grzegorz Bernacki
From: Martin Storsjö

Before:           Cortex A53    A72    A73
pix_abs_1_2_neon:       73.7   31.0   25.7
After:
pix_abs_1_2_neon:       61.7   30.2   24.7

Signed-off-by: Martin Storsjö
---
 libavcodec/aarch64/me_cmp_neon.S | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/
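Patch 2/7 only reorders instructions in ff_pix_abs8_y2_neon, so its result must stay identical. For reference, a scalar sketch of what the `y2` variant computes, modeled on the C reference in libavcodec/me_cmp.c as I recall it (hypothetical name `pix_abs8_y2_ref`, context argument dropped): each row of `pix1` is compared against the rounded average of two vertically adjacent rows of `pix2`, so `h + 1` rows of `pix2` are read.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of pix_abs8_y2: SAD against pix2
 * interpolated half a pixel down, with rounding. */
static int pix_abs8_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                           ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride; /* row below pix2 */
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - ((pix2[x] + pix3[x] + 1) >> 1));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```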

[FFmpeg-devel] [PATCH v2 1/7] lavc/aarch64: Add neon implementation for pix_abs8 functions.

2022-10-03 Grzegorz Bernacki
AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 9 ++
 libavcodec/aarch64/me_cmp_neon.S | 193 +++
 2 files changed, 202 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64
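The v2 1/7 snippet above is cut off before it says what the pix_abs8 family computes. As a rough guide, here is a scalar C sketch of the two simplest members, modeled on the C reference functions in libavcodec/me_cmp.c as I recall them: a plain 8-pixel-wide sum of absolute differences, and the `x2` variant that compares against a block interpolated half a pixel to the right with rounding. The `_ref` names are hypothetical, and the `MpegEncContext *` first argument of the real functions is dropped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounded average of two pixels, used for half-pel interpolation. */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Hypothetical sketch: SAD over an 8-pixel-wide, h-row block. */
static int pix_abs8_ref(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical sketch: SAD against pix2 shifted half a pixel right.
 * Reads 9 pixels per row of pix2. */
static int pix_abs8_x2_ref(const uint8_t *pix1, const uint8_t *pix2,
                           ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix2[x + 1]));
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

Because the `x2` variant reads one pixel past the 8-pixel row, callers must leave that byte readable.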

[FFmpeg-devel] [PATCH v2 0/7] arm64 neon implementation for 8bits functions

2022-10-03 Grzegorz Bernacki
Changes since v1:
- changed tabs to spaces
- modified branch instruction in vsse8
- applied Martin's patches with improved instruction scheduling

Grzegorz Bernacki (4):
  lavc/aarch64: Add neon implementation for pix_abs8 functions.
  lavc/aarch64: Provide neon implementation of nsse8

Re: [FFmpeg-devel] [PATCH 1/4] lavc/aarch64: Add neon implementation for pix_abs8 functions.

2022-09-28 Grzegorz Bernacki
wrote:
> On Mon, 26 Sep 2022, Grzegorz Bernacki wrote:
>
> > Provide optimized implementation of pix_abs8 function for arm64.
> >
> > Performance comparison tests are shown below:
> > pix_abs_1_1_c: 162.5
> > pix_abs_1_1_neon: 27.0
> > pix_a

[FFmpeg-devel] [PATCH 4/4] lavc/aarch64: Add neon implementation for vsse_intra8

2022-09-26 Grzegorz Bernacki
Provide optimized implementation for vsse_intra8 for arm64.

Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++
 libavcodec/aarch64/me_cmp_neon.S |

[FFmpeg-devel] [PATCH 3/4] lavc/aarch64: Provide optimized implementation of vsse8 for arm64.

2022-09-26 Grzegorz Bernacki
Provide optimized implementation of vsse8 for arm64.

Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 5

[FFmpeg-devel] [PATCH 2/4] lavc/aarch64: Provide neon implementation of nsse8

2022-09-26 Grzegorz Bernacki
Add vectorized implementation of nsse8 function.

Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 15

[FFmpeg-devel] [PATCH 1/4] lavc/aarch64: Add neon implementation for pix_abs8 functions.

2022-09-26 Grzegorz Bernacki
AWS Graviton 3.

Signed-off-by: Grzegorz Bernacki
---
 libavcodec/aarch64/me_cmp_init_aarch64.c | 9 ++
 libavcodec/aarch64/me_cmp_neon.S | 193 +++
 2 files changed, 202 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64