Great!! Thanks a lot for your help and your review.
thanks,
greg
On Tue, 4 Oct 2022 at 12:57, Martin Storsjö wrote:
> On Mon, 3 Oct 2022, Grzegorz Bernacki wrote:
>
> > Changes since v1:
> >
> > - changed tabs to spaces
> > - modified branch instruction in vsse
From: Martin Storsjö
Before:       Cortex A53    A72    A73
vsse_5_neon:        74.7   31.5   26.0
After:
vsse_5_neon:        62.7   32.5   25.7
---
libavcodec/aarch64/me_cmp_neon.S | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aa
Provide optimized implementation for vsse_intra8 for arm64.
Performance tests are shown below.
- vsse_5_c: 87.7
- vsse_5_neon: 26.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++
libavcodec/aarch64/me_cmp_neon.S |
Provide optimized implementation of vsse8 for arm64.
Performance comparison tests are shown below.
- vsse_1_c: 141.5
- vsse_1_neon: 32.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 5
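For readers unfamiliar with the metric, here is a minimal scalar sketch of what a vsse-style comparison computes on an 8-pixel-wide block: the sum of squared vertical gradients of the difference between two blocks. The name `vsse8_ref` and its signature are assumptions made for illustration; this is not the code from the patch or from FFmpeg's C fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (illustration only, not the patch's code).
 * For every pair of vertically adjacent rows, take the difference s1 - s2 in
 * both rows, subtract one from the other, and accumulate the square. */
static int vsse8_ref(const uint8_t *s1, const uint8_t *s2,
                     ptrdiff_t stride, int h)
{
    int score = 0;

    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = s1[x] - s2[x] - s1[x + stride] + s2[x + stride];
            score += d * d;
        }
        s1 += stride;
        s2 += stride;
    }
    return score;
}
```

The inner loop covers eight independent pixels per row, which is the kind of work a NEON version can process in one vector operation.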
Add vectorized implementation of nsse8 function.
Performance comparison tests are shown below.
- nsse_1_c: 256.0
- nsse_1_neon: 82.7
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 15
From: Martin Storsjö
This initializes things properly if this were to be called with
h < 4.
---
libavcodec/aarch64/me_cmp_neon.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 3662419edf..cfba3eb33
From: Martin Storsjö
Before:       Cortex A53    A72    A73
pix_abs_1_2_neon:   73.7   31.0   25.7
After:
pix_abs_1_2_neon:   61.7   30.2   24.7
Signed-off-by: Martin Storsjö
---
libavcodec/aarch64/me_cmp_neon.S | 13 ++---
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/
AWS Graviton 3.
Signed-off-by: Grzegorz Bernacki
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 9 ++
libavcodec/aarch64/me_cmp_neon.S | 193 +++
2 files changed, 202 insertions(+)
diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c
b/libavcodec/aarch64
Changes since v1:
- changed tabs to spaces
- modified branch instruction in vsse8
- applied Martin's patches with improved instruction scheduling
Grzegorz Bernacki (4):
lavc/aarch64: Add neon implementation for pix_abs8 functions.
lavc/aarch64: Provide neon implementation of nsse8
wrote:
> On Mon, 26 Sep 2022, Grzegorz Bernacki wrote:
>
> > Provide optimized implementation of pix_abs8 function for arm64.
> >
> > Performance comparison tests are shown below:
> > pix_abs_1_1_c: 162.5
> > pix_abs_1_1_neon: 27.0
> > pix_a
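As background for the numbers quoted above, the pix_abs family is built around a sum of absolute differences (SAD) between two pixel blocks. Below is a minimal scalar sketch of the plain 8-pixel-wide case. The name `pix_abs8_ref` and its signature are assumptions made for illustration; the interpolating variants that the patch also covers are not shown, and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (illustration only, not the patch's code).
 * Accumulates |pix1 - pix2| over an 8-pixel-wide block of h rows. */
static int pix_abs8_ref(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

Each row is eight independent byte differences, so a vector implementation can handle a full row at once and reduce the partial sums at the end.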