Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-02-11 Thread Reimar Döffinger
Hi Martin! > On 10 Feb 2021, at 22:53, Martin Storsjö wrote: > > +.macro idct_16x16 bitdepth > +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1 > +//r0 - coeffs > +mov x15, lr > + Binutils doesn't recognize "lr" as alias for x30 >>> It didn’t

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-02-10 Thread Martin Storsjö
Hi Reimar, On Sat, 16 Jan 2021, Martin Storsjö wrote: +.macro idct_16x16 bitdepth +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1 +//r0 - coeffs +mov x15, lr + Binutils doesn't recognize "lr" as alias for x30 It didn’t have an issue in the Debian unstable VM?

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-16 Thread Martin Storsjö
On Sat, 16 Jan 2021, Reimar Döffinger wrote: On 15 Jan 2021, at 23:55, Martin Storsjö wrote: On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote: create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c This patch fails

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-15 Thread Reimar Döffinger
> On 15 Jan 2021, at 23:55, Martin Storsjö wrote: > > On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote: > >> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S >> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c > > This patch fails checkasm Fixed, one mis-translated

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-15 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was
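The patch description gives two wall-clock decode times. Here is a quick worked check in plain C; the function names are hypothetical. It computes the two common ways to express that change: the fraction of time saved, which matches the "approximately 15%" figure, and the throughput ratio.

```c
#include <assert.h>

/* Fraction of wall-clock time saved: (before - after) / before. */
static double time_reduction(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return (before_s - after_s) / before_s;
}

/* Throughput speedup factor: how many times faster the new run is. */
static double speedup_factor(double before_s, double after_s)
{
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}
```

For 19.4 s before and 16.4 s after, `time_reduction` gives about 0.155 (15.5% less time). `speedup_factor` gives about 1.18 (roughly 18% higher throughput). Both describe the same measurement. The distinction only matters when comparing against numbers quoted the other way.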

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-15 Thread Martin Storsjö
On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote: From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-13 Thread Martin Storsjö
On Tue, 12 Jan 2021, Reimar Döffinger wrote: On 12 Jan 2021, at 13:24, Josh Dekker wrote: Hi, AS libavcodec/aarch64/hevcdsp_idct_neon.o libavcodec/aarch64/hevcdsp_idct_neon.S: Assembler messages: libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
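The assembler error quoted above comes from a vector register copy written with a `.4S` arrangement. Architecturally, MOV (vector) is an alias of ORR (vector, register) and is defined only for the `8B` and `16B` arrangements. The portable spelling of a full 128-bit register copy is therefore `mov v29.16b, v28.16b`. Below is a minimal sketch using GCC-style inline assembly. It assumes an AArch64 GCC or Clang toolchain. The function name `copy_q128` is hypothetical, and the `memcpy` fallback exists only so the example builds on other targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy 128 bits from src to dst through NEON registers v28 -> v29. */
static void copy_q128(uint32_t dst[4], const uint32_t src[4])
{
    assert(dst != NULL && src != NULL);
#if defined(__aarch64__)
    __asm__ volatile(
        "ld1  {v28.4s}, [%1]\n\t"
        /* "mov v29.4s, v28.4s" is what GNU as rejected with
         * "operand mismatch"; the MOV (vector) alias only takes
         * the 8B/16B arrangements, and 16B copies all 128 bits. */
        "mov  v29.16b, v28.16b\n\t"
        "st1  {v29.4s}, [%0]\n\t"
        :
        : "r"(dst), "r"(src)
        : "v28", "v29", "memory");
#else
    /* Non-AArch64 fallback so the sketch still compiles and runs. */
    memcpy(dst, src, 4 * sizeof(uint32_t));
#endif
}
```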

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Reimar Döffinger
> On 12 Jan 2021, at 13:24, Josh Dekker wrote: > > Hi, > > On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote: >> From: Reimar Döffinger >> Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth >> available on aarch64. >> For a UHD HDR (10 bit) sample video these were consuming

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Josh Dekker
Hi, On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote: From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-08 Thread Reimar . Doeffinger
From: Reimar Döffinger Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64. For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup. Test sample was
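The following is a minimal plain-C sketch of what an 8x8 HEVC inverse transform computes, using the partial-butterfly form of the HEVC core transform matrix. It is not FFmpeg's own C template. The names `idct8_1d` and `idct8x8_ref` are hypothetical. It assumes row-major coefficients and clipping to int16 after each stage. The first (column) pass shifts by 7 and the second (row) pass shifts by `20 - bitdepth`. That bit-depth-dependent shift is one reason 8- and 10-bit need separate variants, such as the `idct_16x16 bitdepth` macro quoted elsewhere in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the int16 range between transform stages. */
static int16_t clip_int16(int v)
{
    return (int16_t)(v < -32768 ? -32768 : v > 32767 ? 32767 : v);
}

/* 8-point inverse transform of src[0], src[stride], ..., src[7*stride],
 * written to dst with the same stride. All inputs are read before any
 * output is written, so src == dst (in place) is safe. */
static void idct8_1d(const int16_t *src, int16_t *dst, int stride, int shift)
{
    static const int odd[4][4] = { /* rows for src[1], src[3], src[5], src[7] */
        { 89,  75,  50,  18 },
        { 75, -18, -89, -50 },
        { 50, -89,  18,  75 },
        { 18, -50,  75, -89 },
    };
    const int add = 1 << (shift - 1);
    int O[4], E[4];

    for (int k = 0; k < 4; k++)
        O[k] = odd[0][k] * src[1 * stride] + odd[1][k] * src[3 * stride] +
               odd[2][k] * src[5 * stride] + odd[3][k] * src[7 * stride];

    int EO0 = 83 * src[2 * stride] + 36 * src[6 * stride];
    int EO1 = 36 * src[2 * stride] - 83 * src[6 * stride];
    int EE0 = 64 * src[0] + 64 * src[4 * stride];
    int EE1 = 64 * src[0] - 64 * src[4 * stride];

    E[0] = EE0 + EO0; E[3] = EE0 - EO0;
    E[1] = EE1 + EO1; E[2] = EE1 - EO1;

    for (int k = 0; k < 4; k++) {
        dst[k * stride]       = clip_int16((E[k] + O[k] + add) >> shift);
        dst[(7 - k) * stride] = clip_int16((E[k] - O[k] + add) >> shift);
    }
}

/* In-place 8x8 inverse transform: columns first (shift 7),
 * then rows (shift 20 - bitdepth). */
static void idct8x8_ref(int16_t coeffs[64], int bitdepth)
{
    assert(bitdepth >= 8 && bitdepth <= 12);
    for (int col = 0; col < 8; col++)
        idct8_1d(coeffs + col, coeffs + col, 8, 7);
    for (int row = 0; row < 8; row++)
        idct8_1d(coeffs + 8 * row, coeffs + 8 * row, 1, 20 - bitdepth);
}
```

A DC-only block with `coeffs[0] = 64` produces a flat residual of 1 at 8-bit and 2 at 10-bit. This shows how the same coefficients scale differently per bit depth. A SIMD port has to reproduce these results exactly, which is what FFmpeg's checkasm compares against.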