[FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-25 Thread Ben Avison
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5 vc1dsp.vc1_inv_trans_4x8_c: 335.2 vc1dsp.vc1_inv_trans_4x8_neon: 106.2 vc1dsp.vc1_inv_trans_4x8

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5 vc1dsp.vc1_inv_trans_4x8_c: 335.2 vc1dsp.vc1_inv_tran

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Wed, 30 Mar 2022, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5 vc1dsp.v

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Ben Avison
On 30/03/2022 14:49, Martin Storsjö wrote: Looks generally reasonable. Is it possible to factorize out the individual transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 4x4 functions) without too much loss? There is a close analogy here with the vertical/horizontal deblo

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote: On 30/03/2022 14:49, Martin Storsjö wrote: Looks generally reasonable. Is it possible to factorize out the individual transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 4x4 functions) without too much loss? There is a close analog