[FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-25 Thread Ben Avison
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

idctdsp.add_pixels_clamped_c: 323.0
idctdsp.add_pixels_clamped_neon: 41.5
idctdsp.put_pixels_clamped_c: 243.0
idctdsp.put_pixels_clamped_neon: 30.0
idctdsp.put_signed_pixels_clamped_c: 225.7
idctdsp.put_signed_pixels_clamped_neon: 37.7

Sig

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote:
> checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
>
> idctdsp.add_pixels_clamped_c: 323.0
> idctdsp.add_pixels_clamped_neon: 41.5
> idctdsp.put_pixels_clamped_c: 243.0
> idctdsp.put_pixels_clamped_neon: 30.0
> idctdsp.put_signed_pixels_clamped_c: 225.7
> idctdsp

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Ben Avison
On 30/03/2022 15:14, Martin Storsjö wrote:
> On Fri, 25 Mar 2022, Ben Avison wrote:
>> +// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)
>> +// On entry:
>> +//   x0 -> array of 64x 16-bit coefficients
>> +//   x1 -> 8-bit results
>> +//   x2 = row stride for results, bytes
>> +function ff
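
For readers following the quoted comment block, the operation it describes can be sketched in scalar C. This is only an illustrative reference, not the code from the patch: the function name put_signed_pixels_clamped_sketch is hypothetical, and the sketch assumes the 64 coefficients form an 8x8 block stored row by row, with "biased by 128" meaning that the value clamped to [-128, 127] has 128 added before it is stored as an unsigned byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the code from the patch.
 * Assumption: the 64 coefficients are an 8x8 block, row-major.
 *   block  -> array of 64 signed 16-bit coefficients (x0 in the asm comment)
 *   pixels -> 8-bit results                          (x1)
 *   stride =  row stride for results, in bytes       (x2)
 */
static void put_signed_pixels_clamped_sketch(const int16_t *block,
                                             uint8_t *pixels,
                                             ptrdiff_t stride)
{
    assert(block && pixels);
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++) {
            int v = block[row * 8 + col];
            /* Clamp to the signed 8-bit range. */
            if (v < -128)
                v = -128;
            else if (v > 127)
                v = 127;
            /* Apply the +128 bias so the result fits 0..255. */
            pixels[col] = (uint8_t)(v + 128);
        }
        /* Advance by the row stride; bytes past the 8th are left untouched. */
        pixels += stride;
    }
}
```

With a stride larger than 8, only the first 8 bytes of each output row are written, so any padding between rows keeps its previous contents.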

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Martin Storsjö
On Thu, 31 Mar 2022, Ben Avison wrote:
> On 30/03/2022 15:14, Martin Storsjö wrote:
>> On Fri, 25 Mar 2022, Ben Avison wrote:
>>> +// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)
>>> +// On entry:
>>> +//   x0 -> array of 64x 16-bit coefficients
>>> +//   x1 -> 8-bit results
>>> +//   x2 = ro