Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-14 Thread James Darnley
On 2017-04-06 18:06, James Almer wrote:
> Your numbers are really confusing. Could you post the actual numbers for
> each function instead of doing comparisons?

These figures are the actual numbers! Using the figures from Haswell above:
> ff_h264_idct_add_8_mmx = 52 cycles
>
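A small sketch of the unit conversion behind that reply, assuming "decicycles" means tenths of a CPU cycle. That reading agrees with the thread itself: 522 decicycles rounds to the quoted 52 cycles. The helper names are hypothetical and not part of FFmpeg.

```c
/* Hypothetical helpers: a "decicycle" is assumed to be one tenth of a cycle. */

/* Exact conversion to (possibly fractional) cycles. */
static double decicycles_to_cycles(double decicycles)
{
    return decicycles / 10.0;
}

/* Round-to-nearest conversion to whole cycles for non-negative counts. */
static unsigned decicycles_to_whole_cycles(unsigned decicycles)
{
    return (decicycles + 5u) / 10u;
}
```

With the Haswell figures, `decicycles_to_whole_cycles(522)` gives 52 for the mmxext baseline and `decicycles_to_whole_cycles(469)` gives 47 for the AVX version.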

Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-06 Thread James Almer
On 4/6/2017 12:34 PM, James Darnley wrote:
> On 2017-04-05 05:44, James Almer wrote:
>> On 4/4/2017 10:53 PM, James Darnley wrote:
>>> Haswell:
>>> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>>>
>>> Skylake-U:
>>> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared

Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-06 Thread James Darnley
On 2017-04-05 05:44, James Almer wrote:
> On 4/4/2017 10:53 PM, James Darnley wrote:
>> Haswell:
>> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>>
>> Skylake-U:
>> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
>
> Again, you should add an SSE2

Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-04 Thread James Almer
On 4/4/2017 10:53 PM, James Darnley wrote:
> Haswell:
> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>
> Skylake-U:
> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext

Again, you should add an SSE2 version first, then an AVX one if it's measurably
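A rough sketch of the runtime dispatch pattern that makes the SSE2-first ordering matter, written loosely in the style of an x86 DSP init function. Everything here is hypothetical: the flag constants, the stub functions, the context struct and the exact function signature are illustrative assumptions, not FFmpeg's real `h264dsp_init.c` code. The init starts from a C version and lets each supported instruction set overwrite the pointer in ascending order.

```c
#include <stdint.h>

/* Hypothetical CPU capability bits (not FFmpeg's real flag values). */
enum {
    HYP_CPU_MMXEXT = 1 << 0,
    HYP_CPU_SSE2   = 1 << 1,
    HYP_CPU_AVX    = 1 << 2,
};

/* Assumed signature: destination pixels, coefficient block, line stride. */
typedef void (*idct_add_fn)(uint8_t *dst, int16_t *block, int stride);

/* Stubs standing in for the C and SIMD implementations; bodies omitted. */
static void hyp_idct_add_c(uint8_t *dst, int16_t *block, int stride)
{ (void)dst; (void)block; (void)stride; }
static void hyp_idct_add_mmxext(uint8_t *dst, int16_t *block, int stride)
{ (void)dst; (void)block; (void)stride; }
static void hyp_idct_add_sse2(uint8_t *dst, int16_t *block, int stride)
{ (void)dst; (void)block; (void)stride; }
static void hyp_idct_add_avx(uint8_t *dst, int16_t *block, int stride)
{ (void)dst; (void)block; (void)stride; }

typedef struct HypDSPContext {
    idct_add_fn idct_add;
} HypDSPContext;

/* Later (newer) instruction sets overwrite earlier assignments,
 * so the most capable supported version ends up in the pointer. */
static void hyp_dsp_init(HypDSPContext *c, int cpu_flags)
{
    c->idct_add = hyp_idct_add_c;
    if (cpu_flags & HYP_CPU_MMXEXT)
        c->idct_add = hyp_idct_add_mmxext;
    if (cpu_flags & HYP_CPU_SSE2)
        c->idct_add = hyp_idct_add_sse2;
    if (cpu_flags & HYP_CPU_AVX)
        c->idct_add = hyp_idct_add_avx;
}
```

In this sketch a CPU that reports SSE2 but not AVX ends up with the SSE2 pointer. Without an SSE2 version it would fall back to MMXEXT, so an AVX-only addition would not help such CPUs at all.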

[FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-04 Thread James Darnley
Haswell:
- 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext

Skylake-U:
- 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
---
 libavcodec/x86/h264_idct.asm  | 33 -
 libavcodec/x86/h264dsp_init.c |  3 +++
 2 files changed, 35
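The speedup figures in the patch description are consistent with dividing the baseline (mmxext) timing by the new (AVX) timing: 522 / 469 ≈ 1.11 and 671 / 555 ≈ 1.21. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and `speedup_range` rests on an extra assumption of its own: it treats each "x±e" as the plain interval [x - e, x + e], since the thread does not say what the ± figure represents.

```c
/* Speedup of a candidate over a baseline, both in the same unit
 * (decicycles here). A value above 1.0 means the candidate is faster. */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}

/* Speedup rounded to the nearest hundredth, as an integer (111 means 1.11x). */
static unsigned speedup_hundredths(unsigned baseline, unsigned candidate)
{
    return (baseline * 100u + candidate / 2u) / candidate;
}

typedef struct SpeedupRange {
    double lo; /* pessimistic bound */
    double hi; /* optimistic bound */
} SpeedupRange;

/* Assumption: "x±e" is read as the interval [x - e, x + e]. */
static SpeedupRange speedup_range(double base, double base_err,
                                  double cand, double cand_err)
{
    SpeedupRange r;
    /* Fastest baseline against slowest candidate gives the smallest ratio. */
    r.lo = (base - base_err) / (cand + cand_err);
    /* Slowest baseline against fastest candidate gives the largest ratio. */
    r.hi = (base + base_err) / (cand - cand_err);
    return r;
}
```

Under that interval reading, the Haswell speedup falls roughly between 1.108x and 1.118x, and the Skylake-U speedup roughly between 1.196x and 1.222x.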