Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-30 Thread James Darnley
On 2016-11-30 13:57, Ronald S. Bultje wrote:
> On Wed, Nov 30, 2016 at 7:10 AM, James Darnley wrote:
>>> Nehalem:
>>> - sse2:
>>>- complex: 4.13x faster (1514 vs. 367 cycles)
>>>- simple: 4.38x faster (1836 vs. 419 cycles)
>>>
>>> Haswell:
>>> -

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-30 Thread Ronald S. Bultje
Hi,

On Wed, Nov 30, 2016 at 7:10 AM, James Darnley wrote:
> On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> > 2016-11-29 17:14 GMT+01:00 James Darnley:
> >> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> >>> 2016-11-29 12:52 GMT+01:00 James Darnley

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-30 Thread James Darnley
On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> 2016-11-29 17:14 GMT+01:00 James Darnley:
>> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>>> 2016-11-29 12:52 GMT+01:00 James Darnley:
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-29 Thread James Darnley
On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> 2016-11-29 17:14 GMT+01:00 James Darnley:
>> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>>> 2016-11-29 12:52 GMT+01:00 James Darnley:
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-29 Thread Carl Eugen Hoyos
2016-11-29 17:14 GMT+01:00 James Darnley:
> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>> 2016-11-29 12:52 GMT+01:00 James Darnley:
>>> sse2:
>>> complex: 4.13x faster (1514 vs. 367 cycles)
>>> simple: 4.38x faster (1836 vs. 419 cycles)
>>>
>>> avx:
>>>

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-29 Thread James Darnley
On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> 2016-11-29 12:52 GMT+01:00 James Darnley:
>> sse2:
>> complex: 4.13x faster (1514 vs. 367 cycles)
>> simple: 4.38x faster (1836 vs. 419 cycles)
>>
>> avx:
>> complex: 1.07x faster (260 vs. 244 cycles)
>> simple: 1.03x faster (284

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-29 Thread Carl Eugen Hoyos
2016-11-29 12:52 GMT+01:00 James Darnley:
> sse2:
> complex: 4.13x faster (1514 vs. 367 cycles)
> simple: 4.38x faster (1836 vs. 419 cycles)
>
> avx:
> complex: 1.07x faster (260 vs. 244 cycles)
> simple: 1.03x faster (284 vs. 274 cycles)

What are you comparing?

Carl Eugen

[FFmpeg-devel] [PATCH 3/3] avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions

2016-11-29 Thread James Darnley
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x faster (1836 vs. 419 cycles)

avx:
complex: 1.07x faster (260 vs. 244 cycles)
simple: 1.03x faster (284 vs. 274 cycles)
---
libavcodec/x86/h264_idct_10bit.asm | 53 ++