Re: [FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-08 Thread Carl Eugen Hoyos
2016-12-07 14:07 GMT+01:00 James Darnley:
> On 2016-12-07 11:07, Carl Eugen Hoyos wrote:
>> 2016-12-05 19:32 GMT+01:00 James Darnley:
>>
>>> - sse2: 2.47x (170 vs. 69 cycles)
>>> - avx: 2.47x (170 vs. 69 cycles)
>>
>> Please elaborate on why this was

Re: [FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-07 Thread Henrik Gramner
On Wed, Dec 7, 2016 at 2:07 PM, James Darnley wrote:
> Because a few instructions using 3 operand form should be quicker. The
> fact that it doesn't show is no doubt down to the out of order execution
> managing to do the moves earlier than written.

Register-register moves are

Re: [FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-07 Thread Carl Eugen Hoyos
2016-12-05 19:32 GMT+01:00 James Darnley:
> - sse2: 2.47x (170 vs. 69 cycles)
> - avx: 2.47x (170 vs. 69 cycles)

Please elaborate on why this was committed.

Carl Eugen

[FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-05 Thread James Darnley
Yorkfield:
 - mmx2: 2.45x (279 vs. 114 cycles)
 - sse2: 3.36x (279 vs. 83 cycles)

Nehalem:
 - mmx2: 2.10x (192 vs. 92 cycles)
 - sse2: 2.84x (192 vs. 68 cycles)

Skylake:
 - mmx2: 1.75x (170 vs. 97 cycles)
 - sse2: 2.47x (170 vs. 69 cycles)
 - avx:  2.47x (170 vs. 69 cycles)
---
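
For anyone wanting to sanity-check the figures in the commit message, here is a minimal Python sketch that recomputes each speedup factor as baseline cycles divided by SIMD cycles. The names `RESULTS`, `speedup` and `check` are made up for this illustration and are not part of FFmpeg; the first cycle count in each pair is assumed to be the baseline the message compares against. The recomputed ratios can differ from the quoted ones in the second decimal (for example 192/92 versus the quoted 2.10x), presumably because the quoted cycle counts are rounded, so the comparison uses a small tolerance.

```python
# Figures copied from the commit message:
# (baseline cycles, SIMD cycles, speedup factor as quoted).
# Assumption: the first number in each "A vs. B" pair is the baseline.
RESULTS = {
    "Yorkfield": {"mmx2": (279, 114, 2.45), "sse2": (279, 83, 3.36)},
    "Nehalem":   {"mmx2": (192, 92, 2.10),  "sse2": (192, 68, 2.84)},
    "Skylake":   {"mmx2": (170, 97, 1.75),  "sse2": (170, 69, 2.47),
                  "avx":  (170, 69, 2.47)},
}


def speedup(baseline_cycles, simd_cycles):
    """Speedup factor of the SIMD version over the baseline."""
    return baseline_cycles / simd_cycles


def check(results, tolerance=0.03):
    """Return the (cpu, isa, recomputed, quoted) rows whose recomputed
    speedup differs from the quoted one by more than `tolerance`."""
    mismatches = []
    for cpu, by_isa in results.items():
        for isa, (base, simd, quoted) in by_isa.items():
            recomputed = speedup(base, simd)
            if abs(recomputed - quoted) > tolerance:
                mismatches.append((cpu, isa, recomputed, quoted))
    return mismatches


if __name__ == "__main__":
    for cpu, by_isa in RESULTS.items():
        for isa, (base, simd, quoted) in by_isa.items():
            print(f"{cpu:10s} {isa:5s} recomputed {speedup(base, simd):.2f}x, "
                  f"quoted {quoted:.2f}x")
    print("rows outside tolerance:", check(RESULTS))
```

On Skylake the sse2 and avx rows carry identical cycle counts, so they necessarily give the same ratio, which is the point the replies in this thread pick up on.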