On 2016-11-30 13:57, Ronald S. Bultje wrote:
> On Wed, Nov 30, 2016 at 7:10 AM, James Darnley wrote:
>>> Nehalem:
>>> - sse2:
>>>   - complex: 4.13x faster (1514 vs. 367 cycles)
>>>   - simple: 4.38x faster (1836 vs. 419 cycles)
>>>
>>> Haswell:
>>> - sse2:
>>>
Hi,
On Wed, Nov 30, 2016 at 7:10 AM, James Darnley wrote:
> On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> > 2016-11-29 17:14 GMT+01:00 James Darnley:
> >> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> >>> 2016-11-29 12:52 GMT+01:00 James Darnley:
> sse2:
> complex: 4.13x faster (15
On 2016-11-29 21:09, Carl Eugen Hoyos wrote:
> 2016-11-29 17:14 GMT+01:00 James Darnley:
>> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>>> 2016-11-29 12:52 GMT+01:00 James Darnley:
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x faster (1836 vs. 419 cycles)
2016-11-29 17:14 GMT+01:00 James Darnley:
> On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
>> 2016-11-29 12:52 GMT+01:00 James Darnley:
>>> sse2:
>>> complex: 4.13x faster (1514 vs. 367 cycles)
>>> simple: 4.38x faster (1836 vs. 419 cycles)
>>>
>>> avx:
>>> complex: 1.07x faster (260 vs. 244 cycle
On 2016-11-29 15:30, Carl Eugen Hoyos wrote:
> 2016-11-29 12:52 GMT+01:00 James Darnley:
>> sse2:
>> complex: 4.13x faster (1514 vs. 367 cycles)
>> simple: 4.38x faster (1836 vs. 419 cycles)
>>
>> avx:
>> complex: 1.07x faster (260 vs. 244 cycles)
>> simple: 1.03x faster (284 vs. 274 cycles)
>
2016-11-29 12:52 GMT+01:00 James Darnley:
> sse2:
> complex: 4.13x faster (1514 vs. 367 cycles)
> simple: 4.38x faster (1836 vs. 419 cycles)
>
> avx:
> complex: 1.07x faster (260 vs. 244 cycles)
> simple: 1.03x faster (284 vs. 274 cycles)
What are you comparing?
Carl Eugen
sse2:
complex: 4.13x faster (1514 vs. 367 cycles)
simple: 4.38x faster (1836 vs. 419 cycles)
avx:
complex: 1.07x faster (260 vs. 244 cycles)
simple: 1.03x faster (284 vs. 274 cycles)
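
For readers checking the figures above, here is a minimal sketch of how a "Nx faster" ratio relates to a pair of cycle counts. It assumes the first number in each pair is the reference being compared against and the second is the new code; the message itself does not say which is which. The `speedup` helper and the `results` table are hypothetical names used only for illustration. The quoted ratios may have been computed from unrounded cycle counts, so the last digit can differ from what the rounded counts give.

```python
def speedup(reference_cycles: float, new_cycles: float) -> float:
    """Return how many times faster the new code is than the reference."""
    if new_cycles <= 0:
        raise ValueError("new_cycles must be positive")
    return reference_cycles / new_cycles


# Cycle counts quoted in this message, as (reference, new).
# Assumption: the first number of each pair is the reference.
results = {
    ("sse2", "complex"): (1514, 367),
    ("sse2", "simple"): (1836, 419),
    ("avx", "complex"): (260, 244),
    ("avx", "simple"): (284, 274),
}

for (isa, case), (ref, new) in results.items():
    print(f"{isa} {case}: {speedup(ref, new):.3f}x faster ({ref} vs. {new} cycles)")
```

A ratio close to 1.0, as in the avx rows, means the two versions cost nearly the same number of cycles.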
---
libavcodec/x86/h264_idct_10bit.asm | 53 ++
libavcodec/x86/h264dsp_init.