On 2017-04-06 18:06, James Almer wrote:
> Your numbers are really confusing. Could you post the actual numbers for
> each function instead of doing comparisons?
These figures are the actual numbers!
Using the figures from Haswell above:
> ff_h264_idct_add_8_mmx = 52 cycles
>
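
For anyone else tripped up by the units: the benchmark figures quoted in this thread are in decicycles (tenths of a cycle), and the "x faster" value is the old timing divided by the new one. A minimal sketch of that arithmetic, assuming the first figure in each pair is the mmxext timing and the second is the new version. The helper names are made up for illustration and are not part of FFmpeg or checkasm:

```python
def decicycles_to_cycles(decicycles: float) -> float:
    """Convert a timing in decicycles (tenths of a cycle) to cycles."""
    return decicycles / 10.0


def speedup(baseline: float, candidate: float) -> float:
    """Ratio of the baseline timing to the candidate timing (same units)."""
    return baseline / candidate


# Figures quoted in the thread, in decicycles.
# Assumption: (mmxext timing, new version timing).
figures = {
    "Haswell": (522, 469),
    "Skylake-U": (671, 555),
}

for cpu, (old, new) in figures.items():
    print(
        f"{cpu}: {decicycles_to_cycles(old):.1f} -> "
        f"{decicycles_to_cycles(new):.1f} cycles, "
        f"{speedup(old, new):.2f}x"
    )
# Haswell: 52.2 -> 46.9 cycles, 1.11x
# Skylake-U: 67.1 -> 55.5 cycles, 1.21x
```

So the 522 decicycles on Haswell is the roughly 52 cycles given for ff_h264_idct_add_8_mmx above, and the ratios match the 1.11x and 1.21x in the patch description.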
On 4/6/2017 12:34 PM, James Darnley wrote:
> On 2017-04-05 05:44, James Almer wrote:
>> On 4/4/2017 10:53 PM, James Darnley wrote:
>>> Haswell:
>>> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>>>
>>> Skylake-U:
>>> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared
On 2017-04-05 05:44, James Almer wrote:
> On 4/4/2017 10:53 PM, James Darnley wrote:
>> Haswell:
>> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>>
>> Skylake-U:
>> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
>
> Again, you should add an SSE2
On 4/4/2017 10:53 PM, James Darnley wrote:
> Haswell:
> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>
> Skylake-U:
> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
Again, you should add an SSE2 version first, then an AVX one if it's
measurably
Haswell:
- 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
Skylake-U:
- 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
---
libavcodec/x86/h264_idct.asm | 33 ++++++++++++++++++++++++++++++++-
libavcodec/x86/h264dsp_init.c | 3 +++
2 files changed, 35