[FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-16 Thread Michael Stoner
Replaced VSHUFPS with VPBLENDD to relieve the port 5 bottleneck.
AVX2 is 1.4x faster than AVX.
---
 libavcodec/v210dec.c       | 10 +-
 libavcodec/x86/v210-init.c |  8 +
 libavcodec/x86/v210.asm    | 72 +-
 3 files changed, 73 insertions(+), 17 deletions(-)
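A blend can only replace a shuffle where the shuffle leaves every dword in its own position and merely chooses which source register it comes from. The C sketch below models both instructions on eight 32-bit words (one ymm register) to show such a case. The helper names `blendd_model`, `shufps_model` and `check_shufps_blend_equivalence` are hypothetical, and this is a generic illustration of the instruction semantics, not the patch's code. Per published instruction tables, on Haswell-family cores such as Broadwell VSHUFPS issues only on port 5, while VPBLENDD can issue on ports 0, 1 or 5, which is presumably the bottleneck the commit message refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Model of VPBLENDD ymm, ymm, ymm, imm8 (_mm256_blend_epi32):
 * dword i comes from b if bit i of imm8 is set, else from a.
 * No dword ever moves to a different position. */
static void blendd_model(const uint32_t a[8], const uint32_t b[8],
                         uint8_t imm8, uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
}

/* Model of VSHUFPS ymm, ymm, ymm, imm8 (_mm256_shuffle_ps) on raw
 * 32-bit words: in each 128-bit lane, result dwords 0-1 are picked
 * from a and dwords 2-3 from b, each by a 2-bit field of imm8. */
static void shufps_model(const uint32_t a[8], const uint32_t b[8],
                         uint8_t imm8, uint32_t out[8])
{
    for (int lane = 0; lane < 8; lane += 4) {
        out[lane + 0] = a[lane + ((imm8 >> 0) & 3)];
        out[lane + 1] = a[lane + ((imm8 >> 2) & 3)];
        out[lane + 2] = b[lane + ((imm8 >> 4) & 3)];
        out[lane + 3] = b[lane + ((imm8 >> 6) & 3)];
    }
}

/* With imm8 = 0xE4 every shuffle field selects the dword's own index,
 * so the shuffle degenerates to a blend taking dwords 2, 3, 6, 7 from b,
 * i.e. a blend with imm8 = 0xCC (bits 2, 3, 6, 7 set). */
static void check_shufps_blend_equivalence(void)
{
    const uint32_t a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    const uint32_t b[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    uint32_t s[8], t[8];

    shufps_model(a, b, 0xE4, s);
    blendd_model(a, b, 0xCC, t);
    for (int i = 0; i < 8; i++)
        assert(s[i] == t[i]);
}
```

Both calls in the check produce a0 a1 b2 b3 | a4 a5 b6 b7; any shuffle that actually moves a dword to a new position has no blend equivalent.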

[FFmpeg-devel] [PATCH] Revised ff_v210_planar_unpack AVX2

2019-03-12 Thread Michael Stoner
Replaced VSHUFPS with VPBLENDD to relieve the port 5 bottleneck.
AVX2 is now 1.4x faster than AVX.

Tested on Broadwell CPU, Ubuntu 18.10 x86_64:

~/FFmpeg$ tests/checkasm/checkasm --bench --test=v210dec
benchmarking with native FFmpeg timers
nop: 94.1
checkasm: using random seed 3963743306
SSSE3: -

[FFmpeg-devel] [PATCH] Revised ff_v210_planar_unpack AVX2

2019-03-06 Thread Michael Stoner
---
 libavcodec/v210dec.c       | 10 +-
 libavcodec/x86/v210-init.c |  8 +
 libavcodec/x86/v210.asm    | 63 --
 3 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..26954c0df3

[FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-01 Thread Michael Stoner
The AVX2 code leverages VPERMD to process 12 pixels/iteration.
This is my first patch submission, so any comments are greatly appreciated.
-Mike

Tested on Skylake (Win32 & Win64), 1920x1080 input frame:
C code - 440 fps
SSSE3  - 920 fps
AVX    - 930 fps
AVX2   - 1040 fps
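The v210 format packs three 10-bit samples into bits 0-9, 10-19 and 20-29 of each little-endian 32-bit word, and four words carry six 4:2:2 pixels (6 Y, 3 Cb, 3 Cr). A 32-byte ymm register therefore holds eight words, i.e. 12 pixels, which matches the "12 pixels/iteration" figure above. The scalar C sketch below shows what a planar unpack has to produce; the names `v210_sample` and `v210_unpack_scalar` are hypothetical, and it assumes the width is a multiple of 6 and the words are already in host byte order. It is a reference for the data layout, not the patch's AVX2 code.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 10-bit sample at bit offset 0, 10 or 20 of a v210 word. */
static inline uint16_t v210_sample(uint32_t w, int shift)
{
    return (uint16_t)((w >> shift) & 0x3FF);
}

/* Hypothetical scalar reference: unpack `width` pixels of one v210 line
 * into planar Y, U (Cb) and V (Cr). Assumes width is a multiple of 6 and
 * that the 32-bit words are already in host byte order. */
static void v210_unpack_scalar(const uint32_t *src, uint16_t *y,
                               uint16_t *u, uint16_t *v, int width)
{
    assert(width % 6 == 0);
    for (int i = 0; i < width; i += 6, src += 4) {
        /* word 0: Cb0 Y0 Cr0 */
        *u++ = v210_sample(src[0], 0);
        *y++ = v210_sample(src[0], 10);
        *v++ = v210_sample(src[0], 20);
        /* word 1: Y1 Cb1 Y2 */
        *y++ = v210_sample(src[1], 0);
        *u++ = v210_sample(src[1], 10);
        *y++ = v210_sample(src[1], 20);
        /* word 2: Cr1 Y3 Cb2 */
        *v++ = v210_sample(src[2], 0);
        *y++ = v210_sample(src[2], 10);
        *u++ = v210_sample(src[2], 20);
        /* word 3: Y4 Cr2 Y5 */
        *y++ = v210_sample(src[3], 0);
        *v++ = v210_sample(src[3], 10);
        *y++ = v210_sample(src[3], 20);
    }
}
```

Because the Y, Cb and Cr samples rotate through the three bit positions from word to word, a SIMD version cannot simply mask and store; the samples must also be regrouped into planar order, and a cross-lane permute such as VPERMD is one way to do that regrouping across a full 256-bit register.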