[FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Christophe Gisquet
As far as I can see, the only reason those functions are SSE4 is because of the pextrw needed for the following block widths:
- 2, used only by chroma;
- 6, used by chroma and indirectly by luma;
- 12, used by both.
The better solution would be to convert all chroma handling to NV12, but it is
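For context on the pextrw point: the form of pextrw that writes straight to memory is an SSE4.1 instruction, while the form that writes to a general-purpose register is available from SSE2 for XMM registers. The patch itself is not shown in this archive view, so the following is only a minimal sketch in C intrinsics of how a 6-byte store might be done without SSE4.1. The helper name store6_sse2 is hypothetical and is not taken from the patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): store the low 6 bytes of v
 * using only SSE2. movd fetches bytes 0..3 and the register form of
 * pextrw fetches bytes 4..5, so no SSE4.1 instruction is required. */
static inline void store6_sse2(uint8_t *dst, __m128i v)
{
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(v);     /* 16-bit words 0 and 1 */
    uint16_t hi = (uint16_t)_mm_extract_epi16(v, 2);  /* 16-bit word 2 */
    memcpy(dst, &lo, sizeof(lo));      /* bytes 0..3 */
    memcpy(dst + 4, &hi, sizeof(hi));  /* bytes 4..5 */
}
```

The same split would apply to width 12 at 8 bits per sample (an 8-byte store followed by a 4-byte store), and width 2 needs only the pextrw-to-register step.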

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Mickaël Raulet
For 10-bit and 12-bit, they should stay SSE4 as well because of packusdw. You need some extra instructions to convert it to SSSE3, see below:

static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
{
    a = _mm_slli_epi32 (a, 16);
    a = _mm_srai_epi32 (a, 16);
    b =
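The quoted helper is cut off by the archive. Its visible lines shift each 32-bit lane left and then arithmetically right by 16, which sign-extends the low 16 bits of every lane. For comparison, here is a minimal sketch of a different, commonly used SSE2 emulation of packusdw that keeps the unsigned saturation by biasing around the signed pack. The names packus_epi32_sse2 and pack_8 are hypothetical, and this is not presented as the code from the message above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: pack signed 32-bit lanes to unsigned 16-bit lanes
 * with saturation to [0, 65535], using SSE2 only.
 * Assumption: inputs are at least INT32_MIN + 32768, so that the
 * subtraction below cannot wrap around. */
static inline __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16(-32768);  /* bit pattern 0x8000 */

    a = _mm_sub_epi32(a, bias32);     /* map [0, 65535] onto [-32768, 32767] */
    b = _mm_sub_epi32(b, bias32);
    a = _mm_packs_epi32(a, b);        /* signed saturating pack (packssdw) */
    return _mm_add_epi16(a, bias16);  /* undo the bias with a wrapping add */
}

/* Small wrapper for trying the helper on plain arrays. */
static void pack_8(const int32_t in[8], uint16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, packus_epi32_sse2(a, b));
}
```

The bias costs three extra instructions plus two constants per pack. When the inputs are already known to fit in 16 bits, a cheaper sign-extend-then-packssdw sequence is sufficient, at the price of losing the saturation.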

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread James Almer
On 23/08/14 11:07 AM, Mickaël Raulet wrote: For 10bits and 12bits, they should stay sse4 as well because of packusdw. You need some instructions to convert it to ssse3 see below static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b ) { a = _mm_slli_epi32 (a, 16);

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Timothy Gu
On Aug 23, 2014 7:47 AM, James Almer jamr...@gmail.com wrote: On 23/08/14 11:07 AM, Mickaël Raulet wrote: For 10bits and 12bits, they should stay sse4 as well because of packusdw. You need some instructions to convert it to ssse3 see below static av_always_inline __m128i

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Christophe Gisquet
Hi,

2014-08-23 17:01 GMT+02:00 James Almer jamr...@gmail.com:
> There's a PACK macro in lavfi/x86/yasm-16.asm that does this without intrinsics.
> You meant yadif-16, right? Timothy
> Oops, yes i meant that :P

I expect it to be needed for the weighted pred functions, so I'll split it from

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread James Almer
On 23/08/14 12:15 PM, Christophe Gisquet wrote: Hi, 2014-08-23 17:01 GMT+02:00 James Almer jamr...@gmail.com: There's a PACK macro in lavfi/x86/yasm-16.asm that does this without intrinsics. You meant yadif-16, right? Timothy Oops, yes i meant that :P I expect it to be needed for