As far as I can see, the only reason those functions are SSE4 is the
pextrw needed for the following block widths:
- 2, used only by chroma;
- 6, used by chroma and indirectly by luma;
- 12, used by both.
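To make the width constraint concrete, here is a sketch (the helper name and layout are mine, not from the FFmpeg source) of why an odd-sized row store drags in SSE4.1: a 12-byte row splits into an 8-byte movq plus a 4-byte tail, and extracting that tail uses pextrd, which is an SSE4.1 instruction (likewise, storing a 2-byte tail with pextrw to memory is SSE4.1-only).

```c
#include <smmintrin.h> /* SSE4.1 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the FFmpeg source: store the low 12
 * bytes of an XMM register (e.g. one row of six 16-bit pixels). */
static void store12(uint8_t *dst, __m128i v)
{
    uint32_t tail = (uint32_t)_mm_extract_epi32(v, 2); /* pextrd: SSE4.1 */
    _mm_storel_epi64((__m128i *)dst, v);               /* bytes 0..7 via movq */
    memcpy(dst + 8, &tail, 4);                         /* bytes 8..11 */
}
```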
The better solution would be to convert all chroma handling to NV12, but
it is
For 10 bits and 12 bits, they should stay SSE4 as well because of packusdw.
You need some extra instructions to convert it to SSSE3; see below:
static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
{
    a = _mm_slli_epi32 (a, 16);
    a = _mm_srai_epi32 (a, 16);
    b = _mm_slli_epi32 (b, 16);
    b = _mm_srai_epi32 (b, 16);
    a = _mm_packs_epi32( a, b );
    return a;
}
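For reference, a self-contained restatement of the emulation above (renamed here to avoid the reserved `_MM_` prefix), with a note on its validity: the shift pair sign-extends the low 16 bits of each 32-bit lane, so the SSE2 packssdw (`_mm_packs_epi32`) never saturates. It only matches a real packusdw when every input lane is already in [0, 32767], which holds for clipped 10/12-bit pixel data.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* SSE2/SSSE3-compatible stand-in for packusdw, valid for inputs
 * already in [0, 32767] (e.g. clipped 10/12-bit pixels). */
static __m128i mm_packus_epi32_sse2(__m128i a, __m128i b)
{
    a = _mm_slli_epi32(a, 16); /* move low word to top ...          */
    a = _mm_srai_epi32(a, 16); /* ... and sign-extend it back down  */
    b = _mm_slli_epi32(b, 16);
    b = _mm_srai_epi32(b, 16);
    return _mm_packs_epi32(a, b); /* packssdw: cannot saturate now  */
}
```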
On 23/08/14 11:07 AM, Mickaël Raulet wrote:
On Aug 23, 2014 7:47 AM, James Almer jamr...@gmail.com wrote:
Hi,
2014-08-23 17:01 GMT+02:00 James Almer jamr...@gmail.com:
There's a PACK macro in lavfi/x86/yasm-16.asm that does this without
intrinsics.
You meant yadif-16, right?
Timothy
Oops, yes i meant that :P
I expect it to be needed for the weighted pred functions, so I'll
split it from
On 23/08/14 12:15 PM, Christophe Gisquet wrote: