Hi,
On Sat, Jun 10, 2017 at 6:01 AM, Ilia Valiakhmetov wrote:
> Signed-off-by: Ilia Valiakhmetov
> ---
> libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
> libavcodec/x86/vp9intrapred_16bpp.asm | 56 +++
> 2 files changed, 58 insertions(+)
>
> diff --git a/liba
Signed-off-by: Ilia Valiakhmetov
---
libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
libavcodec/x86/vp9intrapred_16bpp.asm | 56 +++
2 files changed, 58 insertions(+)
diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index d1b8fc
Yes, you are right, I'll send a patch with this fixed, thanks.
On Sat, Jun 10, 2017 at 5:35 AM, Ivan Kalvachev wrote:
> On 6/9/17, Ilia Valiakhmetov wrote:
> > Signed-off-by: Ilia Valiakhmetov
> > ---
> > libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
> > libavcodec/x86/vp9intrapred_16bpp.asm
On 6/9/17, Ilia Valiakhmetov wrote:
> Signed-off-by: Ilia Valiakhmetov
> ---
> libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
> libavcodec/x86/vp9intrapred_16bpp.asm | 56 +++
> 2 files changed, 58 insertions(+)
>
> diff --git a/libavcodec/x86/vp9dsp_init_16bpp.
Signed-off-by: Ilia Valiakhmetov
---
libavcodec/x86/vp9dsp_init_16bpp.c    |  2 ++
libavcodec/x86/vp9intrapred_16bpp.asm | 56 +++
2 files changed, 58 insertions(+)
diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
index d1b8fc
> I know unaligned loads are not as slow as they used to be,
> but could m1 be produced by m2 and palignr?
I am not sure, can you clarify your question?
> From the comment I assume you don't use the extra two bytes
> that you get from the load, as you mark them as "*"
> generic undefined values
No, t
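
For readers following the palignr question above, here is a minimal scalar sketch in C of the general idea: an unaligned 16-byte load at some byte offset can be rebuilt from two aligned 16-byte blocks with a byte-wise shift across their concatenation, which is what SSSE3 PALIGNR does. The helper names (palignr_model, unaligned_load_equals_palignr) are hypothetical and not from the patch, and the excerpt does not show what m1 and m2 actually hold, so this only illustrates the technique, not the patch's register layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of 128-bit PALIGNR (hypothetical helper, not FFmpeg code):
 * concatenate hi:lo with lo in the low 16 bytes, shift right by `shift`
 * bytes, keep the low 16 bytes. Bytes shifted in from past the top are 0. */
static void palignr_model(uint8_t dst[16], const uint8_t hi[16],
                          const uint8_t lo[16], unsigned shift)
{
    uint8_t cat[32];
    uint8_t out[16];

    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    for (unsigned i = 0; i < 16; i++)
        out[i] = (i + shift < 32) ? cat[i + shift] : 0;
    memcpy(dst, out, 16);
}

/* Returns 1 if a direct 16-byte load at byte offset `off` (0..16) from
 * `src` matches the value rebuilt from the two 16-byte blocks at src and
 * src + 16. `src` must point to at least 32 readable bytes. */
static int unaligned_load_equals_palignr(const uint8_t *src, unsigned off)
{
    uint8_t direct[16], combined[16];

    assert(off <= 16);
    memcpy(direct, src + off, 16);               /* models the unaligned load */
    palignr_model(combined, src + 16, src, off); /* models palignr on two blocks */
    return memcmp(direct, combined, 16) == 0;
}
```

With 16-bit pixels, an offset of one pixel corresponds to off = 2. Whether the substitution pays off in the real function depends on whether the neighbouring block is already in a register, which the excerpt does not show.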
On 6/8/17, Ilia Valiakhmetov wrote:
> vp9_diag_downright_16x16_12bpp_c: 149.0
> vp9_diag_downright_16x16_12bpp_sse2: 67.8
> vp9_diag_downright_16x16_12bpp_ssse3: 45.6
> vp9_diag_downright_16x16_12bpp_avx: 36.6
> vp9_diag_downright_16x16_12bpp_avx2: 25.5
>
> ~30% faster than avx
>
> Signed-off-by:
vp9_diag_downright_16x16_12bpp_c: 149.0
vp9_diag_downright_16x16_12bpp_sse2: 67.8
vp9_diag_downright_16x16_12bpp_ssse3: 45.6
vp9_diag_downright_16x16_12bpp_avx: 36.6
vp9_diag_downright_16x16_12bpp_avx2: 25.5
~30% faster than avx
Signed-off-by: Ilia Valiakhmetov
---
libavcodec/x86/vp9dsp_init_16