Re: [FFmpeg-devel] [PATCH] huffyuvencdsp: Add ff_diff_bytes_{sse2, avx2}

2015-10-21 Thread Timothy Gu
On Wed, Oct 21, 2015 at 10:32 AM Timothy Gu wrote:
> On Tue, Oct 20, 2015 at 7:36 PM James Almer wrote:
>
>> On 10/20/2015 10:32 PM, Timothy Gu wrote:
>> > > +; mov type used for src1q, dstq, first reg, second reg
>> > +%macro DIFF_BYTES_LOOP_CORE 4
>> > +%if regsize != 16
>>
>> %if mmsize != 16

Re: [FFmpeg-devel] [PATCH] huffyuvencdsp: Add ff_diff_bytes_{sse2, avx2}

2015-10-21 Thread Timothy Gu
On Tue, Oct 20, 2015 at 7:36 PM James Almer wrote:
> On 10/20/2015 10:32 PM, Timothy Gu wrote:
> > +; mov type used for src1q, dstq, first reg, second reg
> > +%macro DIFF_BYTES_LOOP_CORE 4
> > +%if regsize != 16
>
> %if mmsize != 16
>
> By checking regsize you're using the SSE2 version in the AV

Re: [FFmpeg-devel] [PATCH] huffyuvencdsp: Add ff_diff_bytes_{sse2, avx2}

2015-10-20 Thread James Almer
On 10/20/2015 10:32 PM, Timothy Gu wrote:
> SSE2 version 4%-35% faster than MMX depending on the width.
> AVX2 version 1%-13% faster than SSE2 depending on the width.
> ---
>
> Addressed James's and Henrik's advices. Removed heuristics based on width.
> Made available both aligned and unaligned ve

[FFmpeg-devel] [PATCH] huffyuvencdsp: Add ff_diff_bytes_{sse2, avx2}

2015-10-20 Thread Timothy Gu
SSE2 version 4%-35% faster than MMX depending on the width.
AVX2 version 1%-13% faster than SSE2 depending on the width.
---
Addressed James's and Henrik's advices. Removed heuristics based on width.
Made available both aligned and unaligned versions. For AVX2 version, gracefully fall back on SSE2
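
For readers who do not have the patch at hand, the operation being vectorized can be sketched in C. This is only an illustration, not the code from the patch: the patch itself is written in x86 assembly, and the function names below (diff_bytes_scalar, diff_bytes_sketch) are made up for this example. It assumes diff_bytes computes a byte-wise difference dst[i] = src1[i] - src2[i] that wraps modulo 256, handles 16 bytes per step with unaligned SSE2 loads and stores when SSE2 is available, and leaves the remaining tail to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: dst[i] = src1[i] - src2[i], wrapping modulo 256. */
static void diff_bytes_scalar(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, size_t w)
{
    for (size_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* Hypothetical vectorized variant (name and structure are assumptions):
 * 16 bytes per iteration with unaligned accesses, scalar loop for the tail.
 * Without SSE2 the whole buffer goes through the scalar loop. */
static void diff_bytes_sketch(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, size_t w)
{
    size_t i = 0;

    assert(dst && src1 && src2);
#ifdef __SSE2__
    for (; i + 16 <= w; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
        /* psubb: sixteen independent 8-bit subtractions */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_sub_epi8(a, b));
    }
#endif
    diff_bytes_scalar(dst + i, src1 + i, src2 + i, w - i);
}
```

An aligned variant would use _mm_load_si128 and _mm_store_si128 instead, which is only valid when all three pointers are 16-byte aligned.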