[FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-03 Thread James Almer
Two to four times faster depending on instruction set, block size and channel count.
Signed-off-by: James Almer
---
Now also with 16 bits indep4 and indep6.
 libavcodec/arm/flacdsp_init_arm.c | 2 +-
 libavcodec/flacdec.c | 6 +-
 libavcodec/flacdsp.c | 6 +-
 libav

[FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-02 Thread James Almer
Two to four times faster depending on instruction set, block size and channel count.
Signed-off-by: James Almer
---
 libavcodec/arm/flacdsp_init_arm.c | 2 +-
 libavcodec/flacdec.c | 6 +-
 libavcodec/flacdsp.c | 6 +-
 libavcodec/flacdsp.h | 6 +-
 lib

Re: [FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-02 Thread Clément Bœsch
On Sun, Nov 02, 2014 at 07:55:35PM -0300, James Almer wrote:
> On 02/11/14 7:43 PM, Clément Bœsch wrote:
> > On Sun, Nov 02, 2014 at 07:31:48PM -0300, James Almer wrote:
> >> Two to four times faster depending on instruction set, block size and
> >> channel count.
> >>
> >> Signed-off-by: James Al

Re: [FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-02 Thread James Almer
On 02/11/14 7:43 PM, Clément Bœsch wrote:
> On Sun, Nov 02, 2014 at 07:31:48PM -0300, James Almer wrote:
>> Two to four times faster depending on instruction set, block size and
>> channel count.
>>
>> Signed-off-by: James Almer
>> ---
>> TODO: 16 bits indep for 4, 6 and 8 channels. 24/32 bits in

Re: [FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-02 Thread Clément Bœsch
On Sun, Nov 02, 2014 at 07:31:48PM -0300, James Almer wrote:
> Two to four times faster depending on instruction set, block size and channel
> count.
>
> Signed-off-by: James Almer
> ---
> TODO: 16 bits indep for 4, 6 and 8 channels. 24/32 bits indep for 8 channels.
> AVX2 and maybe MMX ve

[FFmpeg-devel] [PATCH] x86/flacdsp: add SSE2 and AVX decorrelate functions

2014-11-02 Thread James Almer
Two to four times faster depending on instruction set, block size and channel count.
Signed-off-by: James Almer
---
TODO: 16 bits indep for 4, 6 and 8 channels. 24/32 bits indep for 8 channels. AVX2 and maybe MMX versions. Planar?
 libavcodec/arm/flacdsp_init_arm.c | 2 +-
 libavco
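
For readers unfamiliar with what a FLAC "decorrelate" function does, here is a minimal scalar sketch of the mid/side case in C. It is not taken from the patch: the function name `decorrelate_ms_16_sketch` and its signature are hypothetical, and the output shift that a real decoder applies is left out. It only illustrates the per-sample arithmetic that SIMD versions would process several samples at a time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference, not the patch's code or signature.
 * Rebuilds interleaved 16-bit left/right samples from FLAC mid/side
 * channels, where the encoder stored
 *     mid  = (left + right) >> 1
 *     side =  left - right
 * Assumes >> on a negative int32_t is an arithmetic shift, which holds
 * on mainstream compilers but is implementation-defined in C.
 */
static void decorrelate_ms_16_sketch(int16_t *out, const int32_t *mid,
                                     const int32_t *side, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int32_t s     = side[i];
        int32_t right = mid[i] - (s >> 1); /* undo the halving in mid */
        int32_t left  = right + s;         /* side = left - right */

        out[2 * i]     = (int16_t)left;    /* interleaved: L, R, L, R, ... */
        out[2 * i + 1] = (int16_t)right;
    }
}
```

Each output sample depends only on the inputs at the same index, so the loop has no cross-iteration dependency. That is the property that makes this kind of routine a natural candidate for SSE2 or AVX.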