[FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-02-01, James Darnley
2.6 times faster (366 vs. 142 cycles)
---
Changes since last patch:
- name changed to follow 420 version.
- use one less reg by using r4 more (James Almer's suggestion)
- don't require aligned space in the stack, use a negative value as the cglobal argument. (perhaps unnecessary now that r6
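For context on what the mmxext routine computes, here is a minimal scalar sketch of the H.264 normal-strength (bS < 4) chroma edge filter for a single line of 8-bit samples, following the formulas in the H.264 specification. It assumes the caller has already skipped edges with bS == 0. The function and parameter names are hypothetical and are not FFmpeg's C or assembly code. In 4:2:2 the chroma plane keeps full vertical resolution, so a vertical macroblock edge spans 16 chroma rows instead of the 8 it spans in 4:2:0; a SIMD version filters several such lines at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: clamp v into [lo, hi] (Clip3 in the spec). */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: clamp to the 8-bit sample range (Clip1 in the spec). */
static uint8_t clip1_8(int v)
{
    return (uint8_t)clip3(0, 255, v);
}

/*
 * Hypothetical sketch: filter one line of chroma samples across an edge,
 * for 0 < bS < 4. p points at p0, the last sample before the edge; q0 is at
 * p[step]. step is 1 when filtering across a vertical edge (samples along a
 * row) and the line stride when filtering across a horizontal edge.
 * For chroma only p0 and q0 are modified; p1 and q1 are only read.
 */
static void chroma_filter_line_bs_lt4(uint8_t *p, ptrdiff_t step,
                                      int alpha, int beta, int tc0)
{
    assert(tc0 >= 0);
    int p1 = p[-step], p0 = p[0];
    int q0 = p[step],  q1 = p[2 * step];

    /* Skip lines where the step looks like a real image edge. */
    if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return;

    /* Chroma-style filtering uses tc = tc0 + 1. Right-shifting a negative
     * int is implementation-defined in C; mainstream compilers shift
     * arithmetically, which matches the spec's >> operator. */
    int tc = tc0 + 1;
    int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);
    p[0]    = clip1_8(p0 + delta);
    p[step] = clip1_8(q0 - delta);
}
```

With p1..q1 = 60, 70, 90, 95, alpha = 40, beta = 20 and tc0 = 2, the unclipped delta is 6, which is clamped to tc = 3, so p0 becomes 73 and q0 becomes 87.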

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-15, James Darnley
On 2016-01-15 04:21, Ronald S. Bultje wrote:
> If you don't need r%dm (looks like you don't, but didn't check
> exhaustively), you can also use a negative stack size (0 - mmsize -
> ARCH_X86_64 * 2 * mmsize), then it will not create a stack pointer.

I am already using r[0-3]m for storage. (A
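Ronald's suggested stack-size expression can be evaluated for the mmxext case to see how much scratch space it requests. In x86inc, mmsize is 8 for MMX/MMXEXT code, and ARCH_X86_64 is 1 on 64-bit builds and 0 otherwise; the x86inc comments for cglobal say that requesting a negative stack size prevents an extra register from being allocated to hold the original stack pointer. The C function below is a hypothetical model of that arithmetic only, not part of FFmpeg.

```c
#include <assert.h>

/*
 * Hypothetical model of the cglobal stack-size argument
 *     0 - mmsize - ARCH_X86_64 * 2 * mmsize
 * The magnitude is the number of scratch bytes requested; the negative
 * sign asks x86inc not to dedicate a register to the original stack pointer.
 */
static int cglobal_stack_arg(int mmsize, int arch_x86_64)
{
    assert(arch_x86_64 == 0 || arch_x86_64 == 1);
    return 0 - mmsize - arch_x86_64 * 2 * mmsize;
}
```

For mmxext (mmsize = 8) this gives -8 on x86-32 (one 8-byte slot) and -24 on x86-64 (three 8-byte slots).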

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-15, James Darnley
On 2016-01-15 03:55, James Almer wrote:
> On 1/14/2016 11:05 PM, James Darnley wrote:
>> diff --git a/libavcodec/x86/h264_deblock.asm
>> b/libavcodec/x86/h264_deblock.asm
>> index 5151f3c..20f0814 100644
>> --- a/libavcodec/x86/h264_deblock.asm
>> +++ b/libavcodec/x86/h264_deblock.asm
>> @@

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-15, James Darnley
On 2016-01-15 21:55, James Almer wrote:
> On 1/15/2016 5:00 PM, James Darnley wrote:
>> On 2016-01-15 03:55, James Almer wrote:
>>> On 1/14/2016 11:05 PM, James Darnley wrote:
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index 5151f3c..20f0814

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-15, Ronald S. Bultje
Hi,

On Fri, Jan 15, 2016 at 4:47 PM, James Darnley wrote:
> On 2016-01-15 21:55, James Almer wrote:
> > On 1/15/2016 5:00 PM, James Darnley wrote:
> >> On 2016-01-15 03:55, James Almer wrote:
> >>> On 1/14/2016 11:05 PM, James Darnley wrote:
> diff --git

[FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-14, James Darnley
2.6 times faster
---
I have one question now. Should I make the function name match the assembly existing deblock/loop filter functions? I took the current name from the C (as I was originally trying to use a gather instruction but that didn't offer any benefit).
---

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-14, Ronald S. Bultje
Hi,

On Thu, Jan 14, 2016 at 9:55 PM, James Almer wrote:
> On 1/14/2016 11:05 PM, James Darnley wrote:
> > 2.6 times faster
> > ---
> > I have one question now. Should I make the function name match the assembly
> > existing deblock/loop filter functions? I took the

Re: [FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter

2016-01-14, James Almer
On 1/14/2016 11:05 PM, James Darnley wrote:
> 2.6 times faster
> ---
> I have one question now. Should I make the function name match the assembly
> existing deblock/loop filter functions? I took the current name from the C
> (as I was originally trying to use a gather instruction but that