[FFmpeg-devel] [PATCH 5/5] x86: hevc_mc: put_pixels and 1d epel for x86_32

2015-02-07 Thread Christophe Gisquet
Now that the xmm register and gpr count has decreased, it is possible to port to x86_32. To save on code, x86_32 with or without PIC is handled as if PIC. --- libavcodec/x86/hevc_mc.asm| 39 +++ libavcodec/x86/hevcdsp.h | 4 +++- libavcodec/x86/hevcdsp_init.c

[FFmpeg-devel] [PATCH 3/5] x86: hevc_mc: save 1 gpr in epel filter loading

2015-02-07 Thread Christophe Gisquet
The 3*stride value stored in r3src can be loaded much later, so use r3src instead of a dedicated gpr when possible. --- libavcodec/x86/hevc_mc.asm | 65 ++ 1 file changed, 31 insertions(+), 34 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm

[FFmpeg-devel] [PATCH 2/5] x86: hevc_mc: remove lea in EPEL_LOAD

2015-02-07 Thread Christophe Gisquet
The second parameter to the macro is always an immediate address, so no lea is needed. --- libavcodec/x86/hevc_mc.asm | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index f6dff4c..aab69dd 100644 ---

[FFmpeg-devel] [PATCH 0/5] x86: hevc_mc: some simd for x86_32

2015-02-07 Thread Christophe Gisquet
it, as that's 1-4% runtime improvement for something far from realtime decoding. Christophe Gisquet (5): x86: hevc_mc: fewer gpr autoloads for _v filters x86: hevc_mc: remove lea in EPEL_LOAD x86: hevc_mc: save 1 gpr in epel filter loading x86: hevc_mc: fewer xmm regs used in epel h/v x86: hevc_mc

[FFmpeg-devel] [PATCH 1/5] x86: hevc_mc: fewer gpr autoloads for _v filters

2015-02-07 Thread Christophe Gisquet
In that case, it's just to load my, but mx/r3src is not used. --- libavcodec/x86/hevc_mc.asm | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index 3f56782..f6dff4c 100644 ---

[FFmpeg-devel] [PATCH 4/5] x86: hevc_mc: fewer xmm regs used in epel h/v

2015-02-07 Thread Christophe Gisquet
--- libavcodec/x86/hevc_mc.asm | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index 74e08d4..a127a4d 100644 --- a/libavcodec/x86/hevc_mc.asm +++ b/libavcodec/x86/hevc_mc.asm @@ -734,7 +734,7 @@ cglobal

Re: [FFmpeg-devel] [PATCH 0/5] x86: hevc_mc: some simd for x86_32

2015-02-07 Thread Christophe Gisquet
2015-02-07 19:49 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Christophe Gisquet (5): x86: hevc_mc: fewer gpr autoloads for _v filters x86: hevc_mc: remove lea in EPEL_LOAD x86: hevc_mc: save 1 gpr in epel filter loading x86: hevc_mc: fewer xmm regs used in epel h/v

Re: [FFmpeg-devel] [PATCH]Correctly read RLE encoding from dpx files

2015-02-07 Thread Christophe Gisquet
Hi, 2015-02-07 16:47 GMT+01:00 Carl Eugen Hoyos ceho...@ag.or.at: Attached patch intends to fix reading the RLE-attribute from dpx files. I don't think this is valid. You are skipping a byte, as if there was an additional element to skip. But there is none. The bitstream is really: bits per

[FFmpeg-devel] [PATCH] x86: lavu/x264asm: fix ymm register instanciation

2015-02-03 Thread Christophe Gisquet
for the patch. This patch is mostly meant to check that it does fix generated assembly for other users. -- Christophe From 018f5122d53fd0514644b86169d593c28a6924db Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Tue, 3 Feb 2015 13:03:48 +0100 Subject: [PATCH] x86: lavu

Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-03 Thread Christophe Gisquet
2015-02-03 12:57 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Actually, 940300945 does need to be reverted for the patch to work, as Mickael stated. It miscompiles hevc_mc.asm, more particularly the [eq]pel_hv functions. No idea why. The patch in [PATCH] x86: lavu/x264asm: fix

Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-03 Thread Christophe Gisquet
Hi, 2015-02-02 18:23 GMT+01:00 James Almer jamr...@gmail.com: https://github.com/OpenHEVC/FFmpeg/commit/940300945995c20f7583394ebe6907e72829b4a [...] Tested it. Doesn't pass with avx2. Actually, 940300945 does need to be reverted for the patch to work, as Mickael stated. It miscompiles

Re: [FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}

2015-02-04 Thread Christophe Gisquet
Hi, 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com: Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere. Refactoring and optimizations by James Almer. Add your own copyright to this file then. Width 32 158583 decicycles in edge, sao_edge_filter_8 runs, 0

Re: [FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-04 Thread Christophe Gisquet
Hi, 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com: -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1)= { 0x0001000100010001ULL, 0x0001000100010001ULL }; -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2)= { 0x0002000200020002ULL, 0x0002000200020002ULL }; +DECLARE_ALIGNED(32, const

Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-03 Thread Christophe Gisquet
Hi, On 3 Feb 2015 18:47, James Almer jamr...@gmail.com wrote: On 02/02/15 2:11 PM, Christophe Gisquet wrote: @@ -87,11 +95,22 @@ QPEL_TABLE 12, 4, w, sse4 %elif %1 = 8 movdqa%3, [%2] ; load data from source2 %elif %1

Re: [FFmpeg-devel] [PATCH] avcodec/hevc: reduce memory used by the SAO

2015-02-05 Thread Christophe Gisquet
Hi, 2015-02-05 11:13 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: 2015-02-05 10:13 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: The patch breaks make fate-hevc THREADS=3, so needs more thought. Compilation issue, running make clean first passes fate-hevc THREADS=3

Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc

2015-02-05 Thread Christophe Gisquet
Hi, 2015-02-05 14:22 GMT+01:00 Mickaël Raulet mrau...@gmail.com: Michael, Please find some commits that can be cherry picked from https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch Optimized deblocking filter (8bits only) 1b9ee47d2f43b0a029a9468233626102eb1473b8 Optimized transform

Re: [FFmpeg-devel] [PATCH 2/3] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}

2015-02-05 Thread Christophe Gisquet
Hi, 2015-02-05 5:18 GMT+01:00 James Almer jamr...@gmail.com: Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere. Refactoring and optimizations by James Almer. No further comment from me. -- Christophe ___ ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 3/3] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-05 Thread Christophe Gisquet
2015-02-05 5:18 GMT+01:00 James Almer jamr...@gmail.com: Original x86 intrinsics code by Pierre-Edouard Lepere. Yasm port, refactoring and optimizations by James Almer. OK. -- Christophe

[FFmpeg-devel] [PATCH] lavc/pthread_slice: release entries

2015-02-05 Thread Christophe Gisquet
1c4004c78c0b33ad2d30d6ef8baf5eb099c5b1eb Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Thu, 5 Feb 2015 16:00:11 +0100 Subject: [PATCH] lavc/pthread_slice: release entries When calling ff_alloc_entries, a number of entries are created. They are never freed

[FFmpeg-devel] [PATCH 1/7] x86: hevc_mc: add AVX2 optimizations

2015-02-05 Thread Christophe Gisquet
From: plepere pierre-edouard.lep...@insa-rennes.fr before 33304 decicycles in luma_bi_1, 523066 runs, 1222 skips 38138 decicycles in luma_bi_2, 523427 runs, 861 skips 13490 decicycles in luma_uni, 516138 runs, 8150 skips after 20185 decicycles in luma_bi_1, 519970 runs, 4318 skips 24620

[FFmpeg-devel] [PATCH 3/7] x86/hevc: use CLIPW macro when possible

2015-02-05 Thread Christophe Gisquet
From: Mickaël Raulet mrau...@insa-rennes.fr Conflicts: libavcodec/x86/hevc_mc.asm --- libavcodec/x86/hevc_mc.asm | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm index efb4d1f..e8a5032 100644 ---

[FFmpeg-devel] [PATCH 6/7] x86: lavc: share more constant through defines

2015-02-05 Thread Christophe Gisquet
--- libavcodec/x86/constants.c | 22 -- libavcodec/x86/constants.h | 16 +--- libavcodec/x86/h264_deblock_10bit.asm | 6 ++ libavcodec/x86/h264_idct_10bit.asm | 4 +++- libavcodec/x86/h264_intrapred_10bit.asm | 3 ++-

[FFmpeg-devel] [PATCH 4/7] x86/hevc_mc: use aligned loads

2015-02-05 Thread Christophe Gisquet
From: Mickaël Raulet mrau...@insa-rennes.fr --- libavcodec/hevc.h | 2 +- libavcodec/x86/hevc_mc.asm | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/hevc.h b/libavcodec/hevc.h index ae9a32a..e0af6f1 100644 --- a/libavcodec/hevc.h +++

[FFmpeg-devel] [PATCH 0/7] x86: hevc MC and constants

2015-02-05 Thread Christophe Gisquet
not affect the DSP API. Christophe Gisquet (4): x86: hevc_mc: use epel_hv 16-wide function x86: lavc: share more constants x86: lavc: share more constant through defines x86: hevc: remove a parameter to WP internals Mickaël Raulet (2): x86/hevc: use CLIPW macro when possible x86/hevc_mc: use

[FFmpeg-devel] [PATCH 7/7] x86: hevc: remove a parameter to WP internals

2015-02-05 Thread Christophe Gisquet
The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes). --- libavcodec/x86/hevc_mc.asm| 30 +++--- libavcodec/x86/hevcdsp.h | 4 ++-- libavcodec/x86/hevcdsp_init.c | 16 3 files changed, 25
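
As a rough illustration of the point above (a sketch, not the code from the patch): when a buffer always has the same row pitch, that pitch can be a compile-time constant instead of a function parameter. The value 64 for MAX_PB_SIZE and the helper names `tmp_stride_bytes` and `tmp_index` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for this sketch only. */
#define MAX_PB_SIZE 64

/* Row pitch of the intermediate buffer in bytes:
 * MAX_PB_SIZE samples of 16 bits each, hence "times 2". */
static size_t tmp_stride_bytes(void)
{
    return MAX_PB_SIZE * sizeof(int16_t);
}

/* Index of sample (x, y) in the intermediate buffer, in int16_t units.
 * No stride argument is needed because the pitch is fixed. */
static size_t tmp_index(int x, int y)
{
    assert(x >= 0 && x < MAX_PB_SIZE && y >= 0);
    return (size_t)y * MAX_PB_SIZE + (size_t)x;
}
```

Dropping the argument frees one register in the callee, which matters most on x86_32 where general-purpose registers are scarce.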

[FFmpeg-devel] [PATCH 5/7] x86: lavc: share more constants

2015-02-05 Thread Christophe Gisquet
--- libavcodec/x86/ac3dsp.asm | 2 +- libavcodec/x86/constants.c | 9 - libavcodec/x86/constants.h | 4 libavcodec/x86/h264_qpel_10bit.asm | 2 +- libavcodec/x86/hevc_mc.asm | 14 +++--- libavcodec/x86/hevc_sao.asm| 2 +-

Re: [FFmpeg-devel] DSP function ARM NEON patches for hevc

2015-02-05 Thread Christophe Gisquet
2015-02-05 18:28 GMT+01:00 James Almer jamr...@gmail.com: On 05/02/15 10:22 AM, Mickaël Raulet wrote: More coming soon for epel and SAO! The SAO prototypes got some slight changes with my patches. The author of this ARM code (or someone else) will have to revise it before it can be merged.

Re: [FFmpeg-devel] [PATCH 1/3] hevcdsp: remove compilation-time-fixed parameter from sao_edge_filter

2015-02-05 Thread Christophe Gisquet
2015-02-05 5:18 GMT+01:00 James Almer jamr...@gmail.com: The stride_src parameter is always 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE. OK. That's the change the SAO implementations should be aware of. Good to commit. -- Christophe

[FFmpeg-devel] [PATCH] hevc/sao: do in-place band filtering when possible

2015-02-05 Thread Christophe Gisquet
On the other hand, the stride is known at compilation time, so the asm could use that to reduce the number of gprs and therefore help with having an x86_32 version. -- Christophe From 55047bbb991c95f126d597bbe05e424406af4ec4 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq

Re: [FFmpeg-devel] [PATCH 7/7] x86: hevc: remove a parameter to WP internals

2015-02-06 Thread Christophe Gisquet
2015-02-06 16:18 GMT+01:00 James Almer jamr...@gmail.com: mov r5d, denomm is loading it on Win64. It's also a superfluous instruction on UNIX64, where it translates to mov r5d, r5d. At least if i'm reading this right. movifnidn then? -- Christophe

Re: [FFmpeg-devel] [PATCH] x86/hevc_sao: fix loading of RIP address

2015-02-06 Thread Christophe Gisquet
Hi, 2015-02-06 17:54 GMT+01:00 James Almer jamr...@gmail.com: pb_eo must be handled as a rip relative address for MSVC64, so an intermediate register is needed. Should fix link failures. Seems ok on principle, passes fate on mingw64. I'm always wary of those ABI, so if anyone could verify for

Re: [FFmpeg-devel] [PATCH 2/3] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}

2015-02-06 Thread Christophe Gisquet
having the patch, but I'm not completely sure it will apply fine. -- Christophe From f0997ceac461add7dddbb1c0a75797bf462bf16e Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Fri, 6 Feb 2015 13:43:45 +0100 Subject: [PATCH] x86: hevc_sao: fix loading of RIP

[FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-08 Thread Christophe Gisquet
Suggestions for more efficient make rules, clearer documentation and a better macro than the generic 'DBG' welcome. -- Christophe From 8e2adea82a4ec440744b701716a93e6ecaf211d6 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Sun, 8 Feb 2015 12:18:27 +0100

Re: [FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-08 Thread Christophe Gisquet
17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Sun, 8 Feb 2015 12:18:27 +0100 Subject: [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm The macro hell sometimes makes it difficult to trace the source of an error, so it is easier to analyze the preprocessed

Re: [FFmpeg-devel] [PATCH] x86inc: also warn on sse2 instruction

2015-02-08 Thread Christophe Gisquet
by the first warn commit, because of differing insn set (haven't checked though). This one is specifically for might be insn set a or b, but reg size makes it clearer. -- Christophe From cc1f681defcde77bf1a371633c0eba155b93f235 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq

Re: [FFmpeg-devel] [PATCH] x86/hevc_sao: make sao_band_filter work on x86_32

2015-02-08 Thread Christophe Gisquet
Hi, 2015-02-07 23:06 GMT+01:00 James Almer jamr...@gmail.com: Signed-off-by: James Almer jamr...@gmail.com --- libavcodec/x86/hevc_sao.asm | 40 libavcodec/x86/hevcdsp_init.c | 24 2 files changed, 48 insertions(+), 16

[FFmpeg-devel] [PATCH] x86inc: also warn on sse2 instruction

2015-02-08 Thread Christophe Gisquet
;a=commitdiff;h=fc7e02f0ff345d5331b7c78f2400668d2c79a8b0;hp=4ccd7cb45b9aa46d94c29dbd1c065b652bda2319 It is currently in x264 'sandbox', ie not in a released version. -- Christophe From 1ade9345fbb392c0612ff0053523857abd5da563 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq

Re: [FFmpeg-devel] [PATCH] x86/hevc_sao: make sao_band_filter work on x86_32

2015-02-08 Thread Christophe Gisquet
Hi, 2015-02-08 18:48 GMT+01:00 James Almer jamr...@gmail.com: +%assign MMSIZE mmsize Why do that? Not a big deal: it's only for my education, if there's something I'm missing. For width 48, the COMPUTE macro is last run after an INIT_XMM cpuname, so mmsize becomes 16 and in the avx2

[FFmpeg-devel] [PATCH] ffmpeg_opt: expand format for strftime

2015-01-21 Thread Christophe Gisquet
Hi, another warning under MinGW fixed, because the format specifiers are not supported, cf. for instance: https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx -- Christophe From 6d3e4eb5df0eacf3df011dbe3e05a82f814f63b1 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq

Re: [FFmpeg-devel] [PATCH] libavformat/img2dec: fix warning when !HAVE_GLOB

2015-01-21 Thread Christophe Gisquet
2015-01-21 20:06 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Hi, the attached patch fixes a warning under MinGW (no idea about msys2). With aforementioned patch. -- Christophe From 54b6d1b588d4a15f51345f8ca1fc5b8ab0660b7e Mon Sep 17 00:00:00 2001 From: Christophe Gisquet

Re: [FFmpeg-devel] [PATCH] Port mp=eq/eq2 to FFmpeg

2015-01-22 Thread Christophe Gisquet
So... 2015-01-21 21:08 GMT+01:00 arwa arif arwaarif1...@gmail.com: Updated the patch. There are trailing spaces, and the patch does not apply here (error on libavfilter/x86/Makefile). Furthermore, I think that hunk is incorrect: +set_gamma(eq); +set_contrast(eq); +

Re: [FFmpeg-devel] [PATCH 3/5] x86: hevc_mc: save 1 gpr in epel filter loading

2015-02-16 Thread Christophe Gisquet
2015-02-16 10:43 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Obviously I shouldn't unconditionally use r3srcq or equivalent, as !PIC just directly access the %%table I probably need to define an intermediate, say TABLE, which is either r3srcq or %%table, and use it for loading

Re: [FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-18 Thread Christophe Gisquet
with the attached patch. But I wonder if a dependency rule is really needed, as I can see it causing issues... (does it depend on .asm or .dbg.asm etc) So I don't think we are there yet. -- Christophe From f3365ec79b096dad0ccd7246b78ea9b7074f3b49 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet

Re: [FFmpeg-devel] [PATCH] x86/hevc_sao: make sao_edge_filter_{10, 12} work on x86_32

2015-02-12 Thread Christophe Gisquet
Hi, 2015-02-12 11:28 GMT+01:00 Michael Niedermayer michae...@gmx.at: On Tue, Feb 10, 2015 at 01:44:26AM -0300, James Almer wrote: Signed-off-by: James Almer jamr...@gmail.com --- libavcodec/x86/hevc_sao.asm | 106 ++ libavcodec/x86/hevcdsp_init.c

Re: [FFmpeg-devel] [PATCH] x86/hevc_mc: optimize AVX2 mc functions

2015-02-12 Thread Christophe Gisquet
Hi, 2015-02-12 7:29 GMT+01:00 James Almer jamr...@gmail.com: Before 40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips After 37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips Looks straightforward. But now I understand why we declare using 11 xmm

Re: [FFmpeg-devel] [PATCH] x86inc: also warn on sse2 instruction

2015-02-16 Thread Christophe Gisquet
Hi, 2015-02-08 14:53 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: This one is specifically for might be insn set a or b, but reg size makes it clearer. Ping? -- Christophe

Re: [FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-16 Thread Christophe Gisquet
Hi, 2015-02-08 14:54 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: 2015-02-08 14:07 GMT+01:00 Carl Eugen Hoyos ceho...@ag.or.at: Doesn't this also need an update for make clean? Right, here's an updated one, also improving dependency generation and allowing more genering

Re: [FFmpeg-devel] [PATCH 5/5] x86: hevc_mc: put_pixels and 1d epel for x86_32

2015-02-16 Thread Christophe Gisquet
Hi, 2015-02-07 23:11 GMT+01:00 Michael Niedermayer michae...@gmx.at: this breaks building shared libs: /usr/bin/ld: libavcodec/x86/hevc_mc.o: relocation R_X86_64_32 against `.text' can not be used when making a shared object; recompile with -fPIC libavcodec/x86/hevc_mc.o: could not read

Re: [FFmpeg-devel] [PATCH 4/5] x86: hevc_mc: fewer xmm regs used in epel h/v

2015-02-16 Thread Christophe Gisquet
here's a patch with the values swapped out. From 2955eea46501d096a47fbf2fb1824daa622f6031 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Mon, 16 Feb 2015 20:12:04 +0100 Subject: [PATCH 2/3] x86: hevc_mc: fewer xmm regs used in epel h/v 11 xmm regs seem only required

Re: [FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-17 Thread Christophe Gisquet
Hi, 2015-02-17 13:29 GMT+01:00 Michael Niedermayer michae...@gmx.at: also, i wonder, should this be extended to also work with C files ? Again, with asm, macro hell sometimes gives you that. So it makes sense. I don't know well enough icc/llvm/whatever we handle, so the main issue might be to

Re: [FFmpeg-devel] [PATCH] x86/doc/Makefile: DBG=1 to preprocess external asm

2015-02-17 Thread Christophe Gisquet
Hi, 2015-02-17 13:21 GMT+01:00 Michael Niedermayer michae...@gmx.at: btw, this works too: (without DBG=1) make V=2 libavcodec/x86/vp8dsp_loopfilter.dbg.o Yeah, once I'm working on a specific file, that's what I do. But iirc, you can just do make DBG=1 and wait for any error to mention the

Re: [FFmpeg-devel] [PATCH 2/3] hevcdsp: replace the SAOParams struct parameter from sao_band_filter

2015-01-30 Thread Christophe Gisquet
Hi, 2015-01-30 19:50 GMT+01:00 James Almer jamr...@gmail.com: Pass instead the two variables from the struct needed in the function. This simplifies writing asm optimized versions of the function ok, no impact by itself but... void (*sao_band_filter)(uint8_t *_dst, uint8_t *_src,

Re: [FFmpeg-devel] [PATCH] x86/sbrdsp: add ff_sbr_autocorrelate_{sse, sse3}

2015-01-25 Thread Christophe Gisquet
the start/tail parts. -- Christophe From 49e41dd86eb65a774f3561420dd5e9de83f328f2 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Sun, 25 Jan 2015 13:52:16 +0100 Subject: [PATCH 2/2] Use different mem moves. --- libavcodec/x86/sbrdsp.asm | 28

Re: [FFmpeg-devel] [PATCH] nvenc: Propagate desired number of reference frames.

2015-01-23 Thread Christophe Gisquet
Hi, 2015-01-23 10:54 GMT+01:00 Timo Rothenpieler t...@rothenpieler.org: This would also forward the ffmpeg default, 1, to nvenc, instead of the nvenc default, 0, which lets the driver decide what is best. I'm not sure if this is desirable and how much it affects the quality. In that case,

[FFmpeg-devel] [PATCH] avcodec/hevc: reduce memory used by the SAO

2015-02-01 Thread Christophe Gisquet
adaptation from Fabrice's version is to match our alignment requirements, and abuse the edge emu buffers instead of adding a new buffer. Decicycles: 26772-26220 (BO32), 83803-80942 (BO64) Signed-off-by: Christophe Gisquet christophe.gisq...@gmail.com --- libavcodec/hevc.c| 44

Re: [FFmpeg-devel] [PATCH 3/3] x86/hevc: add ff_hevc_sao_band_filter_{8, 10, 12}_{sse2, avx2}

2015-01-31 Thread Christophe Gisquet
Hi, 2015-01-30 19:50 GMT+01:00 James Almer jamr...@gmail.com: +%macro HEVC_SAO_BAND_FILTER_COMPUTE 3 +psraw %2, %3, %1-5 +pcmpeqw m10, %2, m0 +pcmpeqw m11, %2, m1 +pcmpeqw m12, %2, m2 +pcmpeqw %2, m3 +pand
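
For readers following the macro quoted above, here is a scalar sketch of what an HEVC SAO band filter computes for 8-bit samples. It is only an illustration under assumptions: the function names are hypothetical, and the mapping to the asm (a right shift by bit depth minus 5, then equality tests against four band indices) is my reading of the truncated snippet, not something the message confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar band filter for 8-bit samples.
 * The sample range is split into 32 bands (sample >> 3 for 8 bits);
 * the four consecutive bands starting at band_pos each get an offset,
 * wrapping around modulo 32. All other bands are left unchanged. */
static void sao_band_filter_8(uint8_t *dst, const uint8_t *src, int n,
                              const int16_t offsets[4], int band_pos)
{
    int16_t table[32] = { 0 };
    int i, k;

    assert(band_pos >= 0 && band_pos < 32);
    for (k = 0; k < 4; k++)
        table[(band_pos + k) & 31] = offsets[k];
    for (i = 0; i < n; i++)
        dst[i] = clip_u8(src[i] + table[src[i] >> 3]);
}
```

A SIMD version cannot index a table per lane cheaply, which is presumably why the asm compares the band index against the four active bands and masks the offsets instead.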

Re: [FFmpeg-devel] [PATCH 2/2] ffmpeg: don't use deprecated av_log_ask_for_sample

2015-01-30 Thread Christophe Gisquet
Hi, 2015-01-30 14:59 GMT+01:00 Michael Niedermayer michae...@gmx.at: On Fri, Jan 30, 2015 at 01:09:03PM +, Christophe Gisquet wrote: The solution requires accessing an lavu internal, which may not be a good example for ffmpeg as a library user. [...] ffmpeg.c as a user application should

Re: [FFmpeg-devel] [PATCH 3/3] x86/hevc: add ff_hevc_sao_band_filter_{8, 10, 12}_{sse2, avx2}

2015-01-31 Thread Christophe Gisquet
Hi, 2015-01-31 11:33 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: 2015-01-30 19:50 GMT+01:00 James Almer jamr...@gmail.com: +%macro HEVC_SAO_BAND_FILTER_COMPUTE 3 +psraw %2, %3, %1-5 +pcmpeqw m10, %2, m0 +pcmpeqw m11, %2, m1

Re: [FFmpeg-devel] [PATCH 3/5] avcodec/vc1_pred: few branchless optimizations

2015-02-14 Thread Christophe Gisquet
2015-02-14 17:14 GMT+01:00 Michael Niedermayer michae...@gmx.at: On Sat, Feb 14, 2015 at 11:03:13PM +0800, zhaoxiu.zeng wrote: From 7e4038fe1291b857261584e69323486fc955cfb2 Mon Sep 17 00:00:00 2001 From: Zeng Zhaoxiu zhaoxiu.z...@gmail.com Date: Sat, 14 Feb 2015 20:08:48 +0800 Subject: [PATCH

Re: [FFmpeg-devel] [PATCH 1/3] avcodec/wmalosslessdec: change type of acfilter_coeffs from int64_t to int16_t

2015-02-14 Thread Christophe Gisquet
Hi, 2015-02-13 17:49 GMT+01:00 zhaoxiu.zeng zhaoxiu.z...@gmail.com: int8_t acfilter_order; int8_t acfilter_scaling; -int64_t acfilter_coeffs[16]; +int16_t acfilter_coeffs[16]; int acfilter_prevvalues[WMALL_MAX_CHANNELS][16]; int8_t mclms_order; @@ -818,7

Re: [FFmpeg-devel] [PATCH] x86/hevc_mc: optimize AVX2 mc functions

2015-02-12 Thread Christophe Gisquet
2015-02-12 11:47 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Looks straightforward. But now I understand why we declare using 11 xmm regs in some places, which impacts a patch that has been reviewed and needs updating. A patch of mine for x86_32. Just ignore me, I'm speaking

Re: [FFmpeg-devel] [PATCH 3/5] x86: hevc_mc: save 1 gpr in epel filter loading

2015-02-16 Thread Christophe Gisquet
Hi, 2015-02-16 4:51 GMT+01:00 Michael Niedermayer michae...@gmx.at: On Sat, Feb 07, 2015 at 06:49:38PM +, Christophe Gisquet wrote: The 3*stride value stored in r3src can be loaded much later, so use r3src instead of a dedicated gpr when possible. --- libavcodec/x86/hevc_mc.asm | 65

Re: [FFmpeg-devel] [PATCH] avcodec/hevc: reduce memory used by the SAO

2015-02-05 Thread Christophe Gisquet
: Christophe Gisquet christophe.gisq...@gmail.com Date: Thu, 5 Feb 2015 19:51:22 +0100 Subject: [PATCH] hevc: free sao buffers when receiving a new SPS The buffer pointers would be otherwise overwritten, causing a leak on e.g. PERSIST_RPARAM_A_RExt_Sony_1. --- libavcodec/hevc.c | 9 - 1 file

Re: [FFmpeg-devel] [PATCH] avcodec/hevc: reduce memory used by the SAO

2015-02-05 Thread Christophe Gisquet
Hi, 2015-02-05 9:48 GMT+01:00 Mickaël Raulet mrau...@insa-rennes.fr: WPP try it out with thread_type=slice. Does it mean these buffer should rather be per thread? If having 2 ctb lines of buffer fixes this, does this mean having 2 instances of a single line/column, one per ctb line number

Re: [FFmpeg-devel] [PATCH 1/5] lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
Hi, 2015-03-14 18:38 GMT+01:00 Michael Niedermayer michae...@gmx.at: static void ff_prores_idct_wrap(int16_t *dst){ -DECLARE_ALIGNED(16, static int16_t, qmat)[64]; +LOCAL_ALIGNED(16, static int16_t, qmat, [64]); int i; this seems to break build ffmpeg/libavcodec/dct-test.c:

Re: [FFmpeg-devel] [PATCH 2/5] x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
, static int16_t, qmat, [64]); \ +LOCAL_ALIGNED(16, static int16_t, tmp, [64]); \ int i; \ LOCAL_ALIGNED + static looks unintended Same fix then. Best regards, Christophe From 93a8d803f6b87e2c5bd062724630e1d67804da29 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq

Re: [FFmpeg-devel] [PATCH 1/5] lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Sat, 14 Mar 2015 14:26:16 +0100 Subject: [PATCH 1/3] lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED The latter may yield incorrect code for on-stack variables. --- libavcodec/dct-test.c | 2 +- libavcodec
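
The concern with on-stack variables is that an alignment attribute may not be honoured for stack storage on every compiler and ABI. A portable workaround is to over-allocate and round the pointer up by hand. The sketch below shows that idea only; `align_up` and `sum_with_aligned_scratch` are hypothetical names, and this is not claimed to be the definition of FFmpeg's macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: advance p to the next multiple of align,
 * where align must be a power of two. */
static uint8_t *align_up(uint8_t *p, uintptr_t align)
{
    uintptr_t pad;

    assert(align && (align & (align - 1)) == 0);
    pad = (align - ((uintptr_t)p & (align - 1))) & (align - 1);
    return p + pad;
}

/* Copies up to 64 bytes into a 16-byte aligned on-stack scratch buffer
 * and returns their sum. The raw array carries 15 spare bytes so that
 * an aligned 64-byte window always fits inside it. */
static unsigned sum_with_aligned_scratch(const uint8_t *src, size_t n)
{
    uint8_t raw[64 + 15];
    uint8_t *scratch = align_up(raw, 16);
    unsigned sum = 0;
    size_t i;

    assert(n <= 64);
    assert(((uintptr_t)scratch & 15) == 0);
    memcpy(scratch, src, n);
    for (i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}
```

The cost is a few wasted bytes and one pointer adjustment per call, in exchange for alignment that does not depend on how the compiler lays out the stack frame.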

[FFmpeg-devel] [PATCH 2/3] lavu: LOCAL_ALIGNED is for arrays

2015-03-15 Thread Christophe Gisquet
Force an additional parameter then. --- libavutil/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/internal.h b/libavutil/internal.h index 9ba2ea0..8081fdb 100644 --- a/libavutil/internal.h +++ b/libavutil/internal.h @@ -107,9 +107,9 @@ t (*v) o =

[FFmpeg-devel] [PATCH 3/3] ppc: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-15 Thread Christophe Gisquet
The latter may yield incorrect code for on-stack variables. --- libavcodec/ppc/h264dsp.c| 8 +++ libavcodec/ppc/h264qpel.c | 50 - libavcodec/ppc/vp8dsp_altivec.c | 2 +- 3 files changed, 30 insertions(+), 30 deletions(-) diff --git

[FFmpeg-devel] [PATCH 1/3] lavc/lavu: remove LOCAL_ALIGNED_*

2015-03-15 Thread Christophe Gisquet
They were duplicating LOCAL_ALIGNED() without benefit. --- configure | 8 +++- libavcodec/aacps.c | 6 +++--- libavcodec/aacsbr.c| 6 +++--- libavcodec/ac3enc.c| 2 +- libavcodec/ac3enc_template.c

[FFmpeg-devel] [PATCH 0/3] Clean-up for LOCAL_ALIGNED

2015-03-15 Thread Christophe Gisquet
The second patch is the most interesting, because it prevents incorrect uses of LOCAL_ALIGNED that may have only caused warnings. This patch is of course smaller because of the code duplication removal of the first patch. Christophe Gisquet (3): lavc/lavu: remove LOCAL_ALIGNED_* lavu

Re: [FFmpeg-devel] [PATCH 0/3] Clean-up for LOCAL_ALIGNED

2015-03-15 Thread Christophe Gisquet
2015-03-15 8:48 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: The second patch is the most interesting, because it prevents incorrect uses of LOCAL_ALIGNED that may have only caused warnings. This patch is of course smaller because of the code duplication removal of the first

Re: [FFmpeg-devel] [PATCH 2/4] x86: xvid_idct: port MMX IDCT to yasm

2015-03-12 Thread Christophe Gisquet
2015-03-11 2:26 GMT+01:00 Michael Niedermayer michae...@gmx.at: this breaks make libavcodec/dct-test ... LD libavcodec/dct-test libavcodec/dct-test.o:(.rodata+0xe8): undefined reference to `ff_xvid_idct_mmx' libavcodec/dct-test.o:(.rodata+0x108): undefined reference to

Re: [FFmpeg-devel] [PATCH] avcodec/h264_mb: Fix undefined shifts

2015-03-12 Thread Christophe Gisquet
Hi, 2015-03-12 14:37 GMT+01:00 Michael Niedermayer michae...@gmx.at: const int mx = h->mv_cache[list][scan8[n]][0] + src_x_offset * 8; int my = h->mv_cache[list][scan8[n]][1] + src_y_offset * 8; const int luma_xy = (mx & 3) + ((my & 3) << 2); -ptrdiff_t offset
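
The thread title refers to undefined shifts: in C, left-shifting a negative signed value is undefined behaviour, and motion vector components can be negative. Below is a minimal sketch of one common remedy, replacing the left shift with a multiplication by a power of two. The function names are hypothetical and this is not claimed to be the exact change made by the patch; it also assumes, as most codecs do, that right-shifting a negative int is an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>

/* Quarter-pel fractional position packed as x + 4 * y.
 * The masked values are in 0..3, so the left shift here is well defined. */
static int luma_xy(int mx, int my)
{
    return (mx & 3) + ((my & 3) << 2);
}

/* Byte offset of the integer-pel position addressed by (mx, my).
 * (mx >> 2) can be negative, so it is scaled with a multiplication
 * instead of "<< pixel_shift", which would be undefined for negatives. */
static ptrdiff_t block_offset(int mx, int my, int pixel_shift,
                              ptrdiff_t linesize)
{
    assert(pixel_shift == 0 || pixel_shift == 1);
    return (mx >> 2) * (1 << pixel_shift) + (my >> 2) * linesize;
}
```

Compilers typically turn the multiplication by `1 << pixel_shift` back into a shift instruction, so the fix should cost nothing at run time.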

Re: [FFmpeg-devel] [PATCH 2/4] x86: xvid_idct: port MMX IDCT to yasm

2015-03-13 Thread Christophe Gisquet
Hi, 2015-03-13 14:34 GMT+01:00 Michael Niedermayer michae...@gmx.at: They will not be used in any real world scenario. well it could be useful in debugging in some cases especially if all mmx code is disabled on x86_64, for any single function its not too likely I believe fate and proper

Re: [FFmpeg-devel] [PATCH] ac3dec_fixed: always use the USE_FIXED=1 variant of the AC3DecodeContext

2015-03-13 Thread Christophe Gisquet
Hi, 2015-03-13 22:28 GMT+01:00 Andreas Cadhalpun andreas.cadhal...@googlemail.com: -int ff_eac3_parse_header(AC3DecodeContext *s); +static int ff_eac3_parse_header(AC3DecodeContext *s); It's somewhat cosmetic, but if these functions become static, they would better drop the ff_ prefix. -

[FFmpeg-devel] [PATCH 2/2] x86: dct-test: evaluate prores idct avx version

2015-03-14 Thread Christophe Gisquet
--- libavcodec/x86/dct-test.c | 39 ++- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/libavcodec/x86/dct-test.c b/libavcodec/x86/dct-test.c index 63a9aeb..d1a5067 100644 --- a/libavcodec/x86/dct-test.c +++ b/libavcodec/x86/dct-test.c @@ -26,21

[FFmpeg-devel] [PATCH 0/2] dct-test and prores

2015-03-14 Thread Christophe Gisquet
Some issues found while touching dct-test. Depends on the patch series (rather its first 2 patches, of which one was already applied) for XviD iDCTs. Christophe Gisquet (2): x86: dct-test: fix compilation for prores x86: dct-test: evaluate prores idct avx version libavcodec/x86/dct-test.c

[FFmpeg-devel] [PATCH 1/2] x86: dct-test: fix compilation for prores

2015-03-14 Thread Christophe Gisquet
When the decoder is deactivated, the x86-optimized versions are not compiled, resulting in a link error. The C version is unaffected, as it is part of the idctdsp subsystem. --- libavcodec/x86/dct-test.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git

[FFmpeg-devel] [PATCH 3/4] eac3dec: fix scaling

2015-03-14 Thread Christophe Gisquet
This is the remaining error, the output on the SPX samples, respectively csi_miami_stereo_128_spx.eac3 and csi_miami_5.1_256_spx.eac3, goes from: stddev:8.71 PSNR: 77.52 MAXDIFF: 235 stddev:24270.51 PSNR: 22.17 MAXDIFF:47166 to: stddev:0.12 PSNR:114.12 MAXDIFF:1 stddev:0.12

[FFmpeg-devel] [PATCH 2/4] ac3_fixed: fix computation of spx_noise_blend

2015-03-14 Thread Christophe Gisquet
It was set to 1 instead of sqrt(3) --- libavcodec/ac3dec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/ac3dec.c b/libavcodec/ac3dec.c index ce45186..ae4129f 100644 --- a/libavcodec/ac3dec.c +++ b/libavcodec/ac3dec.c @@ -939,7 +939,7 @@ static int

[FFmpeg-devel] [PATCH 1/4] ac3_fixed: fix out-of-bound read

2015-03-14 Thread Christophe Gisquet
Should also improve decoding, but actually doesn't... --- libavcodec/ac3dec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/ac3dec.c b/libavcodec/ac3dec.c index 2f78d73..ce45186 100644 --- a/libavcodec/ac3dec.c +++ b/libavcodec/ac3dec.c @@ -872,7 +872,7 @@ static

[FFmpeg-devel] [PATCH 4/4] ac3dec: cosmetics

2015-03-14 Thread Christophe Gisquet
--- libavcodec/ac3dec.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/libavcodec/ac3dec.c b/libavcodec/ac3dec.c index ae4129f..ac53bdc 100644 --- a/libavcodec/ac3dec.c +++ b/libavcodec/ac3dec.c @@ -924,14 +924,13 @@ static int decode_audio_block(AC3DecodeContext *s,

[FFmpeg-devel] [PATCH 0/4] Further fixes to EAC3 FP decoder

2015-03-14 Thread Christophe Gisquet
The SPX blend factors were incorrectly computed. For now, still use floating point operations, with proper scaling, but the interested parties may want to write the proper FP.23 code. Christophe Gisquet (4): ac3_fixed: fix out-of-bound read ac3_fixed: fix computation of spx_noise_blend
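
For readers unfamiliar with the notation, here is a minimal sketch of what fixed-point arithmetic with 23 fractional bits could look like. It assumes "FP.23" means a Q23 format (value = integer / 2^23); the names `Q23_ONE`, `float_to_q23` and `q23_mul` are hypothetical and are not taken from ac3dec.c.

```c
#include <stdint.h>

/* Assumption: "FP.23" denotes fixed point with 23 fractional bits. */
#define Q23_ONE (1 << 23)

/* Convert a non-negative real value to Q23, rounding to nearest. */
static int32_t float_to_q23(double x)
{
    return (int32_t)(x * Q23_ONE + 0.5);
}

/* Multiply two Q23 values; the 64-bit intermediate avoids overflow,
 * and adding half an LSB before the shift rounds to nearest. */
static int32_t q23_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b + (1 << 22)) >> 23);
}
```

With such helpers, a constant like sqrt(3) would be stored once as `float_to_q23(1.7320508075688772)` and applied with `q23_mul()`, with no floating-point operation in the decoding loop.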

[FFmpeg-devel] [PATCH 3/5] ppc: libswscale: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
The latter may yield incorrect code for on-stack variables. --- libswscale/ppc/swscale_altivec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c index a1548a7..3034c72 100644 ---

[FFmpeg-devel] [PATCH 2/5] x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
The latter may yield incorrect code for on-stack variables. --- libavcodec/x86/ac3dsp_init.c | 2 +- libavcodec/x86/cavsdsp.c | 4 ++-- libavcodec/x86/dct-test.c | 4 ++-- libavcodec/x86/h264_qpel.c| 22 +++--- libavcodec/x86/rv40dsp_init.c | 2 +-

[FFmpeg-devel] [PATCH 5/5] lavc/lavu: remove LOCAL_ALIGNED_*

2015-03-14 Thread Christophe Gisquet
They were duplicating LOCAL_ALIGNED() without benefit. --- configure | 8 +++- libavcodec/aacps.c | 6 +++--- libavcodec/aacsbr.c| 6 +++--- libavcodec/ac3enc.c| 2 +- libavcodec/ac3enc_template.c

[FFmpeg-devel] [PATCH 1/5] lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
The latter may yield incorrect code for on-stack variables. --- libavcodec/dct-test.c | 2 +- libavcodec/h264_loopfilter.c| 10 +- libavcodec/proresenc_anatoliy.c | 3 ++- libavcodec/vp8.c| 2 +- 4 files changed, 9 insertions(+), 8 deletions(-) diff --git

[FFmpeg-devel] [PATCH 4/5] ppc: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED

2015-03-14 Thread Christophe Gisquet
The latter may yield incorrect code for on-stack variables. --- libavcodec/ppc/h264dsp.c| 10 - libavcodec/ppc/h264qpel.c | 50 - libavcodec/ppc/vp8dsp_altivec.c | 2 +- 3 files changed, 31 insertions(+), 31 deletions(-) diff --git

[FFmpeg-devel] [PATCH 0/5] Of {LOCAL_,DECLARE_}ALIGNED

2015-03-14 Thread Christophe Gisquet
* macros. All changes (core+x86) tested with: fate-vfilter fate-vcodec fate-h264 fate-ac3 fate-vp8 fate-cavs fate-vc1 for Win32 and Win64. Christophe Gisquet (5): lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED ppc: libswscale: use
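
The concern behind this series is that an alignment attribute on a local variable depends on the compiler and target actually aligning the stack. The following is a minimal sketch of the general workaround (over-allocate, then round the pointer up), not the actual FFmpeg macro definition; `align_up` and `stack_buffer_is_aligned` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a pointer up to the next multiple of
 * `align`, which must be a power of two. */
static void *align_up(void *p, uintptr_t align)
{
    return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Obtain a 32-byte-aligned, 64-byte buffer on the stack without relying
 * on the compiler to align the local itself. Returns 1 on success. */
static int stack_buffer_is_aligned(void)
{
    uint8_t raw[64 + 32];            /* payload plus alignment slack */
    uint8_t *buf = align_up(raw, 32);

    memset(buf, 0xAB, 64);           /* stays inside raw: offset <= 31 */
    return (uintptr_t)buf % 32 == 0 && buf[63] == 0xAB;
}
```

The cost is up to `align` extra bytes of stack per buffer, in exchange for an alignment guarantee that does not depend on how the stack frame happens to be laid out.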

Re: [FFmpeg-devel] [PATCH 2/4] x86: xvid_idct: port MMX IDCT to yasm

2015-03-12 Thread Christophe Gisquet
of the dct-test stuff. Incidentally, I found other issues with dct-test, but they are outside the scope of this patch series. -- Christophe From 866b481ecab3369712ff854ce6c0857b875b50e6 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Tue, 10 Mar 2015 23:11:52

Re: [FFmpeg-devel] [PATCH 3/4] x86: xvid_idct: merged idct_put SSE2 versions

2015-03-12 Thread Christophe Gisquet
2015-03-11 0:11 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: --- libavcodec/x86/xvididct.asm| 202 - libavcodec/x86/xvididct_init.c | 8 +- 2 files changed, 140 insertions(+), 70 deletions(-) Not sure it needed a refresh, but here

Re: [FFmpeg-devel] [PATCH 1/4] x86: xvid: port SSE2 idct to yasm

2015-03-12 Thread Christophe Gisquet
-xvid-idct and dct-test builds and runs on both Win32 and Win64. -- Christophe From 86da5a1f111f9f36318daa906c3245d6b883feb3 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Tue, 10 Mar 2015 23:11:51 + Subject: [PATCH 1/4] x86: xvid_idct: port SSE2 iDCT

Re: [FFmpeg-devel] [PATCH 4/4] x86: xvid_idct: SSE2 merged add version

2015-03-12 Thread Christophe Gisquet
2015-03-11 0:11 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: --- libavcodec/x86/xvididct.asm| 92 -- libavcodec/x86/xvididct_init.c | 9 + 2 files changed, 91 insertions(+), 10 deletions(-) Another refresh. From

Re: [FFmpeg-devel] [PATCH 3/4] x86: xvid_idct: merged idct_put SSE2 versions

2015-03-12 Thread Christophe Gisquet
Hi, 2015-03-12 20:14 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com: Not sure it needed a refresh, but here is one. btw, the incorrect %warning code is actually dead, placeholder code for the following patch 4/4 of the series. -- Christophe

[FFmpeg-devel] [PATCH] x86: Makefile: fix DBG parameter evaluation

2015-03-10 Thread Christophe Gisquet
This recently caused me some issues, as in, being ignored. -- Christophe From 5c1b07147502135a9f6a04a1edcf060a1575efd3 Mon Sep 17 00:00:00 2001 From: Christophe Gisquet christophe.gisq...@gmail.com Date: Sun, 8 Mar 2015 17:54:25 +0100 Subject: [PATCH] x86: Makefile: fix DBG parameter evaluation

[FFmpeg-devel] [PATCH 4/4] x86: xvid_idct: SSE2 merged add version

2015-03-10 Thread Christophe Gisquet
--- libavcodec/x86/xvididct.asm| 92 -- libavcodec/x86/xvididct_init.c | 9 + 2 files changed, 91 insertions(+), 10 deletions(-) diff --git a/libavcodec/x86/xvididct.asm b/libavcodec/x86/xvididct.asm index 58ffb11..0220885 100644 ---

[FFmpeg-devel] [PATCH 2/4] x86: xvid_idct: port MMX IDCT to yasm

2015-03-10 Thread Christophe Gisquet
100644 --- a/libavcodec/x86/xvididct.asm +++ b/libavcodec/x86/xvididct.asm @@ -1,5 +1,9 @@ ; XVID MPEG-4 VIDEO CODEC -; - SSE2 inverse discrete cosine transform - +; +; Conversion from gcc syntax to x264asm syntax with modifications +; by Christophe Gisquet christophe.gisq...@gmail.com

[FFmpeg-devel] [PATCH 1/4] x86: xvid: port SSE2 idct to yasm

2015-03-10 Thread Christophe Gisquet
The main difference consists in properly renaming labels and letting yasm select the gprs for skipping 1D transforms. --- libavcodec/x86/Makefile| 2 +- libavcodec/x86/xvididct.asm| 379 ++ libavcodec/x86/xvididct_init.c | 18 +-

[FFmpeg-devel] [PATCH 3/4] x86: xvid_idct: merged idct_put SSE2 versions

2015-03-10 Thread Christophe Gisquet
--- libavcodec/x86/xvididct.asm| 202 - libavcodec/x86/xvididct_init.c | 8 +- 2 files changed, 140 insertions(+), 70 deletions(-) diff --git a/libavcodec/x86/xvididct.asm b/libavcodec/x86/xvididct.asm index 4c52bf1..58ffb11 100644 ---
