Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/20 Justin Ruggles justin.rugg...@gmail.com:
> putting [ps_neg] in a register and switching m0 and m2 in the unpacking would allow some 3-arg XMM AVX to be used, like so: [...]

Tested that: no change in the generated code, and indeed no change in speed or FATE results. Updated patch attached.

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Justin Ruggles
On 12/21/2012 12:39 PM, Christophe Gisquet wrote:
> 2012/12/20 Justin Ruggles justin.rugg...@gmail.com:
>> putting [ps_neg] in a register and switching m0 and m2 in the unpacking would allow some 3-arg XMM AVX to be used, like so: [...]
> Tested that, no change in generated code, and indeed no

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/21 Justin Ruggles justin.rugg...@gmail.com:
> If you are unable to test AVX, I can go through your functions after they're committed and test to see if using AVX helps. If you have some set of samples which utilize specific functions, that would be helpful.

It would be simpler for

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Christophe Gisquet
2012/12/7 Christophe Gisquet christophe.gisq...@gmail.com:
> 2012/12/1 Christophe Gisquet christophe.gisq...@gmail.com:
>> 2012/11/30 Christophe Gisquet christophe.gisq...@gmail.com:
>>> 4 space tabs.
>> Done.
> Given the results for the parts where a change was investigated, is there a need for another

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Justin Ruggles
Hi,

On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> INIT_XMM sse
> +lea r2q, [zq + (64-4)*4]
> +.loop:
> +mova m0, [r2q]
> +mova m1, [zq ]
> +xorps m0, [ps_neg]
> +shufps m0, m0, 0x1B
> +mova m2, m0
> +

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-06 Thread Christophe Gisquet
2012/12/1 Christophe Gisquet christophe.gisq...@gmail.com:
> 2012/11/30 Christophe Gisquet christophe.gisq...@gmail.com:
>> 4 space tabs.
> Done.

Given the results for the parts where a change was investigated, is there a need for another review?

-- 
Christophe

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/12/1 Ronald S. Bultje rsbul...@gmail.com:
> Try adding an ALIGN 16 just above .loop:, maybe that fixes it?

No change. On that CPU and OS, I observed these strange facts:
- complex addressing sometimes costs noticeably
- aligning jump positions rarely helps
- trying to hide latency of

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/11/30 Christophe Gisquet christophe.gisq...@gmail.com:
> 4 space tabs.

Done.

-- 
Christophe

Attachment: 0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch (binary data)
___
libav-devel mailing list
libav-devel@libav.org

[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
On Penryn, from 255 to 174c. Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm    | 21 +
 libavcodec/x86/sbrdsp_init.c |  2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index
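Whatever unit the "c" in "174c" stands for (the message does not say), the before/after ratio does not depend on it. Taking the two quoted timings at face value:

$$
\frac{255}{174} \approx 1.47\times \text{ faster}, \qquad \frac{255 - 174}{255} = \frac{81}{255} \approx 31.8\%\ \text{fewer time units per call}
$$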

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> + lea r2q, [zq + (64-4)*4]
> +.loop:
> + mova m0, [r2q]
> + mova m1, [zq ]
> + xorps m0, [ps_neg]
> + shufps m0, m0, 0x1B
> + mova m2, m0
> + unpcklps m0, m1
> + unpckhps
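To make the data movement of the quoted loop easier to follow, here is a minimal C model of it, a sketch rather than libav code. Each `v4` stands in for one XMM register, and the helpers mimic `xorps`, `shufps`, `unpcklps` and `unpckhps` lane by lane. Because the quote is cut off, several things are assumptions: `[ps_neg]` is taken to hold the sign bit (0x80000000) in all four lanes, the truncated `unpckhps` is read as `unpckhps m2, m1`, and the stores and pointer steps at the end of each iteration are guessed. The scalar mapping `W[k][0] = -z[63-k]`, `W[k][1] = z[k]` is inferred from the instruction sequence, not copied from libav's C reference, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one XMM register: four packed floats. */
typedef struct { float f[4]; } v4;

static v4 load4(const float *p) { v4 r; memcpy(r.f, p, sizeof r.f); return r; }
static void store4(float *p, v4 v) { memcpy(p, v.f, sizeof v.f); }

/* xorps with [ps_neg]: assumed to flip the IEEE sign bit of every lane. */
static v4 xorps_neg(v4 a)
{
    for (int i = 0; i < 4; i++) {
        uint32_t u;
        memcpy(&u, &a.f[i], sizeof u);
        u ^= 0x80000000u;
        memcpy(&a.f[i], &u, sizeof u);
    }
    return a;
}

/* shufps a, b, imm: lanes 0-1 are picked from a, lanes 2-3 from b. */
static v4 shufps(v4 a, v4 b, unsigned imm)
{
    v4 r = {{ a.f[imm & 3], a.f[(imm >> 2) & 3],
              b.f[(imm >> 4) & 3], b.f[(imm >> 6) & 3] }};
    return r;
}

/* unpcklps / unpckhps: interleave the low / high halves of a and b. */
static v4 unpcklps(v4 a, v4 b) { v4 r = {{ a.f[0], b.f[0], a.f[1], b.f[1] }}; return r; }
static v4 unpckhps(v4 a, v4 b) { v4 r = {{ a.f[2], b.f[2], a.f[3], b.f[3] }}; return r; }

/* Hypothetical scalar reference; W holds 32 (re, im) pairs, flattened. */
static void qmf_post_shuffle_ref(float W[64], const float z[64])
{
    for (int k = 0; k < 32; k++) {
        W[2 * k]     = -z[63 - k];
        W[2 * k + 1] =  z[k];
    }
}

/* Lane model of the quoted loop: 8 iterations, 4 output pairs each (k = 4*it). */
static void qmf_post_shuffle_model(float W[64], const float z[64])
{
    const float *hi = z + (64 - 4);   /* lea    r2q, [zq + (64-4)*4] */
    const float *lo = z;
    for (int it = 0; it < 8; it++) {
        v4 m0 = load4(hi);            /* mova   m0, [r2q]            */
        v4 m1 = load4(lo);            /* mova   m1, [zq]             */
        m0 = xorps_neg(m0);           /* xorps  m0, [ps_neg]         */
        m0 = shufps(m0, m0, 0x1B);    /* shufps m0, m0, 0x1B: reverse lanes */
        v4 m2 = m0;                   /* mova   m2, m0               */
        m0 = unpcklps(m0, m1);        /* -z[63-k], z[k],   -z[62-k], z[k+1] */
        m2 = unpckhps(m2, m1);        /* -z[61-k], z[k+2], -z[60-k], z[k+3] (assumed operands) */
        store4(W, m0);                /* assumed: stores and pointer steps */
        store4(W + 4, m2);
        W  += 8;
        lo += 4;
        hi -= 4;
    }
}

/* Returns the number of output floats where model and reference differ. */
static int qmf_post_shuffle_selfcheck(void)
{
    float z[64], ref[64], model[64];
    int mismatches = 0;
    for (int i = 0; i < 64; i++)
        z[i] = (float)i + 0.5f;
    qmf_post_shuffle_ref(ref, z);
    qmf_post_shuffle_model(model, z);
    for (int i = 0; i < 64; i++)
        mismatches += ref[i] != model[i];
    return mismatches;
}
```

`assert(qmf_post_shuffle_selfcheck() == 0);` checks that the lane model and the inferred scalar mapping agree. The point of the sketch is that a single `xorps` plus `shufps 0x1B` produces four negated, reversed values at once, and the two unpacks interleave them with the forward half of `z`.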

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
Hello,

2012/11/30 Loren Merritt lor...@u.washington.edu:
> If you increment an index into W and z rather than the pointers themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
 mov r3q, 32*4
 lea r2q,
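Loren's suggestion concerns loop bookkeeping, not the shuffle itself. The C sketch below contrasts two loop shapes. Variant A advances three pointers and ends with a pointer compare (an assumed shape for the original loop, whose tail is not quoted). Variant B addresses `W` and the front of `z` through one shared index; since `W` advances twice as fast as `z`, a scale of 2 covers it. Counting that index up from a negative value to zero is one common way, assumed here, to let the `add` that steps it also set the flags for the branch, so the separate `cmp` and one pointer `add` disappear. The per-element mapping is inferred from the quoted SSE loop, the function names are hypothetical, and in C the compiler chooses the actual instructions, so this shows structure only.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A (assumed original shape): three pointers are advanced every
 * iteration and an explicit compare ends the loop. */
static void shuffle_pointers(float W[64], const float z[64])
{
    const float *lo = z, *hi = z + 60;
    while (lo < hi) {                      /* compare + branch */
        for (int j = 0; j < 4; j++) {
            W[2 * j]     = -hi[3 - j];
            W[2 * j + 1] =  lo[j];
        }
        W  += 8;                           /* three pointer updates */
        lo += 4;
        hi -= 4;
    }
}

/* Variant B: one index addresses both W and the front of z. It counts a
 * negative offset up to zero, so in asm the add that steps it can also
 * serve as the loop test. */
static void shuffle_indexed(float W[64], const float z[64])
{
    float *W_end = W + 64;                 /* W[2*i] relative to the end  */
    const float *z_mid = z + 32;           /* z[i] relative to z + 32     */
    const float *hi = z + 60;              /* back half still walks down  */
    for (ptrdiff_t i = -32; i != 0; i += 4, hi -= 4) {
        for (int j = 0; j < 4; j++) {
            W_end[2 * (i + j)]     = -hi[3 - j];
            W_end[2 * (i + j) + 1] =  z_mid[i + j];
        }
    }
}

/* Returns how many output floats differ between the two variants. */
static int shuffle_variants_mismatch(void)
{
    float z[64], a[64], b[64];
    int mismatches = 0;
    for (int i = 0; i < 64; i++)
        z[i] = (float)i;
    shuffle_pointers(a, z);
    shuffle_indexed(b, z);
    for (int i = 0; i < 64; i++)
        mismatches += a[i] != b[i];
    return mismatches;
}
```

Both variants must produce identical output; only the per-iteration bookkeeping differs. Whether that saving shows up in timings on a given CPU is an empirical question.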

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Måns Rullgård
Christophe Gisquet christophe.gisq...@gmail.com writes:
>> 4 space tabs.
> OK, I was a bit puzzled and looking for trailing whitespaces/... You mean style change then. A bit cumbersome to redo all patches because of that.

:%s/^ //

-- 
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Ronald S. Bultje
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> Hello,
>
> 2012/11/30 Loren Merritt lor...@u.washington.edu:
>> If you increment an index into W and z rather than the pointers themselves, then you can eliminate an add and a cmp.
>
> I had already tested