Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/21 Justin Ruggles:
> If you are unable to test AVX, I can go through your functions after
> they're committed and test to see if using AVX helps. If you have some
> set of samples which utilize specific functions that would be helpful.

It would be simpler for everyone if I could, and it's

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Justin Ruggles
On 12/21/2012 12:39 PM, Christophe Gisquet wrote:
> 2012/12/20 Justin Ruggles:
>> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
>> would allow some 3-arg XMM AVX to be used, like so:
> [...]
> Tested that, no change in generated code, and indeed no change
> speedwise/fate

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/20 Justin Ruggles:
> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
> would allow some 3-arg XMM AVX to be used, like so:
[...]

Tested that, no change in generated code, and indeed no change speedwise/fate result. Updated patch attached.

--
Christophe

0002-SBR-

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Justin Ruggles
Hi,

On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z INIT_XMM sse
> +lea r2q, [zq + (64-4)*4]
> +.loop:
> +mova m0, [r2q]
> +mova m1, [zq ]
> +xorps m0, [ps_neg]
> +shufps m0, m0, 0x1B
> +mova m2
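For readers who prefer intrinsics, the data flow of the quoted loop (negate with a sign mask, reverse with shufps 0x1B, then interleave with the two unpack instructions) can be sketched in C as below. This is only an illustration: the function name, the flat layout of W, the loop bounds and the unaligned loads are assumptions, and ps_neg is assumed to hold four sign bits. The mask is kept in a local variable, in the spirit of the suggestion to put [ps_neg] in a register; whether a compiler emits 3-operand AVX forms from this depends on the build flags.

```c
#include <xmmintrin.h>

/* Hypothetical intrinsics sketch of the quoted loop, not the libav code.
 * W is treated as 64 floats (32 interleaved pairs), z as 64 floats. */
static void qmf_post_shuffle_sse_sketch(float *W, const float *z)
{
    /* Sign-bit mask in all four lanes (assumed contents of ps_neg). */
    const __m128 neg = _mm_set1_ps(-0.0f);
    const float *hi = z + 60;               /* last four floats of z */

    for (int k = 0; k < 32; k += 4) {
        __m128 a = _mm_loadu_ps(hi - k);    /* z[60-k] .. z[63-k] */
        __m128 b = _mm_loadu_ps(z + k);     /* z[k]    .. z[k+3]  */

        a = _mm_xor_ps(a, neg);             /* flip the sign of each lane */
        a = _mm_shuffle_ps(a, a, 0x1B);     /* reverse the four lanes */

        /* Interleave: (a0,b0,a1,b1) then (a2,b2,a3,b3). */
        _mm_storeu_ps(W + 2 * k,     _mm_unpacklo_ps(a, b));
        _mm_storeu_ps(W + 2 * k + 4, _mm_unpackhi_ps(a, b));
    }
}
```

Unaligned loads and stores are used here only so the sketch runs on any buffer; the quoted assembly uses aligned moves (mova).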

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Christophe Gisquet
2012/12/7 Christophe Gisquet:
> 2012/12/1 Christophe Gisquet:
>> 2012/11/30 Christophe Gisquet:
>>>> 4 space tabs.
>>
>> Done.
>
> Given the results for the parts where a change was investigated, is
> there a need for another review?

Ping2

--
Christophe

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-06 Thread Christophe Gisquet
2012/12/1 Christophe Gisquet:
> 2012/11/30 Christophe Gisquet:
>>> 4 space tabs.
>
> Done.

Given the results for the parts where a change was investigated, is there a need for another review?

--
Christophe

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/11/30 Christophe Gisquet:
>> 4 space tabs.

Done.

--
Christophe

0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/12/1 Ronald S. Bultje:
> Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

No change. On that CPU and OS, I observed some strange behaviour:
- complex addressing is sometimes noticeably costly
- aligning jump positions rarely helps
- trying to hide latency of operations often doesn

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Ronald S. Bultje
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet wrote:
> Hello,
>
> 2012/11/30 Loren Merritt:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shu

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Måns Rullgård
Christophe Gisquet writes:
>> 4 space tabs.
>
> OK, I was a bit puzzled and looking for trailing whitespaces/... You
> mean style change then.
> A bit cumbersome to redo all patches because of that.

:%s/^ //

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
Hello,

2012/11/30 Loren Merritt:
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:

cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
    mov r3q, 32*4
    lea r2q, [zq + (64-4)*4]
    add
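The index idea discussed above can be illustrated in C. In this sketch (the function name and the choice of a negative counter running up to zero are assumptions, not taken from the patch), the base pointers are biased once and a single signed index is the only value updated per iteration, so the loop test can reuse the flags of that one increment instead of needing a separate add and cmp. A C compiler is free to generate something else; the point is only to show the addressing pattern.

```c
#include <stddef.h>

/* Hypothetical illustration of a single-index loop; not the libav code.
 * W is 32 interleaved pairs, z is 64 floats. */
static void qmf_post_shuffle_indexed(float W[32][2], const float *z)
{
    float (*W_end)[2] = W + 32;   /* one past the last output pair */
    const float *z_mid = z + 32;  /* end of lower half, start of upper half */

    /* i runs from -32 up to 0; every address is base + f(i). */
    for (ptrdiff_t i = -32; i < 0; i++) {
        W_end[i][0] = -z_mid[-1 - i];  /* z[63], z[62], ..., z[32], negated */
        W_end[i][1] =  z_mid[i];       /* z[0],  z[1],  ..., z[31] */
    }
}
```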

Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> +    lea r2q, [zq + (64-4)*4]
> +.loop:
> +    mova m0, [r2q]
> +    mova m1, [zq ]
> +    xorps m0, [ps_neg]
> +    shufps m0, m0, 0x1B
> +    mova m2, m0
> +    unpcklps m0, m1
> +    u
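Read lane by lane, the quoted instructions take four floats from the end of z, flip their sign, reverse their order, and interleave them with four floats from the start of z. A scalar C model of that behaviour might look like the following. It is a sketch inferred from the visible instructions and constants (64 input floats, 32 output pairs); the function name is made up for illustration and ps_neg is assumed to negate every lane.

```c
/* Hypothetical scalar model of the quoted SSE loop; not the libav reference.
 * Each output pair combines a negated element from the reversed upper half
 * of z with the matching element from the lower half. */
static void qmf_post_shuffle_scalar(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];  /* xorps with sign mask + shufps 0x1B */
        W[k][1] =  z[k];       /* second operand of unpcklps/unpckhps */
    }
}
```

A model like this is handy as a reference when checking an assembly version against known input.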

[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
On Penryn, from 255 to 174 cycles. Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm    | 21 +
 libavcodec/x86/sbrdsp_init.c |  2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 11a6faf..2b9010