2012/12/20 Justin Ruggles justin.rugg...@gmail.com:
putting [ps_neg] in a register and switching m0 and m2 in the unpacking
would allow some 3-arg XMM AVX to be used, like so:
[...]
Tested that: no change in the generated code, and indeed no change in
speed or FATE results.
Updated patch attached.
--
2012/12/21 Justin Ruggles justin.rugg...@gmail.com:
If you are unable to test AVX, I can go through your functions after
they're committed and test whether using AVX helps. If you have a set
of samples that exercises specific functions, that would be helpful.
It would be simpler for
Hi,
On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
INIT_XMM sse
+cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
+    lea      r2q, [zq + (64-4)*4]
+.loop:
+    mova     m0, [r2q]
+    mova     m1, [zq]
+    xorps    m0, [ps_neg]
+    shufps   m0, m0, 0x1B
+    mova     m2, m0
+
2012/12/1 Christophe Gisquet christophe.gisq...@gmail.com:
2012/11/30 Christophe Gisquet christophe.gisq...@gmail.com:
4 space tabs.
Done.
Given the results for the parts where a change was investigated, is
there a need for another review?
--
Christophe
2012/12/1 Ronald S. Bultje rsbul...@gmail.com:
Try adding an ALIGN 16 just above .loop:, maybe that fixes it?
No change.
On that CPU and OS, I have observed some odd behavior:
- complex addressing sometimes has a noticeable cost
- aligning jump targets rarely helps
- trying to hide the latency of
0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data
___
libav-devel mailing list
libav-devel@libav.org
On Penryn, from 255 down to 174 cycles. Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm    | 21 +++++++++++++++++++++
 libavcodec/x86/sbrdsp_init.c |  2 ++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index
On Fri, 30 Nov 2012, Christophe Gisquet wrote:
+cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
+ lea r2q, [zq + (64-4)*4]
+.loop:
+ mova m0, [r2q]
+ mova m1, [zq ]
+ xorps m0, [ps_neg]
+ shufps m0, m0, 0x1B
+ mova m2, m0
+ unpcklps m0, m1
+ unpckhps
Hello,
2012/11/30 Loren Merritt lor...@u.washington.edu:
If you increment an index into W and z rather than the pointers
themselves, then you can eliminate an add and a cmp.
I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
mov r3q, 32*4
lea r2q,
Christophe Gisquet christophe.gisq...@gmail.com writes:
4 space tabs.
OK, I was a bit puzzled and was looking for trailing whitespace/... You
mean a style change, then.
It is a bit cumbersome to redo all the patches because of that.
:%s/^ //
--
Måns Rullgård
m...@mansr.com