2012/12/21 Justin Ruggles :
> If you are unable to test AVX, I can go through your functions after
> they're committed and test to see if using AVX helps. If you have some
> set of samples which utilize specific functions that would be helpful.
It would be simpler for everyone if I could, and it's
On 12/21/2012 12:39 PM, Christophe Gisquet wrote:
> 2012/12/20 Justin Ruggles :
>> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
>> would allow some 3-arg XMM AVX to be used, like so:
> [...]
> Tested that, no change in generated code, and indeed no change
> speedwise/fate
2012/12/20 Justin Ruggles :
> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
> would allow some 3-arg XMM AVX to be used, like so:
[...]
Tested that, no change in generated code, and indeed no change
speedwise/fate result.
Updated patch attached.
--
Christophe
0002-SBR-
Hi,
On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
INIT_XMM sse
> +lea r2q, [zq + (64-4)*4]
> +.loop:
> +mova m0, [r2q]
> +mova m1, [zq ]
> +xorps m0, [ps_neg]
> +shufps m0, m0, 0x1B
> +mova m2
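The quoted loop body can be modelled lane by lane in plain C. This is a sketch, not the patch. It rests on several assumptions that the thread does not confirm. The C reference is taken to compute W[k][0] = -z[63-k] and W[k][1] = z[k]. ps_neg is taken to hold the sign bit in every lane. The truncated tail is taken to store m0 and m2 to consecutive 16-byte slots of W and to loop while zq < r2q. The xmm type and all helper names are hypothetical.

```c
#include <string.h>

/* Hypothetical model of one 128-bit XMM register: four float lanes. */
typedef struct { float f[4]; } xmm;

/* Assumed reference semantics: W[k][0] = -z[63-k], W[k][1] = z[k]. */
static void post_shuffle_ref(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];
        W[k][1] = z[k];
    }
}

/* mova: load four consecutive floats. */
static xmm load4(const float *p) { xmm r; memcpy(r.f, p, sizeof r.f); return r; }

/* xorps with a sign mask in every lane (assumed ps_neg): flips each sign bit. */
static xmm negate4(xmm a) { for (int i = 0; i < 4; i++) a.f[i] = -a.f[i]; return a; }

/* shufps a, a, 0x1B: lane selectors 3,2,1,0 reverse the register. */
static xmm reverse4(xmm a) { xmm r = {{a.f[3], a.f[2], a.f[1], a.f[0]}}; return r; }

/* unpcklps a, b -> { a0, b0, a1, b1 } */
static xmm unpacklo(xmm a, xmm b) { xmm r = {{a.f[0], b.f[0], a.f[1], b.f[1]}}; return r; }

/* unpckhps a, b -> { a2, b2, a3, b3 } */
static xmm unpackhi(xmm a, xmm b) { xmm r = {{a.f[2], b.f[2], a.f[3], b.f[3]}}; return r; }

static void post_shuffle_lanes(float W[32][2], const float *z)
{
    const float *hi = z + 60;   /* r2q: starts at z[60], walks backwards */
    const float *lo = z;        /* zq: starts at z[0], walks forwards */
    float (*w)[2] = W;          /* output, two (re, im) pairs per register */
    while (lo < hi) {
        /* with k = lo - z: m0 = -z[63-k], -z[62-k], -z[61-k], -z[60-k] */
        xmm m0 = reverse4(negate4(load4(hi)));
        /* m1 = z[k], z[k+1], z[k+2], z[k+3] */
        xmm m1 = load4(lo);
        xmm a = unpacklo(m0, m1);   /* pairs W[k], W[k+1] */
        xmm b = unpackhi(m0, m1);   /* pairs W[k+2], W[k+3] */
        memcpy(w, a.f, sizeof a.f);
        memcpy(w + 2, b.f, sizeof b.f);
        hi -= 4;
        lo += 4;
        w += 4;
    }
}
```

Each pass consumes four floats from each end of z and writes four (re, im) pairs, so under these assumptions the loop runs 8 times for 32 pairs.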
2012/12/7 Christophe Gisquet :
> 2012/12/1 Christophe Gisquet :
>> 2012/11/30 Christophe Gisquet :
>>> 4 space tabs.
>>
>> Done.
>
> Given the results for the parts where a change was investigated, is
> there a need for another review?
Ping2
--
Christophe
2012/12/1 Christophe Gisquet :
> 2012/11/30 Christophe Gisquet :
>>> 4 space tabs.
>
> Done.
Given the results for the parts where a change was investigated, is
there a need for another review?
--
Christophe
___
libav-devel mailing list
libav-devel@lib
2012/11/30 Christophe Gisquet :
>> 4 space tabs.
Done.
--
Christophe
0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-deve
2012/12/1 Ronald S. Bultje :
> Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?
No change.
On that CPU and OS, I have observed the following oddities:
- complex addressing modes sometimes have a noticeable cost
- aligning jump targets rarely helps
- trying to hide latency of operations often doesn
Hi,
On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet wrote:
> Hello,
>
> 2012/11/30 Loren Merritt :
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shu
Christophe Gisquet writes:
>> 4 space tabs.
>
> OK, I was a bit puzzled and looking for trailing whitespaces/... You
> mean style change then.
> A bit cumbersome to redo all patches because of that.
:%s/^ //
--
Måns Rullgård
m...@mansr.com
Hello,
2012/11/30 Loren Merritt :
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.
I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
mov r3q, 32*4
lea r2q, [zq + (64-4)*4]
add
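Loren's suggestion can be sketched in C as a single index that counts from -32 up to 0 while the base pointers stay fixed. In asm, the one add on that index then also sets the flags for the loop branch, which replaces the separate pointer updates and the cmp. The sketch assumes W[k][0] = -z[63-k] and W[k][1] = z[k]; the function name is hypothetical. The output stream advances twice as fast as z, which a scale factor of 8 instead of 4 handles. One caveat applies: x86 addressing modes add a scaled index (scale 1, 2, 4 or 8) but cannot subtract one. The backward stream z[28 - i] therefore cannot reuse the forward stream's index register directly; it needs its own pointer or a separately maintained negated index.

```c
#include <stddef.h>

/* Hypothetical single-index variant of the post-shuffle loop. */
static void post_shuffle_indexed(float W[32][2], const float *z)
{
    const float *z_mid = z + 32;     /* forward stream ends here */
    float (*w_end)[2] = W + 32;      /* one past the last output pair */

    /* i runs -32, -28, ..., -4; k = 32 + i is the first pair of this pass */
    for (ptrdiff_t i = -32; i < 0; i += 4) {
        const float *fwd = z_mid + i;      /* z[k .. k+3] */
        const float *bwd = z_mid - 4 - i;  /* z[60-k .. 63-k]: index is subtracted */
        for (int j = 0; j < 4; j++) {
            w_end[i + j][0] = -bwd[3 - j]; /* -z[63 - (k+j)] */
            w_end[i + j][1] =  fwd[j];     /*  z[k+j]        */
        }
    }
}
```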
On Fri, 30 Nov 2012, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> + lea r2q, [zq + (64-4)*4]
> +.loop:
> + mova m0, [r2q]
> + mova m1, [zq ]
> + xorps m0, [ps_neg]
> + shufps m0, m0, 0x1B
> + mova m2, m0
> + unpcklps m0, m1
> + u
On Penryn, from 255 to 174 cycles. Unrolling yields no gain.
---
libavcodec/x86/sbrdsp.asm    | 21 +
libavcodec/x86/sbrdsp_init.c |  2 ++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 11a6faf..2b9010
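The sbrdsp_init.c side of the patch (two added lines) is not shown here. In libav, DSP contexts hold function pointers that are first set to the C versions and then overridden by the x86 init code when the CPU reports support for the instruction set. What follows is a self-contained sketch of that dispatch pattern. The struct, flag and function names are hypothetical stand-ins, not libav's actual SBRDSPContext or CPU-flag API. The reference semantics W[k][0] = -z[63-k], W[k][1] = z[k] are likewise assumed.

```c
/* Hypothetical CPU feature bit (stand-in for a real CPU-flag API). */
#define SKETCH_CPU_FLAG_SSE 0x0001

/* Hypothetical DSP context holding one overridable function pointer. */
typedef struct SbrDspSketch {
    void (*qmf_post_shuffle)(float W[32][2], const float *z);
} SbrDspSketch;

/* Portable reference version (assumed semantics). */
static void qmf_post_shuffle_c(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];
        W[k][1] = z[k];
    }
}

/* Stand-in for the assembly routine; a real build would declare the
 * external asm symbol instead of defining a C body here. */
static void qmf_post_shuffle_simd(float W[32][2], const float *z)
{
    qmf_post_shuffle_c(W, z);
}

static void sbrdsp_sketch_init(SbrDspSketch *s, int cpu_flags)
{
    s->qmf_post_shuffle = qmf_post_shuffle_c;         /* always-valid default */
    if (cpu_flags & SKETCH_CPU_FLAG_SSE)
        s->qmf_post_shuffle = qmf_post_shuffle_simd;  /* override when supported */
}
```

Callers only ever go through the function pointer, so adding an SSE version touches nothing outside the init function.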