Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-24 Thread Ronald S. Bultje
Hi,

On Tue, Jul 17, 2012 at 6:16 AM, Justin Ruggles justin.rugg...@gmail.com wrote:
> ---
>  libavresample/x86/audio_convert.asm    | 63
>  libavresample/x86/audio_convert_init.c |  9 +
>  2 files changed, 72 insertions(+), 0 deletions(-)

(I'm going to

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Loren Merritt
On Tue, 17 Jul 2012, Loren Merritt wrote:
> 25% faster on penryn (even though I didn't predict that by counting uops).
> 25% faster on sandybridge.
> No change on bulldozer.
> But even though I successfully predicted that this is an improvement, I don't understand its performance. 6x load, 12x

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Justin Ruggles
On 07/18/2012 05:15 AM, Loren Merritt wrote:
> On Tue, 17 Jul 2012, Loren Merritt wrote:
>> 25% faster on penryn (even though I didn't predict that by counting uops).
>> 25% faster on sandybridge.
>> No change on bulldozer.
>> But even though I successfully predicted that this is an improvement, I

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Loren Merritt
On Wed, 18 Jul 2012, Justin Ruggles wrote:
> On 07/18/2012 05:15 AM, Loren Merritt wrote:
>> Aha, a large part of the discrepancy is due to cache aliasing, when the offsets between the 6 output streams are divisible by some large power of 2. This would have to be fixed in whatever piece of code
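
The aliasing condition described above (offsets between the output planes divisible by a large power of 2) can be checked in a few lines of C. This sketch is not taken from the thread: the helper names are hypothetical, and the 4096-byte alias size and 64-byte pad used in the usage note are assumptions for illustration only. The right values depend on the CPU's cache geometry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero when the byte offset between two
 * consecutive planes is a multiple of alias_bytes, i.e. when the same
 * sample index in every plane maps to the same cache set.
 * alias_bytes must be a nonzero power of two (assumed, e.g. 4096). */
int planes_alias(size_t plane_stride_bytes, size_t alias_bytes)
{
    assert(alias_bytes != 0 && (alias_bytes & (alias_bytes - 1)) == 0);
    return (plane_stride_bytes & (alias_bytes - 1)) == 0;
}

/* Hypothetical mitigation: add pad_bytes to the stride when it would
 * alias. pad_bytes must not itself be a multiple of alias_bytes,
 * otherwise the padded stride would still alias. */
size_t padded_plane_stride(size_t plane_stride_bytes,
                           size_t alias_bytes, size_t pad_bytes)
{
    assert((pad_bytes & (alias_bytes - 1)) != 0);
    if (planes_alias(plane_stride_bytes, alias_bytes))
        return plane_stride_bytes + pad_bytes;
    return plane_stride_bytes;
}
```

For example, with an assumed alias size of 4096 bytes, a plane stride of 8192 bytes (2048 floats) aliases, while padding it by 64 bytes does not. Whether padding the allocation is the right fix is a design decision for the code that owns the buffers.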

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Justin Ruggles
On 07/18/2012 03:06 PM, Loren Merritt wrote:
> On Wed, 18 Jul 2012, Justin Ruggles wrote:
>> On 07/18/2012 05:15 AM, Loren Merritt wrote:
>>> Aha, a large part of the discrepancy is due to cache aliasing, when the offsets between the 6 output streams are divisible by some large power of 2. This would

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Loren Merritt
25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.
But even though I successfully predicted that this is an improvement, I don't understand its performance. 6x load, 12x punpckldq, 6x store, 4x scalar: Should take 12

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Justin Ruggles
On 07/17/2012 07:02 AM, Loren Merritt wrote:
> 25% faster on penryn (even though I didn't predict that by counting uops).
> 25% faster on sandybridge.
> No change on bulldozer.
> But even though I successfully predicted that this is an improvement, I don't understand its performance. 6x load, 12x

[libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Justin Ruggles
---
 libavresample/x86/audio_convert.asm    | 63
 libavresample/x86/audio_convert_init.c |  9 +
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
index

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Loren Merritt
On Sun, 15 Jul 2012, Justin Ruggles wrote:
> +.loop:
> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
> +    mova m3, [srcq+3*mmsize] ; m3 = 0/2,
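
The lane comments in the quoted loop appear to read as channel/sample: the first register holds channels 0 to 3 of sample 0, the second holds channels 4 and 5 of sample 0 followed by channels 0 and 1 of sample 1, and so on. A scalar C reference for the same interleaved (flt) to planar (fltp) conversion might look like the sketch below. The function name and signature are assumptions for illustration and are not the libavresample API.

```c
#include <assert.h>
#include <stddef.h>

#define CHANNELS 6

/* Scalar reference (hypothetical name): split interleaved float samples
 * into one plane per channel. Source element s*6 + c is channel c of
 * sample s, matching the "c/s" lane comments in the quoted asm. */
void conv_flt_to_fltp_6ch_ref(float *dst[CHANNELS], const float *src,
                              size_t nb_samples)
{
    assert(dst != NULL && src != NULL);
    for (size_t s = 0; s < nb_samples; s++)
        for (int c = 0; c < CHANNELS; c++)
            dst[c][s] = src[s * CHANNELS + c];
}
```

A SIMD version produces the same output; it only changes how many samples are loaded, shuffled, and stored per iteration.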

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Justin Ruggles
On 07/16/2012 08:49 AM, Loren Merritt wrote:
> On Sun, 15 Jul 2012, Justin Ruggles wrote:
>> +.loop:
>> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
>> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
>> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
>> +

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Justin Ruggles
On 07/16/2012 08:49 AM, Loren Merritt wrote:
> On Sun, 15 Jul 2012, Justin Ruggles wrote:
>> +.loop:
>> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
>> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
>> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
>> +

[libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-14 Thread Justin Ruggles
---
 libavresample/x86/audio_convert.asm    | 68
 libavresample/x86/audio_convert_init.c |  9
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
index 4899d91..de95151