Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-24 Thread Ronald S. Bultje

Hi,

On Tue, Jul 17, 2012 at 6:16 AM, Justin Ruggles wrote:
> ---
>  libavresample/x86/audio_convert.asm    | 63
>  libavresample/x86/audio_convert_init.c |  9 +
>  2 files changed, 72 insertions(+), 0 deletions(-)

(I'm going to assume Loren had no furt

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Justin Ruggles

On 07/18/2012 03:06 PM, Loren Merritt wrote:
> On Wed, 18 Jul 2012, Justin Ruggles wrote:
>> On 07/18/2012 05:15 AM, Loren Merritt wrote:
>>>
>>> Aha, a large part of the discrepancy is due to cache aliasing, when the
>>> offsets between the 6 output streams are divisible by some large power of
>>>

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Loren Merritt

On Wed, 18 Jul 2012, Justin Ruggles wrote:
> On 07/18/2012 05:15 AM, Loren Merritt wrote:
>>
>> Aha, a large part of the discrepancy is due to cache aliasing, when the
>> offsets between the 6 output streams are divisible by some large power of
>> 2. This would have to be fixed in whatever piece of

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Justin Ruggles

On 07/18/2012 05:15 AM, Loren Merritt wrote:
> On Tue, 17 Jul 2012, Loren Merritt wrote:
>
>> 25% faster on penryn (even though I didn't predict that by counting uops).
>> 25% faster on sandybridge.
>> No change on bulldozer.
>>
>> But even though I successfully predicted that this is an improveme

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-18 Thread Loren Merritt

On Tue, 17 Jul 2012, Loren Merritt wrote:
> 25% faster on penryn (even though I didn't predict that by counting uops).
> 25% faster on sandybridge.
> No change on bulldozer.
>
> But even though I successfully predicted that this is an improvement, I don't
> understand its performance.
> 6x load,

[libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Justin Ruggles

---
 libavresample/x86/audio_convert.asm    | 63
 libavresample/x86/audio_convert_init.c |  9 +
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
index 4899d91..cdd9824

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Justin Ruggles

On 07/17/2012 07:02 AM, Loren Merritt wrote:
> 25% faster on penryn (even though I didn't predict that by counting uops).
> 25% faster on sandybridge.
> No change on bulldozer.
>
> But even though I successfully predicted that this is an improvement, I don't
> understand its performance.
> 6x loa

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-17 Thread Loren Merritt

25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.

But even though I successfully predicted that this is an improvement, I don't
understand its performance.
6x load, 12x punpckldq, 6x store, 4x scalar: Should take 12 cycle

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Justin Ruggles

On 07/16/2012 08:49 AM, Loren Merritt wrote:
> On Sun, 15 Jul 2012, Justin Ruggles wrote:
>
>> +.loop:
>> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
>> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
>> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Justin Ruggles

On 07/16/2012 08:49 AM, Loren Merritt wrote:
> On Sun, 15 Jul 2012, Justin Ruggles wrote:
>
>> +.loop:
>> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
>> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
>> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5

Re: [libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-16 Thread Loren Merritt

On Sun, 15 Jul 2012, Justin Ruggles wrote:
> +.loop:
> +    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
> +    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
> +    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
> +    mova m3, [srcq+3*mmsize] ; m3 = 0/2,

[libav-devel] [PATCH 15/15] lavr: x86: optimized 6-channel flt to fltp conversion

2012-07-14 Thread Justin Ruggles

---
 libavresample/x86/audio_convert.asm    | 68
 libavresample/x86/audio_convert_init.c |  9
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/libavresample/x86/audio_convert.asm b/libavresample/x86/audio_convert.asm
index 4899d91..de95151

12 matches