Hi,
On Tue, Jul 17, 2012 at 6:16 AM, Justin Ruggles
justin.rugg...@gmail.com wrote:
---
libavresample/x86/audio_convert.asm    | 63
libavresample/x86/audio_convert_init.c |  9 +
2 files changed, 72 insertions(+), 0 deletions(-)
(I'm going to
On Tue, 17 Jul 2012, Loren Merritt wrote:
25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.
But even though I successfully predicted that this is an improvement, I don't
understand its performance.
6x load, 12x
On 07/18/2012 05:15 AM, Loren Merritt wrote:
On Tue, 17 Jul 2012, Loren Merritt wrote:
25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.
But even though I successfully predicted that this is an improvement, I
On Wed, 18 Jul 2012, Justin Ruggles wrote:
On 07/18/2012 05:15 AM, Loren Merritt wrote:
Aha, a large part of the discrepancy is due to cache aliasing, when the
offsets between the 6 output streams are divisible by some large power of
2. This would have to be fixed in whatever piece of code
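To make the aliasing condition above concrete, here is a minimal C sketch. It is not code from this thread or from libavresample: the helper names, the alias threshold, and the padding amount are all hypothetical. It finds the largest power of two that divides the byte offset between two output planes, and shows one possible mitigation (an assumption, not the fix proposed in the thread): pad the plane stride when it is a multiple of the alias size.

```c
#include <stddef.h>

/* Largest power of two that divides x (x must be non-zero).
 * Hypothetical helper, used only to illustrate the aliasing condition. */
static size_t pow2_divisor(size_t x)
{
    return x & (~x + 1);
}

/* Hypothetical mitigation: if the distance between planes is a multiple of
 * `alias` bytes, add `pad` bytes so the planes no longer map to the same
 * cache sets. Both `alias` and `pad` are assumptions chosen by the caller. */
static size_t pad_plane_stride(size_t stride, size_t alias, size_t pad)
{
    return (stride % alias == 0) ? stride + pad : stride;
}
```

For example, a plane stride of 8192 bytes is divisible by 4096, so with an assumed alias size of 4096 and a 64-byte pad the sketch would return 8256.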
On 07/18/2012 03:06 PM, Loren Merritt wrote:
On Wed, 18 Jul 2012, Justin Ruggles wrote:
On 07/18/2012 05:15 AM, Loren Merritt wrote:
Aha, a large part of the discrepancy is due to cache aliasing, when the
offsets between the 6 output streams are divisible by some large power of
2. This would
25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.
But even though I successfully predicted that this is an improvement, I don't
understand its performance.
6x load, 12x punpckldq, 6x store, 4x scalar:
Should take 12
On 07/17/2012 07:02 AM, Loren Merritt wrote:
25% faster on penryn (even though I didn't predict that by counting uops).
25% faster on sandybridge.
No change on bulldozer.
But even though I successfully predicted that this is an improvement, I don't
understand its performance.
6x load, 12x
---
libavresample/x86/audio_convert.asm    | 63
libavresample/x86/audio_convert_init.c |  9 +
2 files changed, 72 insertions(+), 0 deletions(-)
diff --git a/libavresample/x86/audio_convert.asm
b/libavresample/x86/audio_convert.asm
index
On Sun, 15 Jul 2012, Justin Ruggles wrote:
+.loop:
+    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
+    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
+    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
+    mova m3, [srcq+3*mmsize] ; m3 = 0/2,
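Reading the register comments above as channel/sample pairs, the loop appears to load packed 6-channel data and split it into one stream per channel. A scalar C reference for that data movement might look like the sketch below. The function name, the use of `uint32_t` for the 32-bit samples, and the argument layout are assumptions for illustration, not the patch's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: split packed (interleaved) 6-channel
 * 32-bit samples into 6 separate planes. src holds samples in the order
 * ch0/s0, ch1/s0, ..., ch5/s0, ch0/s1, ... matching the asm comments. */
static void deinterleave_6ch_ref(uint32_t *const dst[6], const uint32_t *src,
                                 size_t nb_samples)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nb_samples; i++)
        for (int ch = 0; ch < 6; ch++)
            dst[ch][i] = src[6 * i + ch];
}
```

A reference like this is mainly useful for checking a SIMD version against known-good output on small buffers.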
On 07/16/2012 08:49 AM, Loren Merritt wrote:
On Sun, 15 Jul 2012, Justin Ruggles wrote:
+.loop:
+    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
+    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
+    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
+
On 07/16/2012 08:49 AM, Loren Merritt wrote:
On Sun, 15 Jul 2012, Justin Ruggles wrote:
+.loop:
+    mova m0, [srcq         ] ; m0 = 0/0, 1/0, 2/0, 3/0
+    mova m1, [srcq+  mmsize] ; m1 = 4/0, 5/0, 0/1, 1/1
+    mova m2, [srcq+2*mmsize] ; m2 = 2/1, 3/1, 4/1, 5/1
+
---
libavresample/x86/audio_convert.asm    | 68
libavresample/x86/audio_convert_init.c |  9
2 files changed, 77 insertions(+), 0 deletions(-)
diff --git a/libavresample/x86/audio_convert.asm
b/libavresample/x86/audio_convert.asm
index 4899d91..de95151