Wed, 08 Jul 2009 22:16:22, Pauli Virtanen wrote:
[clip]
On an older CPU (slower, smaller cache), the situation is slightly
different:
http://www.iki.fi/pav/tmp/athlon.png
http://www.iki.fi/pav/tmp/athlon.txt
On average, it's still an improvement in many cases. However,
Hi Pauli,
on my PC I have tried this and some of the regressions disappear;
maybe you can give it a try.
In its present state it is compiler- and architecture-dependent,
and therefore not the best choice. But it may be worth trying.
Best,
Luca
/* My additions are unindented */
[clip]
2009/7/9 Pauli Virtanen pav...@iki.fi:
On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote:
I know very little about cache optimality, so excuse the triviality of
this question: Is it possible to design this loop optimally (taking
into account certain build-time measurable parameters),
Matthieu Brucher wrote:
Unfortunately, this is not possible. We've been playing with blocking
loops for a long time in finite difference schemes, and it is always
compiler dependent
You mean CPU dependent, right? I can't see how a reasonable optimizing
compiler could make a big difference on
On 8-Jul-09, at 6:16 PM, Pauli Virtanen wrote:
Just to tickle some interest, a pathological case before
optimization:
In [1]: import numpy as np
In [2]: x = np.zeros((8, 256))
In [3]: %timeit x.sum(axis=0)
10 loops, best of 3: 850 ms per loop
After optimization:
[clip]
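The kind of cache-blocking being discussed can be sketched in pure NumPy (an illustration of the idea only, not the actual C implementation in Pauli's patch; the block size of 128 is an arbitrary choice):

```python
import numpy as np

def blocked_sum_axis0(x, block=128):
    # Reduce over axis 0 one column-block at a time, so each pass
    # touches a slab small enough to stay in cache instead of
    # striding across the full row length on every accumulation step.
    out = np.zeros(x.shape[1], dtype=x.dtype)
    for j in range(0, x.shape[1], block):
        out[j:j + block] = x[:, j:j + block].sum(axis=0)
    return out

a = np.random.rand(64, 1000)
print(np.allclose(blocked_sum_axis0(a), a.sum(axis=0)))  # True
```

Each column's sum is computed over the same elements either way; blocking only changes the traversal order, which is exactly what makes it safe for reductions.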
2009/7/9 David Cournapeau da...@ar.media.kyoto-u.ac.jp:
[clip]
Thu, 09 Jul 2009 09:54:26 +0200, Matthieu Brucher wrote:
2009/7/9 Pauli Virtanen pav...@iki.fi:
[clip]
I'm still kind of hoping that it's possible to make some minimal
assumptions about CPU caches in general, and have a rule that decides a
code path that is good enough, if not optimal.
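A "good enough" decision rule of the kind hoped for here might look like the following sketch (the constants are hypothetical, not the values in the actual patch): assume a conservative L1 data cache size and pick the largest block whose working set fits in half of it.

```python
ASSUMED_L1_BYTES = 32 * 1024  # hypothetical conservative guess, not measured

def block_elements(itemsize, n_operands=2):
    # Largest number of elements per block such that n_operands
    # arrays of this itemsize together occupy at most half of the
    # assumed L1 cache; never drop below a small minimum block.
    return max(8, (ASSUMED_L1_BYTES // 2) // (itemsize * n_operands))

print(block_elements(8))  # float64 input + output accumulator -> 1024
```

Because the guess is deliberately on the small side, it should be "good enough" on larger caches too, at the cost of some unnecessary blocking overhead there.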
Hi,
Ticket #1143 points out that Numpy's reduction operations are not
always cache friendly. I worked a bit on tuning them.
[clip]
On Wed, Jul 8, 2009 at 4:16 PM, Pauli Virtanen pav...@iki.fi wrote:
[clip]
Hi Pauli
2009/7/9 Pauli Virtanen pav...@iki.fi:
Unfortunately, improving the performance using the above scheme
comes at the cost of some slightly murky heuristics. I didn't
manage to come up with an optimal decision rule, so they are
partly empirical. There is one parameter tuning the
On 2009-07-08, Charles R Harris charlesr.har...@gmail.com wrote:
[clip]
How do the benchmarks compare with just making contiguous copies? Which is
a sort of blocking, I suppose.
I think that's slower than just walking over the discontiguous
array:
1) The new code: (on the Athlon
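The comparison being asked about can be tried directly from Python (a rough illustration with arbitrarily chosen sizes, not the benchmarks in the linked .txt files):

```python
import numpy as np
import timeit

x = np.zeros((1000, 1000))
view = x.T  # non-contiguous view: large stride along the reduction axis

t_direct = timeit.timeit(lambda: view.sum(axis=0), number=20)
t_copy = timeit.timeit(lambda: np.ascontiguousarray(view).sum(axis=0),
                       number=20)
print(f"direct: {t_direct:.4f} s   copy-then-sum: {t_copy:.4f} s")
```

The copy itself streams the whole array through memory once before any reduction happens, which is why walking the discontiguous array in a cache-aware order can beat copying.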
On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote:
[clip]
On Wed, Jul 8, 2009 at 18:02, Pauli Virtanen pav...@iki.fi wrote:
[clip]
On Thu, Jul 9, 2009 at 8:02 AM, Pauli Virtanen pav...@iki.fi wrote:
I don't think we want to go the ATNumPy route, or even have
tunable parameters chosen at build or compile time.
Detecting things like cache size at compile time should not be too
difficult, at least for common platforms. Even
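On Linux, a runtime probe (one alternative to build-time detection) is possible by reading sysfs; a best-effort sketch with a hedged fallback (the paths and the default value are assumptions, not anything from the patch):

```python
import os

def l1d_cache_bytes(base="/sys/devices/system/cpu/cpu0/cache",
                    default=32 * 1024):
    # Best-effort: scan cpu0's cache descriptors for the level-1
    # data (or unified) cache and parse its "size" file, e.g. "32K".
    # Falls back to a conservative default on any failure.
    try:
        for idx in os.listdir(base):
            if not idx.startswith("index"):
                continue
            d = os.path.join(base, idx)
            with open(os.path.join(d, "level")) as f:
                level = f.read().strip()
            with open(os.path.join(d, "type")) as f:
                ctype = f.read().strip()
            if level == "1" and ctype in ("Data", "Unified"):
                with open(os.path.join(d, "size")) as f:
                    size = f.read().strip()
                if size.endswith("K"):
                    return int(size[:-1]) * 1024
    except (OSError, ValueError):
        pass
    return default

print(l1d_cache_bytes())
```

This only covers Linux; on other platforms it silently returns the guess, which is in the spirit of the "minimal assumptions" approach above.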
On Wed, Jul 8, 2009 at 5:02 PM, Pauli Virtanen pav...@iki.fi wrote:
[clip]