Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-13 Thread Pauli Virtanen
Wed, 08 Jul 2009 22:16:22 +, Pauli Virtanen wrote: [clip] On an older CPU (slower, smaller cache), the situation is slightly different: http://www.iki.fi/pav/tmp/athlon.png http://www.iki.fi/pav/tmp/athlon.txt On average, it's still an improvement in many cases. However,

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-13 Thread Citi, Luca
Hi Pauli, on my PC I have tried this and some of the regressions disappear; maybe you can give it a try. In its present state it is compiler- and architecture-dependent, and therefore not the best choice, but it may be worth trying. Best, Luca /* My additions are unindented */ /*

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-09 Thread Matthieu Brucher
2009/7/9 Pauli Virtanen pav...@iki.fi: On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote: I know very little about cache optimality, so excuse the triviality of this question: Is it possible to design this loop optimally (taking into account certain build-time measurable parameters),

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-09 Thread David Cournapeau
Matthieu Brucher wrote: Unfortunately, this is not possible. We've been playing with blocking loops for a long time in finite difference schemes, and it is always compiler dependent. You mean CPU dependent, right? I can't see how a reasonable optimizing compiler could make a big difference on

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-09 Thread David Warde-Farley
On 8-Jul-09, at 6:16 PM, Pauli Virtanen wrote: Just to tickle some interest, a pathological case before optimization: In [1]: import numpy as np In [2]: x = np.zeros((8, 256)) In [3]: %timeit x.sum(axis=0) 10 loops, best of 3: 850 ms per loop After optimization: In

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-09 Thread Matthieu Brucher
2009/7/9 David Cournapeau da...@ar.media.kyoto-u.ac.jp: Matthieu Brucher wrote: Unfortunately, this is not possible. We've been playing with blocking loops for a long time in finite difference schemes, and it is always compiler dependent. You mean CPU dependent, right? I can't see how a

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-09 Thread Pauli Virtanen
Thu, 09 Jul 2009 09:54:26 +0200, Matthieu Brucher wrote: 2009/7/9 Pauli Virtanen pav...@iki.fi: [clip] I'm still kind of hoping that it's possible to make some minimal assumptions about CPU caches in general, and have a rule that decides a code path that is good enough, if not optimal.
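
The kind of rule hinted at above can be sketched in a few lines. This is only an illustration of the idea of sizing reduction blocks from an assumed cache budget; the function name, the 256 KiB figure, and the layout assumptions are illustrative, not values taken from the actual patch:

    def choose_outer_block(shape, itemsize, cache_bytes=256 * 1024):
        """Pick how many outer rows to reduce per pass so that one block of
        the operand, plus one output row, fits the assumed cache budget."""
        ncols = shape[-1]
        row_bytes = ncols * itemsize
        # reserve one row's worth of space for the accumulator, keep >= 1 row
        block = max(1, (cache_bytes - row_bytes) // row_bytes)
        return min(block, shape[0])

For a float64 array of shape (100000, 256) and a 256 KiB budget this yields 127 rows per block; a rule of this sort only needs a rough, conservative cache size to be "good enough" in the sense discussed above.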

[Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Pauli Virtanen
Hi, Ticket #1143 points out that Numpy's reduction operations are not always cache friendly. I worked a bit on tuning them. Just to tickle some interest, a pathological case before optimization: In [1]: import numpy as np In [2]: x = np.zeros((8, 256)) In [3]: %timeit
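
For readers who want to reproduce the effect, here is a minimal timing sketch along the lines of the IPython session quoted above; the (100000, 256) shape is an arbitrary stand-in, since the shape in the archived message is truncated:

    import numpy as np
    import timeit

    # a tall C-contiguous array: the axis-0 reduction strides across rows,
    # which is the cache-unfriendly pattern described in ticket #1143
    x = np.zeros((100000, 256))
    t = timeit.timeit(lambda: x.sum(axis=0), number=10) / 10
    print("x.sum(axis=0): %.1f ms per loop" % (t * 1e3))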

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Charles R Harris
On Wed, Jul 8, 2009 at 4:16 PM, Pauli Virtanen pav...@iki.fi wrote: Hi, Ticket #1143 points out that Numpy's reduction operations are not always cache friendly. I worked a bit on tuning them. Just to tickle some interest, a pathological case before optimization: In

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Stéfan van der Walt
Hi Pauli, 2009/7/9 Pauli Virtanen pav...@iki.fi: Unfortunately, improving the performance using the above scheme comes at the cost of some slightly murky heuristics. I didn't manage to come up with an optimal decision rule, so they are partly empirical. There is one parameter tuning the

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Pauli Virtanen
On 2009-07-08, Charles R Harris charlesr.har...@gmail.com wrote: [clip] How do the benchmarks compare with just making contiguous copies? Which is blocking of a sort, I suppose. I think that's slower than just walking over the discontiguous array: 1) The new code: (on the Athlon
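
The "contiguous copies" alternative asked about can be timed directly against walking the strided axis in place. A rough sketch, with an illustrative shape and no claim about which side wins on a given CPU:

    import numpy as np
    import timeit

    x = np.zeros((100000, 256))   # illustrative shape, not the one from the thread

    def direct():
        # reduce over the strided axis in place
        return x.sum(axis=0)

    def via_copy():
        # make the reduction axis contiguous first, then reduce; the transpose
        # copy is itself an extra full pass over roughly 200 MB of data
        return np.ascontiguousarray(x.T).sum(axis=1)

    for name, fn in [("direct", direct), ("contiguous copy", via_copy)]:
        t = timeit.timeit(fn, number=10) / 10
        print("%-16s %.1f ms per loop" % (name, t * 1e3))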

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Pauli Virtanen
On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote: I know very little about cache optimality, so excuse the triviality of this question: Is it possible to design this loop optimally (taking into account certain build-time measurable parameters), or is it the kind of thing that can only

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Robert Kern
On Wed, Jul 8, 2009 at 18:02, Pauli Virtanen pav...@iki.fi wrote: On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote: I know very little about cache optimality, so excuse the triviality of this question: Is it possible to design this loop optimally (taking into account certain build-time

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread David Cournapeau
On Thu, Jul 9, 2009 at 8:02 AM, Pauli Virtanen pav...@iki.fi wrote: I don't think we want to go the ATNumPy route, or even have tunable parameters chosen at build or compile time. Detecting things like cache size at compile time should not be too difficult, at least for common platforms. Even
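
How such detection might look at runtime (rather than at compile time) can be sketched with a guarded sysconf probe; whether the SC_LEVEL*_CACHE_* keys are exposed at all is platform-dependent, and the 512 KiB fallback is an arbitrary assumption:

    import os

    def l2_cache_size(default=512 * 1024):
        """Return the L2 cache size in bytes if the platform reports it,
        otherwise fall back to a fixed guess."""
        key = "SC_LEVEL2_CACHE_SIZE"
        try:
            if key in os.sysconf_names:
                size = os.sysconf(key)
                if size > 0:
                    return size
        except (OSError, ValueError):
            pass
        return default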

Re: [Numpy-discussion] Optimizing reduction loops (sum(), prod(), et al.)

2009-07-08 Thread Charles R Harris
On Wed, Jul 8, 2009 at 5:02 PM, Pauli Virtanen pav...@iki.fi wrote: On 2009-07-08, Stéfan van der Walt ste...@sun.ac.za wrote: I know very little about cache optimality, so excuse the triviality of this question: Is it possible to design this loop optimally (taking into