Re: [Numpy-discussion] Scipy dot

2012-11-12 Thread Nicolas SCHEFFER
I've pushed my code to a branch here https://github.com/leschef/numpy/tree/faster_dot with the commit https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4 Let me know if that's enough to create a pull request. Thanks, -nicolas On Sat, Nov 10, 2012 at 4:39 AM, George

Re: [Numpy-discussion] Scipy dot

2012-11-12 Thread Nathaniel Smith
On Mon, Nov 12, 2012 at 9:08 PM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: I've pushed my code to a branch here https://github.com/leschef/numpy/tree/faster_dot with the commit https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4 Let me know if that's

Re: [Numpy-discussion] Scipy dot

2012-11-12 Thread Nicolas SCHEFFER
Yep exactly. I just want to make sure that we talked enough on the principle first (ie. goals and technical approach), and that indeed the code is good enough to look at. I get it from your answer that it is, so I went ahead https://github.com/numpy/numpy/pull/2730 Thanks -nicolas On Mon, Nov

Re: [Numpy-discussion] Scipy dot

2012-11-10 Thread George Nurser
Note also that OpenBlas claims performance as good as MKL with Sandy Bridge processors. https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf George Nurser. On 10 November 2012 00:38, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.nowrote: On 11/09/2012 11:57 PM, Matthieu Brucher

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nathaniel Smith
On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Fred, Thanks for the advice. The code will only affect the part in _dotblas.c where gemm is called. There's tons of check before that make sure both matrices are of ndim 2. We should check though if we can do

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Gael Varoquaux
On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote: Indeed, there is no reason not to make this available in NumPy. +1, I agree, this should be a fix in numpy, not scipy. I agree. My point was a bit of a side issue: given a user's computer, I have no garantee that numpy will

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread George Nurser
Hi, It's really good to see this work being done. It would be great if this could somehow be put also into the np.einsum function, which currently doesn't even use blas, and is consequently substantially slower than current np.dot (timings on

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Frédéric Bastien
On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote: On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Fred, Thanks for the advice. The code will only affect the part in _dotblas.c where gemm is called. There's tons of check before

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nathaniel Smith
On 9 Nov 2012 08:48, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote: Indeed, there is no reason not to make this available in NumPy. +1, I agree, this should be a fix in numpy, not scipy. I agree. My point was a

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nathaniel Smith
On Fri, Nov 9, 2012 at 2:45 PM, Frédéric Bastien no...@nouiz.org wrote: On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote: On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Fred, Thanks for the advice. The code will only affect the

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Frédéric Bastien
On Fri, Nov 9, 2012 at 10:20 AM, Nathaniel Smith n...@pobox.com wrote: On Fri, Nov 9, 2012 at 2:45 PM, Frédéric Bastien no...@nouiz.org wrote: On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote: On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Gael Varoquaux
On Fri, Nov 09, 2012 at 03:12:42PM +, Nathaniel Smith wrote: But what if someone compiles numpy against an optimized blas (mkl, say) and then compiles SciPy against the reference blas? What do you do then!? ;-) This could happen. But the converse happens very often. What happens is that

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nathaniel Smith
On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Fri, Nov 09, 2012 at 03:12:42PM +, Nathaniel Smith wrote: But what if someone compiles numpy against an optimized blas (mkl, say) and then compiles SciPy against the reference blas? What do you do then!?

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nicolas SCHEFFER
I too encourage users to use scipy.linalg for speed and robustness (hence calling this scipy.dot), but it just brings so much confusion! When using the scipy + numpy ecosystem, you'd almost want everything be done with scipy so that you get the best implementation in all cases: scipy.zeros(),

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nicolas SCHEFFER
Ok: comparing apples to apples. I'm clueless on my observations and would need input from you guys. Using ATLAS 3.10, numpy with and without my changes, I'm getting these timings and comparisons. # #I. Generate matrices using regular dot: # big = np.array(np.random.randn(2000, 2000), 'f');

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Matthieu Brucher
Oh, about the differences. If there is something like cache blocking inside ATLAS (which would make sense), the MAD are not in exactly the same order and this would lead to some differences. You need to compare the MSE/sum of values squared to the machine precision. Cheers, 2012/11/9 Matthieu

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Sebastian Berg
On Fri, 2012-11-09 at 14:52 -0800, Nicolas SCHEFFER wrote: Ok: comparing apples to apples. I'm clueless on my observations and would need input from you guys. Using ATLAS 3.10, numpy with and without my changes, I'm getting these timings and comparisons. # #I. Generate matrices using

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nathaniel Smith
In this case it's even possible the blas is being smart enough to notice that you've passed it the same pointer twice with the transpose switch on one of them, and is just skipping half the multiplies since the output is provably symmetric. Don't know if they actually do that but... -n On 9 Nov

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Nicolas SCHEFFER
Well I have proof we're going things right! And it also shows the power of MKL... I checked with EPD and my python snippet (see also my first email), and all 4 timings are the same... However, if this behavior is the right one, then my python snippet should get the same trend. But it's not. So I

Re: [Numpy-discussion] Scipy dot

2012-11-09 Thread Dag Sverre Seljebotn
On 11/09/2012 11:57 PM, Matthieu Brucher wrote: Hi, A.A slower than A.A' is not a surprise for me. The latter is far more cache friendly that the former. Everything follows cache lines, so it is faster than something that will use one element from each cache line. In fact it is exactly what

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nathaniel Smith
On Wed, Nov 7, 2012 at 10:41 PM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Hi, I've written a snippet of code that we could call scipy.dot, a drop-in replacement for numpy.dot. It's dead easy, and just answer the need of calling the right blas function depending on the type of

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Gael Varoquaux
On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote: I think everyone would be very happy to see numpy.dot modified to do this automatically. But adding a scipy.dot IMHO would be fixing things in the wrong place and just create extra confusion. I am not sure I agree: numpy is often

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Dag Sverre Seljebotn
On 11/08/2012 01:07 PM, Gael Varoquaux wrote: On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote: I think everyone would be very happy to see numpy.dot modified to do this automatically. But adding a scipy.dot IMHO would be fixing things in the wrong place and just create extra

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread David Cournapeau
On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 11/08/2012 01:07 PM, Gael Varoquaux wrote: On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote: I think everyone would be very happy to see numpy.dot modified to do this automatically. But

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Anthony Scopatz
On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau courn...@gmail.com wrote: On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 11/08/2012 01:07 PM, Gael Varoquaux wrote: On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote: I think

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Frédéric Bastien
Hi, I also think it should go into numpy.dot and that the output order should not be changed. A new point, what about the additional overhead for small ndarray? To remove this, I would suggest to put this code into the C function that do the actual work (at least, from memory it is a c function,

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nicolas SCHEFFER
Thanks for all the responses folks. This is indeed a nice problem to solve. Few points: I. Change the order from 'F' to 'C': I'll look into it. II. Integration with scipy / numpy: opinions are diverging here. Let's wait a bit to get more responses on what people think. One thing though: I'd need

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nicolas SCHEFFER
I've made the necessary changes to get the proper order for the output array. Also, a pass of pep8 and some tests (fixmes are in failing tests) http://pastebin.com/M8TfbURi -n On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER scheffer.nico...@gmail.com wrote: Thanks for all the responses folks.

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nicolas SCHEFFER
Well, hinted by what Fabien said, I looked at the C level dot function. Quite verbose! But starting line 757, we can see that it shouldn't be too much work to fix that bug (well there is even a comment there that states just that)

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Sebastian Berg
Hey, On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: Well, hinted by what Fabien said, I looked at the C level dot function. Quite verbose! But starting line 757, we can see that it shouldn't be too much work to fix that bug (well there is even a comment there that states just

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Sebastian Berg
On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: Hey, On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: Well, hinted by what Fabien said, I looked at the C level dot function. Quite verbose! But starting line 757, we can see that it shouldn't be too much work to fix

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nicolas SCHEFFER
Thanks Sebastien, didn't think of that. Well I went ahead and tried the change, and it's indeed straightforward. I've run some tests, among which: nosetests numpy/numpy/core/tests/test_blasdot.py and it looks ok. I'm assuming this is good news. I've copy-pasting the diff below, but I have that

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Frédéric Bastien
Hi, I suspect the current tests are not enought. You need to test all the combination for the 3 inputs with thoses strides: c-contiguous f-contiguous something else like strided. Also, try with matrix with shape of 1 in each dimensions. Not all blas libraries accept the strides that numpy use

Re: [Numpy-discussion] Scipy dot

2012-11-08 Thread Nicolas SCHEFFER
Fred, Thanks for the advice. The code will only affect the part in _dotblas.c where gemm is called. There's tons of check before that make sure both matrices are of ndim 2. We should check though if we can do these tricks in other parts of the function. Otherwise: - I've built against ATLAS 3.10