I've pushed my code to a branch here
https://github.com/leschef/numpy/tree/faster_dot
with the commit
https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4
Let me know if that's enough to create a pull request.
Thanks,
-nicolas
On Mon, Nov 12, 2012 at 9:08 PM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
I've pushed my code to a branch here
https://github.com/leschef/numpy/tree/faster_dot
with the commit
https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4
Let me know if that's enough to create a pull request.
Yep exactly.
I just want to make sure that we've talked enough about the principle first
(i.e. goals and technical approach), and that the code is indeed good
enough to look at.
I get it from your answer that it is, so I went ahead
https://github.com/numpy/numpy/pull/2730
Thanks
-nicolas
Note also that OpenBLAS claims performance as good as MKL on Sandy Bridge
processors.
https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf
George Nurser.
On 10 November 2012 00:38, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
On 11/09/2012 11:57 PM, Matthieu Brucher
On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
Fred,
Thanks for the advice.
The code will only affect the part in _dotblas.c where gemm is called.
There are tons of checks before that which make sure both matrices are of ndim 2.
We should check though if we can do these tricks in other parts of the function.
On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote:
Indeed, there is no reason not to make this available in NumPy.
+1, I agree, this should be a fix in numpy, not scipy.
I agree.
My point was a bit of a side issue: given a user's computer, I have no
guarantee that numpy will
Hi,
It's really good to see this work being done.
It would be great if this could somehow also be put into the np.einsum
function, which currently doesn't even use BLAS and is consequently
substantially slower than the current np.dot (timings on
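For illustration (this is my sketch with arbitrary sizes, not the timings cut off above): the two paths compute the same product, but only np.dot dispatches to gemm, which is where the speed difference comes from.

```python
import numpy as np

# Sketch: the same matrix product via BLAS-backed np.dot and via
# np.einsum, which (at the time of this thread) looped in plain C.
a = np.random.randn(200, 200)
b = np.random.randn(200, 200)

via_dot = np.dot(a, b)                     # dispatches to dgemm
via_einsum = np.einsum('ij,jk->ik', a, b)  # no BLAS involved

# Results agree to rounding error; only the speed differs.
assert np.allclose(via_dot, via_einsum)
```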
On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote:
On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
Fred,
Thanks for the advice.
The code will only affect the part in _dotblas.c where gemm is called.
There are tons of checks before
On 9 Nov 2012 08:48, Gael Varoquaux gael.varoqu...@normalesup.org wrote:
On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote:
Indeed, there is no reason not to make this available in NumPy.
+1, I agree, this should be a fix in numpy, not scipy.
I agree.
My point was a bit of a side issue: given a user's computer, I have no
guarantee that numpy will
On Fri, Nov 9, 2012 at 2:45 PM, Frédéric Bastien no...@nouiz.org wrote:
On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote:
On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
Fred,
Thanks for the advice.
The code will only affect the
On Fri, Nov 9, 2012 at 10:20 AM, Nathaniel Smith n...@pobox.com wrote:
On Fri, Nov 9, 2012 at 2:45 PM, Frédéric Bastien no...@nouiz.org wrote:
On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith n...@pobox.com wrote:
On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER
On Fri, Nov 09, 2012 at 03:12:42PM +, Nathaniel Smith wrote:
But what if someone compiles numpy against an optimized blas (mkl,
say) and then compiles SciPy against the reference blas? What do you
do then!? ;-)
This could happen. But the converse happens very often. What happens is
that
On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
On Fri, Nov 09, 2012 at 03:12:42PM +, Nathaniel Smith wrote:
But what if someone compiles numpy against an optimized blas (mkl,
say) and then compiles SciPy against the reference blas? What do you
do then!?
I too encourage users to use scipy.linalg for speed and robustness
(hence calling this scipy.dot), but it just brings so much confusion!
When using the scipy + numpy ecosystem, you'd almost want everything
to be done with scipy so that you get the best implementation in all
cases: scipy.zeros(),
Ok: comparing apples to apples. I'm clueless on my observations and
would need input from you guys.
Using ATLAS 3.10, numpy with and without my changes, I'm getting these
timings and comparisons.
#
#I. Generate matrices using regular dot:
#
big = np.array(np.random.randn(2000, 2000), 'f');
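The snippet is cut off in the archive; a minimal reconstruction of this kind of timing comparison (sizes and repetition counts are mine, reduced from the original 2000x2000 so the sketch runs quickly) might look like:

```python
import timeit
import numpy as np

# Single-precision square matrix, as in the original snippet.
big = np.array(np.random.randn(400, 400), 'f')

def bench(label, fn, reps=5):
    # Best-of-reps wall-clock time for one call.
    best = min(timeit.timeit(fn, number=1) for _ in range(reps))
    print("%-12s %.4f s" % (label, best))
    return best

t_aa = bench("A.dot(A)", lambda: np.dot(big, big))
t_aat = bench("A.dot(A.T)", lambda: np.dot(big, big.T))
```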
Oh, about the differences. If there is something like cache blocking inside
ATLAS (which would make sense), the MADs (multiply-adds) are not executed in
exactly the same order, and this would lead to some differences. You need to
compare the MSE over the sum of squared values to the machine precision.
Cheers,
2012/11/9 Matthieu
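The check suggested above (MSE over the sum of squared values, compared to machine precision) can be sketched as follows; the "alternative" result is simulated here by accumulating in double precision and rounding back, since ATLAS's blocking isn't reproducible in a few lines.

```python
import numpy as np

a = np.random.randn(300, 300).astype(np.float32)
ref = np.dot(a, a.T)
# Stand-in for a gemm whose multiply-adds ran in a different order:
# do the accumulation in double precision, then round back to single.
alt = np.dot(a.astype(np.float64), a.T.astype(np.float64)).astype(np.float32)

# Relative MSE: mean squared difference over mean squared magnitude.
rel_mse = np.mean((ref - alt) ** 2) / np.mean(ref ** 2)
eps = np.finfo(np.float32).eps
print("relative MSE %.3e vs eps %.3e" % (rel_mse, eps))
assert rel_mse < eps  # a mere reordering stays near machine precision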
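The check suggested above (MSE over the sum of squared values, compared to machine precision) can be sketched as follows; the "alternative" result is simulated here by accumulating in double precision and rounding back, since ATLAS's blocking isn't reproducible in a few lines.

```python
import numpy as np

a = np.random.randn(300, 300).astype(np.float32)
ref = np.dot(a, a.T)
# Stand-in for a gemm whose multiply-adds ran in a different order:
# do the accumulation in double precision, then round back to single.
alt = np.dot(a.astype(np.float64), a.T.astype(np.float64)).astype(np.float32)

# Relative MSE: mean squared difference over mean squared magnitude.
rel_mse = np.mean((ref - alt) ** 2) / np.mean(ref ** 2)
eps = np.finfo(np.float32).eps
print("relative MSE %.3e vs eps %.3e" % (rel_mse, eps))
assert rel_mse < eps  # a mere reordering stays near machine precision
```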
On Fri, 2012-11-09 at 14:52 -0800, Nicolas SCHEFFER wrote:
Ok: comparing apples to apples. I'm clueless on my observations and
would need input from you guys.
Using ATLAS 3.10, numpy with and without my changes, I'm getting these
timings and comparisons.
#
#I. Generate matrices using
In this case it's even possible the blas is being smart enough to notice
that you've passed it the same pointer twice with the transpose switch on
one of them, and is just skipping half the multiplies since the output is
provably symmetric. Don't know if they actually do that but...
-n
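Whether or not ATLAS detects this on its own, the shortcut described above is exactly what the BLAS ?syrk routines provide; a sketch via scipy.linalg.blas (my illustration, not code from the thread):

```python
import numpy as np
from scipy.linalg import blas

a = np.random.randn(100, 50)

# dsyrk computes only one triangle of A @ A.T (roughly half the
# multiplies); the caller mirrors it to get the full symmetric result.
upper = blas.dsyrk(1.0, a)                   # upper triangle filled
full = np.triu(upper) + np.triu(upper, 1).T  # mirror across the diagonal

assert np.allclose(full, a.dot(a.T))
```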
Well, I have proof we're doing things right! And it also shows the
power of MKL...
I checked with EPD and my python snippet (see also my first email),
and all 4 timings are the same...
However, if this behavior is the right one, then my python snippet
should show the same trend. But it doesn't.
So I
On 11/09/2012 11:57 PM, Matthieu Brucher wrote:
Hi,
A.A being slower than A.A' is not a surprise for me. The latter is far more
cache friendly than the former. Everything follows cache lines, so it is
faster than something that will use one element from each cache line. In
fact it is exactly what
On Wed, Nov 7, 2012 at 10:41 PM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
Hi,
I've written a snippet of code that we could call scipy.dot, a drop-in
replacement for numpy.dot.
It's dead easy, and it just answers the need of calling the right BLAS
function depending on the type of
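The dispatch idea can be sketched with scipy.linalg.get_blas_funcs, which selects sgemm/dgemm/cgemm/zgemm from the input dtypes (this is my sketch of the approach, not Nicolas's actual snippet):

```python
import numpy as np
from scipy.linalg import get_blas_funcs

def blas_dot(a, b):
    """Sketch of the 'scipy.dot' idea: dispatch to the gemm variant
    matching the input dtypes instead of a generic path."""
    gemm, = get_blas_funcs(('gemm',), (a, b))
    return gemm(1.0, a, b)

a = np.random.randn(50, 30).astype(np.float32)
b = np.random.randn(30, 20).astype(np.float32)
# Single-precision inputs select sgemm; result matches np.dot.
assert np.allclose(blas_dot(a, b), np.dot(a, b), atol=1e-4)
```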
On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote:
I think everyone would be very happy to see numpy.dot modified to do
this automatically. But adding a scipy.dot IMHO would be fixing things
in the wrong place and just create extra confusion.
I am not sure I agree: numpy is often
On 11/08/2012 01:07 PM, Gael Varoquaux wrote:
On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote:
I think everyone would be very happy to see numpy.dot modified to do
this automatically. But adding a scipy.dot IMHO would be fixing things
in the wrong place and just create extra
On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
On 11/08/2012 01:07 PM, Gael Varoquaux wrote:
On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote:
I think everyone would be very happy to see numpy.dot modified to do
this automatically. But
On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau courn...@gmail.com wrote:
On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
On 11/08/2012 01:07 PM, Gael Varoquaux wrote:
On Thu, Nov 08, 2012 at 11:28:21AM +, Nathaniel Smith wrote:
I think
Hi,
I also think it should go into numpy.dot and that the output order should
not be changed.
A new point: what about the additional overhead for small ndarrays? To
remove this, I would suggest putting this code into the C function that does
the actual work (at least, from memory, it is a C function,
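The overhead concern is easy to see in a rough benchmark (sizes and repetition counts are mine, purely illustrative):

```python
import timeit
import numpy as np

small = np.random.randn(4, 4)
big = np.random.randn(400, 400)

# For a 4x4 product, call/dispatch overhead dominates the cost; for
# 400x400 the gemm itself does, so extra pre-call checks are invisible.
t_small = timeit.timeit(lambda: np.dot(small, small), number=10000) / 10000
t_big = timeit.timeit(lambda: np.dot(big, big), number=10) / 10
print("4x4 per call:     %.2e s" % t_small)
print("400x400 per call: %.2e s" % t_big)
```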
Thanks for all the responses folks. This is indeed a nice problem to solve.
A few points:
I. Change the order from 'F' to 'C': I'll look into it.
II. Integration with scipy / numpy: opinions are diverging here.
Let's wait a bit to get more responses on what people think.
One thing though: I'd need
I've made the necessary changes to get the proper order for the output array.
Also, a pass of pep8 and some tests (fixmes are in failing tests)
http://pastebin.com/M8TfbURi
-n
On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER
scheffer.nico...@gmail.com wrote:
Thanks for all the responses folks.
Well, hinted by what Fabien said, I looked at the C level dot function.
Quite verbose!
But starting line 757, we can see that it shouldn't be too much work
to fix that bug (well there is even a comment there that states just
that)
Hey,
On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote:
Well, hinted by what Fabien said, I looked at the C level dot function.
Quite verbose!
But starting line 757, we can see that it shouldn't be too much work
to fix that bug (well there is even a comment there that states just
On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote:
Hey,
On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote:
Well, hinted by what Fabien said, I looked at the C level dot function.
Quite verbose!
But starting line 757, we can see that it shouldn't be too much work
to fix
Thanks Sebastian, I didn't think of that.
Well, I went ahead and tried the change, and it's indeed straightforward.
I've run some tests, among which:
nosetests numpy/numpy/core/tests/test_blasdot.py
and it looks OK. I'm assuming this is good news.
I've copy-pasted the diff below, but I have that
Hi,
I suspect the current tests are not enough. You need to test all the
combinations for the 3 inputs with these strides:
c-contiguous
f-contiguous
something else, like strided.
Also, try matrices with a shape of 1 in each dimension. Not all BLAS
libraries accept the strides that numpy uses
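The combinations above can be sketched as a small test loop (helper names are mine; the real tests live in test_blasdot.py):

```python
import numpy as np

def variants(shape):
    """Yield the same logical matrix in different memory layouts."""
    base = np.random.randn(*shape)
    yield np.ascontiguousarray(base)   # c-contiguous
    yield np.asfortranarray(base)      # f-contiguous
    big = np.zeros((2 * shape[0], 2 * shape[1]))
    big[::2, ::2] = base
    yield big[::2, ::2]                # strided view

# Include shapes with a 1 in each dimension, as suggested.
for shape_a, shape_b in [((4, 3), (3, 5)), ((1, 3), (3, 1))]:
    for a in variants(shape_a):
        for b in variants(shape_b):
            expected = np.dot(np.ascontiguousarray(a),
                              np.ascontiguousarray(b))
            assert np.allclose(np.dot(a, b), expected)
```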
Fred,
Thanks for the advice.
The code will only affect the part in _dotblas.c where gemm is called.
There are tons of checks before that which make sure both matrices are of ndim 2.
We should check though if we can do these tricks in other parts of the function.
Otherwise:
- I've built against ATLAS 3.10