In case anyone needs this in the future, here is what I managed to put
together, and please let me know if I am doing something reckless or wrong.
It is slightly faster than numpy.asfortranarray and it doesn't show any
cache-miss symptoms, but I can't say I did thorough benchmark testing. I
chose the
Ah I see. Thank you Sebastian, I was hoping to avoid all that blocking
(since the hardware dependency leaves some performance on the table) or recursive
zooming stuff with some off-the-shelf tool but apparently I'm walking in
the dusty corners again collecting spider webs :) As you said, there are
quite a
On Thu, 2021-11-11 at 01:04 +0100, Ilhan Polat wrote:
> Hmm not sure I understand the question but this is what I mean by naive
> looping, suppose I allocate a scratch register work3, then
>
> for i in range(n):
>     for j in range(n):
>         work3[j*n+i] = work2[i*n+j]
>
NumPy does not end up doing anythin
Here are some actual numbers within the context of operation (nogil removed
and def'd for linetracing)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    80                                           # Bilinear identity to shave off some flops
    81                                           # inv(V-U) (V+U) = inv(V-U) (V-U+2V
Hmm not sure I understand the question but this is what I mean by naive
looping, suppose I allocate a scratch register work3, then
for i in range(n):
    for j in range(n):
        work3[j*n+i] = work2[i*n+j]
This is basically doing the row-to-column indexing, and obviously we
create a lot of cache misse
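
For comparison, here is a minimal sketch of the naive loop next to a blocked (tiled) variant, which is the usual remedy for the strided writes. Everything in it is an assumption made for illustration: the names transpose_naive and transpose_blocked are hypothetical, the buffers are plain Python lists rather than Cython memory pointers, and the tile edge `block` is an arbitrary default whose best value depends on the hardware. Pure Python will not show any cache effect; the sketch only shows the loop structure one would carry over to typed Cython or C loops.

```python
def transpose_naive(src, n):
    """Transpose an n-by-n matrix stored row-major in a flat list."""
    dst = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            # Reads are contiguous, writes jump by n elements each step.
            dst[j * n + i] = src[i * n + j]
    return dst


def transpose_blocked(src, n, block=8):
    """Same result as transpose_naive, visiting the matrix tile by tile.

    `block` is an assumed tile edge, not a tuned value.
    """
    dst = [0.0] * (n * n)
    for ii in range(0, n, block):
        i_end = min(ii + block, n)
        for jj in range(0, n, block):
            j_end = min(jj + block, n)
            # Within one tile both the reads and the writes stay inside
            # a small block-by-block window of each buffer.
            for i in range(ii, i_end):
                row = i * n
                for j in range(jj, j_end):
                    dst[j * n + i] = src[row + j]
    return dst
```

Both functions return a new flat list, so the result of the second can be checked against the first for any n, including sizes that are not a multiple of `block`.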
On Thursday, November 11, 2021, Ilhan Polat wrote:
> I've asked this in Cython mailing list but probably I should also get some
> feedback here too.
>
> I have the following function defined in Cython and using flat memory
> pointers to hold n by n array data.
>
>
> cdef some_C_layout_func(double
Indeed for matrix multiplication and many other L3 BLAS functions, we are
lucky; however, for the linear solve function ?getrs it is unfortunately to no avail.
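
The reason matrix multiplication gets off lightly can be sketched as follows: a C-ordered buffer read as Fortran-ordered is simply the transpose, and since (A B)^T = B^T A^T, swapping the operands gives the row-major product without moving a single element. The functions below are hypothetical pure-Python stand-ins written for illustration, not the BLAS interface; gemm_colmajor only mimics what a column-major multiply routine computes.

```python
def gemm_colmajor(m, n, k, a, b):
    """Return C = A @ B as a flat list.

    a (m x k), b (k x n) and the result (m x n) are all column-major.
    """
    c = [0.0] * (m * n)
    for j in range(n):
        for p in range(k):
            b_pj = b[p + j * k]
            for i in range(m):
                c[i + j * m] += a[i + p * m] * b_pj
    return c


def matmul_rowmajor(m, n, k, a, b):
    """Return C = A @ B for row-major flat lists a (m x k) and b (k x n).

    Read as column-major, `a` is A^T (k x m) and `b` is B^T (n x k).
    The column-major routine then computes B^T @ A^T = C^T, and C^T in
    column-major order is exactly C in row-major order.
    """
    return gemm_colmajor(n, m, k, b, a)
```

For example, with a = [1, 2, 3, 4, 5, 6] as a 2 x 3 matrix and b = [7, 8, 9, 10, 11, 12] as a 3 x 2 matrix, matmul_rowmajor(2, 2, 3, a, b) gives the row-major product [58, 64, 139, 154].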
On Thu, Nov 11, 2021 at 12:31 AM Benjamin Root wrote:
> I have found that a bunch of lapack functions seem to have arguments for
> stating whether or n
I have found that a bunch of lapack functions seem to have arguments for
stating whether or not the given arrays are C or F ordered. Then you
wouldn't need to worry about handling the layout yourself. For example, I
have some C++ code like so:
extern "C" {
/**
* Forward declaration for LAPACK's