[julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread David Gold
Jutho & Andreas, thank you both for the resources. On Tuesday, July 7, 2015 at 3:42:52 PM UTC-4, Matthieu wrote: > > A lot of statistical algorithms require to compute X' diag(w) Y where X > and Y are two matrices and w is a vector of weight (Y may or may not equal > X). Is there a way to comput

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Jutho
That's true, and therefore also not in Julia, unless using some command to inline assembly. However, in C it might be possible to get to a factor 2 of BLAS speed. This might be sufficient if you want to implement something slightly different from matrix multiplication (like maybe this case) and

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Andreas Noack
The OpenBLAS framework is described in http://dl.acm.org/citation.cfm?id=2503219 and builds on top of GotoBLAS, http://dl.acm.org/citation.cfm?id=1377607 Jutho: Part of the conclusion from your link is that you cannot write a fast matrix multiplication in C, but that you'll have to write it in ass

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Jutho
I found a rather instructive tutorial about the kind of optimisations going into matrix multiplication in BLIS (not OpenBLAS but related) here: http://apfel.mathematik.uni-ulm.de/%7Elehn/sghpc/gemm/index.html It's not something one can implement in Julia (yet). Hopefully further work in the dire

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread David Gold
Andreas, do you know offhand which matrix multiplication algorithm OpenBLAS routine uses? On Wednesday, July 8, 2015 at 11:37:51 AM UTC-4, Andreas Noack wrote: > > It can be quite large. With > > julia> function mymul(A,B) >m, n = size(A, 1), size(B, 2) >C = promote_type(typeof(A)

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Andreas Noack
It can be quite large. With julia> function mymul(A,B) m, n = size(A, 1), size(B, 2) C = promote_type(typeof(A), typeof(B))(m,n) for j = 1:n for i = 1:m tmp = zero(eltype(C)); for k = 1:size(A, 2) tmp += A[i,k]*B[k,j] end C[i,j] = tmp

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Josh Langsfeld
Ah, thanks, that's good to know. I was under the mistaken impression that loops are always the fastest option in Julia since it's brought up pretty frequently. Out of curiosity, what factor of slow-down would not using the optimized routines cause? On Wed, Jul 8, 2015 at 10:39 AM, Andreas Noack w

Re: [julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Andreas Noack
You could, but unless the matrices are small, it would be slower because it wouldn't use optimized matrix multiplication. 2015-07-08 10:36 GMT-04:00 Josh Langsfeld : > Maybe I'm missing something obvious, but couldn't you easily write your > own 'cross' function that uses a couple nested for-loop

[julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Josh Langsfeld
Maybe I'm missing something obvious, but couldn't you easily write your own 'cross' function that uses a couple nested for-loops to do the arithmetic without any intermediate allocations at all? On Tuesday, July 7, 2015 at 6:24:34 PM UTC-4, Matthieu wrote: > > Thanks, this is what I currently do

[julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-08 Thread Andreas Noack
No such BLAS routine exists, but for larger matrices the calculation will be dominated by the final matrix-matrix product anyway. Den tirsdag den 7. juli 2015 kl. 18.24.34 UTC-4 skrev Matthieu: > > Thanks, this is what I currently do :) > > However, I'd like to find a solution that is both memory

[julia-users] Re: Efficient way to compute X' diag(w) Y

2015-07-07 Thread Matthieu
Thanks, this is what I currently do :) However, I'd like to find a solution that is both memory efficient (X can be very large) and which does not modify X in place. Basically, I'm wondering whether there was a BLAS subroutine that would allow to compute cross(X, w, Y) in one pass without creat