Jutho & Andreas, thank you both for the resources.
On Tuesday, July 7, 2015 at 3:42:52 PM UTC-4, Matthieu wrote:
>
> A lot of statistical algorithms require computing X' diag(w) Y, where X
> and Y are two matrices and w is a vector of weights (Y may or may not
> equal X). Is there a way to compute this efficiently?
That's true, and therefore also not in Julia, unless one uses some way to
inline assembly. However, in C it might be possible to get within a factor
of 2 of BLAS speed. That might be sufficient if you want to implement
something slightly different from matrix multiplication (like maybe this
case).
The OpenBLAS framework is described in
http://dl.acm.org/citation.cfm?id=2503219 and builds on top of GotoBLAS,
http://dl.acm.org/citation.cfm?id=1377607
Jutho: Part of the conclusion from your link is that you cannot write a
fast matrix multiplication in C, but that you'll have to write it in
assembly.
I found a rather instructive tutorial about the kind of optimisations that
go into matrix multiplication in BLIS (not OpenBLAS, but related) here:
http://apfel.mathematik.uni-ulm.de/%7Elehn/sghpc/gemm/index.html
It's not something one can implement in Julia (yet). Hopefully further work
in this direction will change that.
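To give a flavour of just one technique the tutorial covers, here is a toy
sketch of loop blocking in Julia. The function name and the block size of
64 are arbitrary choices of mine, and blocking alone (without the packing
and tuned micro-kernel the tutorial goes on to describe) stays far from
BLAS speed:

# Toy illustration of cache blocking: C is updated one tile at a time so
# the corresponding tiles of A and B stay in cache across the inner loops.
function blocked_mul!(C, A, B, bs=64)
    m, p, n = size(A, 1), size(A, 2), size(B, 2)
    fill!(C, zero(eltype(C)))
    for jj = 1:bs:n, kk = 1:bs:p, ii = 1:bs:m
        for j = jj:min(jj + bs - 1, n)
            for k = kk:min(kk + bs - 1, p)
                bkj = B[k, j]
                for i = ii:min(ii + bs - 1, m)
                    C[i, j] += A[i, k] * bkj
                end
            end
        end
    end
    return C
end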
Andreas, do you know offhand which matrix multiplication algorithm the
OpenBLAS routine uses?
On Wednesday, July 8, 2015 at 11:37:51 AM UTC-4, Andreas Noack wrote:
>
> It can be quite large. With
>
> julia> function mymul(A,B)
> [...]
It can be quite large. With

julia> function mymul(A,B)
           m, n = size(A, 1), size(B, 2)
           C = promote_type(typeof(A), typeof(B))(m, n)
           for j = 1:n
               for i = 1:m
                   tmp = zero(eltype(C))
                   for k = 1:size(A, 2)
                       tmp += A[i,k]*B[k,j]
                   end
                   C[i,j] = tmp
               end
           end
           return C
       end
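To see the factor on your own machine, something like this works (the exact
numbers depend on hardware, matrix size, and Julia version, so I won't
quote any):

julia> A = randn(1000, 1000); B = randn(1000, 1000);

julia> mymul(A, B);        # run once first so compilation isn't timed

julia> @time mymul(A, B);  # the naive loops above

julia> @time A*B;          # dispatches to the optimized BLAS routine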
Ah, thanks, that's good to know. I was under the mistaken impression that
loops are always the fastest option in Julia since it's brought up pretty
frequently. Out of curiosity, what factor of slow-down would not using the
optimized routines cause?
On Wed, Jul 8, 2015 at 10:39 AM, Andreas Noack wrote:
You could, but unless the matrices are small, it would be slower because it
wouldn't use optimized matrix multiplication.
2015-07-08 10:36 GMT-04:00 Josh Langsfeld:
> Maybe I'm missing something obvious, but couldn't you easily write your
> own 'cross' function that uses a couple nested for-loops [...]
Maybe I'm missing something obvious, but couldn't you easily write your own
'cross' function that uses a couple nested for-loops to do the arithmetic
without any intermediate allocations at all?
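Something along these lines, I mean (a sketch only; the name cross and the
argument order are taken from Matthieu's description, not from any existing
API):

# Computes X' * diagm(w) * Y entrywise; the only allocation is the result.
function cross(X, w, Y)
    size(X, 1) == length(w) == size(Y, 1) ||
        throw(DimensionMismatch("X, w, and Y need the same number of rows"))
    m, n = size(X, 2), size(Y, 2)
    C = zeros(promote_type(eltype(X), eltype(w), eltype(Y)), m, n)
    for j = 1:n, i = 1:m
        tmp = zero(eltype(C))
        for k = 1:length(w)
            # (X' * diagm(w) * Y)[i,j] = sum_k X[k,i] * w[k] * Y[k,j]
            tmp += X[k, i] * w[k] * Y[k, j]
        end
        C[i, j] = tmp
    end
    return C
end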
On Tuesday, July 7, 2015 at 6:24:34 PM UTC-4, Matthieu wrote:
>
> Thanks, this is what I currently do :)
> [...]
No such BLAS routine exists, but for larger matrices the calculation will
be dominated by the final matrix-matrix product anyway.
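Concretely, something like this (assuming w weights the rows; the name
weighted_cross is mine):

# w .* Y allocates one intermediate the size of Y; the X' * ... product
# then dispatches to the optimized BLAS gemm, which dominates the cost.
weighted_cross(X, w, Y) = X' * (w .* Y)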
On Tuesday, July 7, 2015 at 6:24:34 PM UTC-4, Matthieu wrote:
>
> Thanks, this is what I currently do :)
>
> However, I'd like to find a solution that is both memory [...]
Thanks, this is what I currently do :)
However, I'd like to find a solution that is both memory efficient (X can
be very large) and which does not modify X in place.
Basically, I'm wondering whether there is a BLAS subroutine that would
allow computing cross(X, w, Y) in one pass, without creating any
intermediate copies.
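The closest compromise I can think of is to process the rows in blocks, so
the scaled copy stays small while each block product still goes through
BLAS (the helper name and the chunk size below are made up):

# Accumulates C = sum over row blocks of X[rows,:]' * diagm(w[rows]) * Y[rows,:].
# Only bs-row blocks are copied, so X and Y are never duplicated in full
# and X itself is left untouched.
function weighted_cross_blocked(X, w, Y, bs=1024)
    T = promote_type(eltype(X), eltype(w), eltype(Y))
    C = zeros(T, size(X, 2), size(Y, 2))
    for r = 1:bs:length(w)
        rows = r:min(r + bs - 1, length(w))
        C += X[rows, :]' * (w[rows] .* Y[rows, :])
    end
    return C
end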