Re: [julia-users] Re: SIMD multicore

2016-04-17 Thread Jiahao Chen
> Is that because Julia is calling a precompiled library and doesn't directly see the byte code? Yes.
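
A quick way to see this distinction from the REPL (an illustrative sketch; the exact output differs by machine and Julia version):

    A = rand(Float32, 4, 4); B = rand(Float32, 4, 4)
    @code_llvm A + B    # pure-Julia addition: the compiler's IR is fully visible
    @code_native A * B  # shows only the Julia-side wrapper; the real work happens
                        # in a ccall into the BLAS library, whose compiled kernels
                        # Julia's compiler cannot see into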

Re: [julia-users] Re: SIMD multicore

2016-04-17 Thread Jason Eckstein
There's also a BLAS operation for a*X + Y, which is axpy!(a, X, Y). I tried it with the following lines:

    a = 2.0f0   # assumed value for the scalar coefficient
    X = rand(Float32, 5000, 5000)
    Y = rand(Float32, 5000, 5000)
    for i = 1:100
        axpy!(a, X, Y)
    end

in a normal interactive session and noticed that all the cores were in use, near 100% CPU utilization.
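
One way to confirm that the parallelism here comes from the BLAS library itself is to vary its thread count and time the loop again. A sketch in current Julia, where these calls live under LinearAlgebra.BLAS (in the 0.4/0.5 era they were under Base.LinAlg.BLAS):

    using LinearAlgebra
    BLAS.set_num_threads(1)   # pin BLAS to a single core
    @time for i = 1:100; BLAS.axpy!(a, X, Y); end
    BLAS.set_num_threads(4)   # let BLAS use four cores (adjust to your machine)
    @time for i = 1:100; BLAS.axpy!(a, X, Y); end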

Re: [julia-users] Re: SIMD multicore

2016-04-16 Thread Jiahao Chen
Yes, optimized BLAS implementations like MKL and OpenBLAS use vectorization heavily. Note that matrix addition A+B is fundamentally a very different beast from matrix multiplication A*B. In the former you have O(N^2) work and O(N^2) data, so the ratio of work to data is O(1). It is very likely that the addition is therefore bound by memory bandwidth, so throwing more cores at it buys little. In the latter you have O(N^3) work on O(N^2) data, a ratio of O(N), which is why the heavy multithreading and SIMD in BLAS pay off there.
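
A quick illustration of that work-to-data argument (timings are machine-dependent; this only shows the shape of the difference):

    N = 5000
    A = rand(Float32, N, N); B = rand(Float32, N, N)
    @time A + B   # O(N^2) flops over O(N^2) data: limited by memory bandwidth
    @time A * B   # O(N^3) flops over O(N^2) data: compute-bound, so SIMD and
                  # extra cores actually help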

[julia-users] Re: SIMD multicore

2016-04-16 Thread Chris Rackauckas
BLAS functions are painstakingly developed to be beautiful bastions of parallelism (because of how ubiquitous their use is). The closest I think you can get is ParallelAccelerator.jl's @acc, which does a lot of optimizations all together. However, it still won't match BLAS in terms of its efficiency.
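
For reference, the usage pattern looks roughly like this (a sketch based on the package's documented @acc macro; scaled_sum is a made-up example function):

    using ParallelAccelerator   # Pkg.add("ParallelAccelerator")

    # @acc recompiles the function body, fusing and parallelizing the
    # pointwise array operations it recognizes
    @acc function scaled_sum(a, X, Y)
        a .* X .+ Y
    end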

[julia-users] Re: SIMD multicore

2016-04-16 Thread Jason Eckstein
I often use Julia's multicore features with pmap and @parallel for loops. So the best way to achieve this is to split the array up into parts for each core and then run SIMD loops on each parallel process? Will there ever be a time when you can add a tag like @simd that will have the compiler automatically do that split for you?
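
That split-then-vectorize pattern can be written today with the existing tools; a minimal sketch using SharedArrays so every worker sees the same memory (array sizes and the coefficient a are only illustrative):

    addprocs(3)   # one worker per additional core

    a = 2.0f0
    X = SharedArray(Float32, (5000, 5000))
    Y = SharedArray(Float32, (5000, 5000))

    @sync @parallel for j = 1:size(X, 2)   # split columns across workers
        @simd for i = 1:size(X, 1)         # SIMD within each column
            @inbounds Y[i, j] += a * X[i, j]
        end
    end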

[julia-users] Re: SIMD multicore

2016-04-16 Thread Valentin Churavy
BLAS uses a combination of SIMD and multi-core processing. Multi-core (threading) is coming in Julia v0.5 as an experimental feature. On Saturday, 16 April 2016 14:13:00 UTC+9, Jason Eckstein wrote: > > I noticed in Julia 0.4 now if you call A+B where A and B are matrices of > equal size, the
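
Once the v0.5 threading lands, element-wise addition can be spread across cores explicitly. A minimal sketch of the experimental interface (start Julia with the JULIA_NUM_THREADS environment variable set; the function name is illustrative):

    function threaded_add!(C, A, B)
        Threads.@threads for j = 1:size(A, 2)   # one block of columns per thread
            @simd for i = 1:size(A, 1)          # vectorize the inner loop
                @inbounds C[i, j] = A[i, j] + B[i, j]
            end
        end
        return C
    end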