Hi, I am trying to optimize some code that just adds a bunch of things. My
first instinct was to unroll the loops and run them with SIMD, like so:
dx = 1/400
addprocs(3)
imin = -6
jmin = -2
#Some SIMD
@time res = @sync @parallel (+) for i = imin:dx:0
    tmp = 0
    for j=jmin:dx:0
        ans = 0
        @simd
Chris
To get good performance, you need to put your code into a function.
You seem to be evaluating it directly at the REPL -- this will be
slow. See the "Performance Tips" in the manual.
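
As a rough sketch of what "put your code into a function" could look like
here: the loop body in the post is cut off, so the summand i + j below is
only a placeholder, and the name sum_grid is made up for illustration. The
parallel part is left out to keep the example small.

```julia
# Hypothetical kernel wrapped in a function. The summand `i + j` is a
# placeholder, because the original loop body is truncated in the post.
function sum_grid(imin, jmin, dx)
    total = 0.0                  # Float64 accumulator, so its type never changes
    for i in imin:dx:0
        tmp = 0.0                # start from 0.0 rather than the integer 0
        @simd for j in jmin:dx:0
            tmp += i + j
        end
        total += tmp
    end
    return total
end

sum_grid(-1.0, -1.0, 0.5)            # small warm-up call, triggers compilation
@time sum_grid(-6.0, -2.0, 1/400)    # timing now excludes most compile time
```

Passing imin, jmin and dx as arguments instead of reading them as globals
lets the compiler specialize the loops on their concrete types, which is
usually where most of the speedup comes from.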
The LLVM code you see is not your kernel code. Instead, it contains a
"call" statement, presumably to a funct