I think I already have an idea of what's going on. How are the different
tasks distributed over the different Julia processes? Is the for loop
immediately cut into pieces where e.g. process 1 will handle the cases
iter=1:10, process 2 handles the cases iter=11:20 and so on? For different
values of the parameters, the execution time varies widely (from a
fraction of a second to several minutes or even more). If some processes
handle all the slow cases and others all the fast cases, then this explains
the behaviour I am seeing. I guess I need to write my own task
distribution, for which I will have to read the Manual section on parallel
computing again.
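
For what it's worth, here is a minimal sketch of one way to get dynamic
distribution without hand-written scheduling, assuming a recent Julia where
the parallel tools live in the Distributed standard library (there the
parallel for macro is called @distributed; older versions call it
@parallel). As far as I know, the parallel for loop splits the range into
contiguous chunks up front, one per worker, whereas pmap hands the next
item to whichever worker becomes free. The function slow_case, the worker
count and the range 1:40 are made up for illustration and stand in for the
real computation.

```julia
using Distributed

# Assumption: start 2 local workers if none have been added yet.
nprocs() == 1 && addprocs(2)

# Hypothetical stand-in for the real work: every 10th case is much slower.
@everywhere function slow_case(iter)
    sleep(iter % 10 == 0 ? 0.2 : 0.001)
    return iter^2
end

# pmap gives each worker a new item as soon as it finishes the previous
# one, so slow cases do not pile up on a single process.
# Results come back in the same order as the input range.
results = pmap(slow_case, 1:40)
```

This only pays off when each case is expensive compared to the cost of
sending it to a worker; for many very cheap cases the static split of the
parallel for loop may still be the better choice.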