One thing you might want to keep in mind is that “non-blocking” doesn’t mean
“asynchronous progress”. The API call may not block, but the communication
only progresses when you actually call down into the library.
So if you are calling a non-blocking collective and then make additional
calls into the library, the collective can make progress during those calls;
otherwise it will typically not complete until the matching MPI_Wait or
MPI_Test.
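As a rough sketch of what I mean (do_some_work() here is just a made-up
stand-in for whatever computation you want to overlap), interleaving
MPI_Test calls with the work is one way to give the library a chance to
progress the outstanding request:

#include <mpi.h>

/* Made-up stand-in for one slice of the local computation. */
static void do_some_work(int chunk) { (void)chunk; }

void overlapped_reduce(double *buf, int count, int nchunks)
{
    MPI_Request req;
    int flag = 0;

    /* Start the non-blocking reduction. */
    MPI_Iallreduce(MPI_IN_PLACE, buf, count, MPI_DOUBLE,
                   MPI_SUM, MPI_COMM_WORLD, &req);

    for (int c = 0; c < nchunks; c++) {
        do_some_work(c);   /* must not touch buf while the reduction is pending */

        /* Each MPI_Test is a call into the library, so the collective
         * can make progress here. */
        if (!flag)
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    }

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* make sure it has really completed */
}

Without those intermediate calls (or a progress thread in the MPI library),
most of the transfer tends to happen inside the final MPI_Wait.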
>Try doing a variable amount of work on every process; I see non-blocking
>as a way to speed up communication when the processes arrive at the call
>individually (at different times).
>Please always have this at the back of your mind when doing this.
I tried to simplify the problem in the explanation. The "local_computation"
is
Try doing a variable amount of work on every process; I see non-blocking as
a way to speed up communication when the processes arrive at the call
individually (at different times).
Please always have this at the back of your mind when doing this.
Surely non-blocking has overhead, and if the communication time is low, so
will the potential gain be.
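Something as crude as the sketch below would do for such a test (usleep()
is just a made-up stand-in for a genuinely variable workload, and the
buffer/communicator choices are arbitrary):

#include <mpi.h>
#include <unistd.h>   /* usleep() */

/* Crude imbalance for testing: every rank "computes" for a different
 * amount of time, so the ranks reach the reduction individually. */
void imbalanced_step(double *buf, int count)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    usleep((useconds_t)(rank + 1) * 1000);   /* stand-in for real, uneven work */

    MPI_Allreduce(MPI_IN_PLACE, buf, count, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
}

With perfectly balanced work you will mostly be measuring the overhead of
the non-blocking machinery rather than any benefit.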
Hello!
I have a program that is basically this (first implementation):
for i in N:
    local_computation(i)
    mpi_allreduce(in_place, i)
In order to mitigate the implicit barrier of the mpi_allreduce, I tried to
start an mpi_Iallreduce instead, like this (second implementation):
for i in N:
    local_comput
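In plain C the two versions look roughly like the sketch below (compute()
stands in for local_computation, and the buffer layout of one chunk of
count doubles per step is made up for the example); the second version
overlaps the reduction of step i with the computation of step i+1:

#include <mpi.h>

/* Stand-in for local_computation(i): only touches chunk i of the data. */
void compute(double *chunk, int count, int i)
{
    (void)chunk; (void)count; (void)i;   /* real work goes here */
}

/* First implementation: blocking reduction after every step. */
void version_blocking(double *data, int count, int N)
{
    for (int i = 0; i < N; i++) {
        compute(&data[i * count], count, i);
        MPI_Allreduce(MPI_IN_PLACE, &data[i * count], count,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
}

/* Second implementation (sketch): start a non-blocking reduction for
 * step i and overlap it with the computation of step i+1. */
void version_nonblocking(double *data, int count, int N)
{
    MPI_Request req = MPI_REQUEST_NULL;

    for (int i = 0; i < N; i++) {
        compute(&data[i * count], count, i);
        MPI_Wait(&req, MPI_STATUS_IGNORE);       /* finish step i-1 first */
        MPI_Iallreduce(MPI_IN_PLACE, &data[i * count], count,
                       MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);           /* finish the last step */
}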
I know there was an issue with Torque and cpusets at one time, but I believe
that has been fixed (likely later in the 1.10 series).
Regardless, the error message you are seeing indicates a failure to open a
socket between daemons on different nodes. Could be hitting a file descriptor
limit, or it
Hi Ralph,
Thanks for the reply!
I have tried, but couldn't get 1.8.8 or 1.10 (tried 1.10.0 back then) to
work with our pretty old Torque 2.5.13 with cpusets. Under some
circumstances (process/node layout as given by Torque), it fails to bind to
cores with messages like:
Error message: hwloc_s