Since neither bcast nor reduce acts as a barrier, it is possible to run out of resources if either of these calls (or both) is used in a tight loop. The sync coll component exists for this scenario. You can enable it by adding the following to your mpirun command line (or by setting these variables through the environment or a file):

  --mca coll_sync_priority 100 --mca coll_sync_barrier_after 10

This will effectively throttle the collective calls for you. You can also change the reduce to an allreduce.

-Nathan
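(These MCA parameters can also be exported as environment variables using Open MPI's OMPI_MCA_ prefix, e.g. OMPI_MCA_coll_sync_priority=100.)

To illustrate the allreduce suggestion, here is a minimal C sketch of what the change might look like; the function name, buffer names, counts, and the MPI_SUM operation are illustrative assumptions, not anything taken from the poster's application:

    /* Sketch: replace the plain reduce with an allreduce so every iteration
     * ends with an implicitly synchronizing collective.  All names, counts,
     * and the reduction op are assumptions for illustration. */
    #include <mpi.h>

    void angle_step(double *params, int nparams,
                    double *local, double *result, int nresults)
    {
        /* The root still distributes the per-iteration parameters. */
        MPI_Bcast(params, nparams, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* ... calculate_angle(params, local) would go here ... */

        /* MPI_Allreduce delivers the combined result to every rank, so no
         * rank can return (and race into the next Bcast) before all ranks
         * have entered the call.  MPI_Reduce makes no such promise to
         * non-root ranks, so ranks can drift apart and queue up many
         * outstanding collectives. */
        MPI_Allreduce(local, result, nresults, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    }

The trade-off is that every rank now receives the full result rather than only the root, which is usually a negligible cost at these message sizes.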
> On Jan 18, 2019, at 6:31 PM, Jeff Wentworth via users <users@lists.open-mpi.org> wrote:
>
> Greetings everyone,
>
> I have a scientific code using Open MPI (v3.1.3) that seems to work fine when MPI_Bcast() and MPI_Reduce() calls are well spaced out in time. Yet if the time between these calls is short, eventually one of the nodes hangs at some random point, never returning from the broadcast or reduce call. Is there some minimum time between calls that needs to be obeyed in order for Open MPI to process these reliably?
>
> The reason this has come up is because I am trying to run, in a multi-node environment, some established acceptance tests in order to verify that the Open MPI-configured version of the code yields the same baseline result as the original single-node version. These acceptance tests must pass in order for the code to be considered validated and deliverable to the customer. One of the acceptance tests that hangs involves 90 broadcasts and 90 reduces in a short period of time (less than 0.01 CPU sec), as in:
>
> Broadcast #89 in
> Broadcast #89 out 8 bytes
> Calculate angle #89
> Reduce #89 in
> Reduce #89 out 208 bytes
> Write result #89 to file on service node
> Broadcast #90 in
> Broadcast #90 out 8 bytes
> Calculate angle #90
> Reduce #90 in
> Reduce #90 out 208 bytes
> Write result #90 to file on service node
>
> If I slow down the above acceptance test, for example by running it under valgrind, then it runs to completion and yields the correct result. This seems to suggest that something internal to Open MPI is getting swamped. I understand that these acceptance tests might be pushing the limit, given that they involve so many short calculations combined with frequent, yet tiny, transfers of data among nodes.
>
> Would it be worthwhile for me to enforce some minimum wait time between the MPI calls, say 0.01 or 0.001 sec via nanosleep()? The only time it would matter would be when the acceptance tests are run, as the situation doesn't arise when beefier runs are performed.
>
> Thanks.
>
> jw2002
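For reference, here is a C sketch of the kind of tight Bcast/Reduce loop described above, with the nanosleep() throttle the poster asks about. The 90 iterations, the 8-byte broadcast, the 208-byte reduce, and the 0.001 s pause come from the message; the function names, datatypes, and reduction operation are assumptions for illustration only:

    /* Sketch of the acceptance-test loop with a crude nanosleep() throttle.
     * Iteration count and message sizes follow the log excerpt above; all
     * names and the MPI_SUM op are illustrative assumptions. */
    #define _POSIX_C_SOURCE 199309L
    #include <mpi.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        int rank;
        double angle_in = 0.0;                /* 8 bytes broadcast per step */
        double local[26] = {0}, total[26];    /* 208 bytes reduced per step */
        struct timespec throttle = { .tv_sec = 0, .tv_nsec = 1000000L };

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 1; i <= 90; i++) {
            MPI_Bcast(&angle_in, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

            /* ... calculate_angle(angle_in, local) would go here ... */

            MPI_Reduce(local, total, 26, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);

            if (rank == 0) {
                /* ... write result #i to the file on the service node ... */
            }

            /* Crude 0.001 s throttle, as asked about above.  With the
             * coll_sync parameters from the top of the thread enabled,
             * this pause should not be needed at all. */
            nanosleep(&throttle, NULL);
        }

        MPI_Finalize();
        return 0;
    }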