Ashley Pittman wrote:
On 23 Mar 2009, at 23:36, Shaun Jackman wrote:
loop {
MPI_Ibsend (for every edge of every leaf node)
MPI_Barrier
MPI_Iprobe/MPI_Recv (until no messages pending)
MPI_Allreduce (number of nodes removed)
} until (no nodes removed by any node)
Previously, I attempted to use a single MPI_Allreduce without the
MPI_Barrier:
You need both the MPI_Barrier and the synchronisation semantics of the
MPI_Allreduce in this example.
Yes, since the sync and the Allreduce are in different places.
It's important that each send matches a recv from the same iteration,
so you need to ensure all sends have been sent before you call probe;
a Barrier is one way of doing this. You also need the synchronisation
semantics of the Allreduce to ensure the Iprobe doesn't match a send
from the next iteration of the loop.
Perhaps there is a better way of accomplishing the same thing,
however. MPI_Barrier synchronises all processes, so it is potentially
a lot more heavyweight than it needs to be; in this example you only
need to synchronise with your neighbours. It might be quicker to use a
send/receive for each of your neighbours containing a true/false value
rather than relying on the existence or absence of a message. That is,
the barrier is needed because you don't know how many messages there
are; it may well be quicker to have a fixed number of point-to-point
messages than an extra global synchronisation. The added advantage of
doing it this way is that you could remove the Probe as well.
I'm not sure I understand this suggestion, so I'll say it the way I
understand it. Would it be possible for each process to send an "all
done" message to each of its neighbors? Conversely, each process would
poll its neighbors for messages, either processing graph operations or
collecting "all done" messages depending on whether the message
indicates a graph operation or signals "all done".
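Yes, that's the idea. As a sketch, in the same pseudocode style as the
loop above (the message classification is hypothetical, not from your
code):

```
loop {
  MPI_Isend (a graph operation, to the relevant neighbour,
             for every edge of every leaf node)
  MPI_Isend (an "all done" marker, to every neighbour)
  while (some neighbour has not yet sent "all done") {
    MPI_Recv (from any neighbour)
    if (message is a graph operation) process it
    else mark that neighbour as "all done"
  }
  MPI_Allreduce (number of nodes removed)
} until (no nodes removed by any node)
```

This works because MPI guarantees ordering between any pair of
processes: once you've received "all done" from a neighbour, no
earlier message from that neighbour can still be in flight, so the
marker replaces both the Barrier and the Iprobe.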
Potentially it would be possible to remove the Allreduce as well and
use the tag to identify the iteration count, assuming of course you
don't need to know the global number of branches at any iteration.
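A sketch of that variant, again in the pseudocode style above (the
per-iteration bookkeeping is my assumption):

```
loop (iteration i) {
  MPI_Isend (graph operations, tag = i)
  MPI_Isend ("all done" marker, tag = i, to every neighbour)
  while (some neighbour has not yet sent "all done" for tag i) {
    MPI_Recv (from any neighbour, tag = i)
    process graph operation or record "all done"
  }
} until (this process and its neighbours removed no nodes)
```

Because a recv with tag i can only match a send with tag i, messages
from the next iteration can't be matched early, which is the ordering
role the Allreduce was playing.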
One problem with this approach can be that one process gets very slow
and swamped with unexpected messages; however, assuming your neighbour
count is small this shouldn't be a problem. I'd expect not only a net
gain from changing to this way, but for the application to scale
better as well.
Finally, I've always favoured Irecv/Send over Ibsend/Recv, as in the
majority of cases this tends to be faster; you'd have to benchmark
your specific setup, however.
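By Irecv/Send I mean pre-posting the receives before sending, roughly
(pseudocode as above, buffer/request names hypothetical):

```
for each neighbour n:
  MPI_Irecv (buffer[n], from n, &request[n])
for each neighbour n:
  MPI_Send (message, to n)
MPI_Waitall (all requests)
```

With the receive already posted, incoming data can land directly in
the user buffer, whereas MPI_Ibsend copies every message through the
attached buffer first, and an unmatched arrival sits in the
unexpected-message queue until the Recv is posted.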