Shaun Jackman wrote:
On Tue, 2009-03-24 at 07:03 -0800, Eugene Loh wrote:
I'm not sure I understand this suggestion, so I'll say it the way I
understand it. Would it be possible for each process to send an
"all done" message to each of its neighbors? Conversely, each
process would poll its neighbors for messages, either processing
graph operations or collecting "all done" messages depending on
whether the message indicates a graph operation or signals "all done".
Ashley Pittman wrote:
Exactly. That way you have a defined number of messages, which can be
calculated locally by each process; hence there is no need to use
MPI_Probe, and you can get rid of the MPI_Barrier call.
Hi Eugene,
By `poll its neighbours', do you mean posting an MPI_Irecv for each
neighbour, and working until an `all done' message (sent using
MPI_Send) has been received from each neighbour?
Yes.
As long as each process posts its MPI_Irecv before starting the
MPI_Send, are we guaranteed that two processes won't deadlock by
MPI_Send-ing to each other?
Yes, I think so.
I avoided this method at first because I didn't understand that a
posted MPI_Irecv persists until a matching message arrives; I figured
it had no effect if no message was ready to receive.
Not sure I understand. Let's say you post an MPI_Irecv and you get
something in a follow-up MPI_Test or MPI_Wait... but it's not the "all
done" signal. Rather, it's a graph operation. So, you perform that
graph operation and then post the next MPI_Irecv. Something like
MPI_Irecv(..., &req);
while (1) {
    MPI_Wait(&req, &status);
    if (all_done) break;
    /* perform the graph operation */
    MPI_Irecv(..., &req);
}
In this implementation, the graph is partitioned arbitrarily; the
vertices are distributed based on a hash function of each vertex's
unique ID, not based on the topology of the graph (which would be
nice, but difficult). So, every process is a neighbour of every other
process.
Okay.
I guess one other piece of advice is this. Start with something that
works; make sure it is easy to reason about its correctness. Doesn't
matter if there is excessive synchronization. Then, start running. If
oversynchronization proves to be a bottleneck, then fix it. But don't
fix it until you have data that indicates that it's a problem. I'm sure
the folks on this list can come up with all sorts of great, minimally
synchronizing algorithms, but maybe you can get by with schemes that are
simpler, more robust, etc.