On Jan 23, 2009, at 2:36 PM, Hartzman, Leslie D (MS) wrote:

I’m trying to modify some code that is involved in point-to-point communications. Process A has a one-way mode of communication with Process B. ‘A’ checks to see if its rank is zero and, if so, sends a “command” to ‘B’ (MPI_Issend) about what kind of data is going to be coming next. After sending the command to ‘B’, ‘A’ then issues another MPI_Issend, sending a block of data to ‘B’.
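
For concreteness, here is a minimal sketch of how I read that send side; the command code, tags, and buffer names are invented for illustration and are not taken from the real code:

    #include <mpi.h>

    /* Hypothetical command code and tags -- invented for illustration. */
    enum { CMD_NEW_DATA = 1 };
    enum { CMD_TAG = 100, DATA_TAG = 101 };

    /* Rank 0 tells B what is coming next, then sends the data block. */
    void send_command_and_data(int rank, int b_rank,
                               double *data, int data_count)
    {
        if (rank == 0) {
            int cmd = CMD_NEW_DATA;
            MPI_Request reqs[2];

            MPI_Issend(&cmd, 1, MPI_INT, b_rank, CMD_TAG,
                       MPI_COMM_WORLD, &reqs[0]);
            MPI_Issend(data, data_count, MPI_DOUBLE, b_rank, DATA_TAG,
                       MPI_COMM_WORLD, &reqs[1]);

            /* Both synchronous-mode sends must eventually be completed. */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        }
    }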

Process ‘B’ sets up a number of request instances via MPI_Recv_init and then issues an MPI_Startall on the requests. ‘B’ sits in a “while (1)” loop, where the basic processing is a switch statement based on the content of the command sent from ‘A’. At the top of the loop, ‘B’ sits at an MPI_Wait until a command comes in. Then, at each case in the switch, ‘B’ sits in an MPI_Waitall to make sure that all ‘A’s have sent their data. ‘B’ then processes the received data, issues an MPI_Startall on the receive request instances, exits the switch statement, and then issues an MPI_Start on the ‘B’ command request so it can go back to waiting at the top of the loop.
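
Again purely for illustration, a sketch of that receive-side loop under the same invented names; the real code presumably has one data request per sending rank:

    #include <mpi.h>

    enum { CMD_NEW_DATA = 1 };                /* invented command code */
    enum { CMD_TAG = 100, DATA_TAG = 101 };   /* invented tags */
    #define MAX_DATA 1024

    void b_receive_loop(void)
    {
        int cmd;
        double data[MAX_DATA];
        MPI_Request cmd_req;
        MPI_Request data_reqs[1];   /* one per sender in the real code */

        /* Persistent requests: set up once, re-armed every iteration. */
        MPI_Recv_init(&cmd, 1, MPI_INT, 0, CMD_TAG,
                      MPI_COMM_WORLD, &cmd_req);
        MPI_Recv_init(data, MAX_DATA, MPI_DOUBLE, 0, DATA_TAG,
                      MPI_COMM_WORLD, &data_reqs[0]);

        MPI_Start(&cmd_req);
        MPI_Startall(1, data_reqs);

        while (1) {
            /* Block until a command arrives. */
            MPI_Wait(&cmd_req, MPI_STATUS_IGNORE);

            switch (cmd) {
            case CMD_NEW_DATA:
                /* Wait until the data from every sender is in... */
                MPI_Waitall(1, data_reqs, MPI_STATUSES_IGNORE);
                /* ...process it, then re-arm the data receives. */
                MPI_Startall(1, data_reqs);
                break;
            }

            /* Re-arm the command receive and go back to the top. */
            MPI_Start(&cmd_req);
        }
    }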


In the original process ‘A’ code, prior to sending out a command, ‘A’ will issue an MPI_Wait to make sure that the command request instance is free.


I'm not quite sure I understand that statement. Can't you just compare the request to MPI_REQUEST_NULL? From your description, it sounds like if you get to this point and the request is not MPI_REQUEST_NULL, there's something else wrong. However, this may simply be a side effect of the short description of complex code...?
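
In code, the check I have in mind would look something like this, assuming cmd_req was last filled in by a plain MPI_Issend (MPI_Wait/MPI_Test reset such requests to MPI_REQUEST_NULL once they complete):

    if (cmd_req != MPI_REQUEST_NULL) {
        /* The previous command send was never completed -- suspicious. */
        MPI_Wait(&cmd_req, MPI_STATUS_IGNORE);
    }

(MPI_Wait on MPI_REQUEST_NULL is a no-op anyway, so the wait by itself is always safe; the comparison is mainly useful as a sanity check.)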

After which it sends out the command, followed by the data. So I’ve taken this infrastructure and tried to add a new command from within a function called in ‘A’. The function is passed the command request instance to be used for the MPI_Wait. I check the status of this MPI_Wait, and all is good. I then issue my own MPI_Issend (I have also tried MPI_Ssend) to process ‘B’. The status coming back from the send is good. At the end of this function I added another MPI_Wait, because this function sends several commands from within a loop. None of the commands are received by ‘B’ – at least not at the beginning. After process ‘A’ goes through an outer loop a few times (each time calling my new function with the MPI calls in it), process ‘B’ suddenly gets some of the commands for one pass through the function. After that, ‘A’ never comes back from the MPI_Wait at the end of the inner function.
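
To make sure we're talking about the same thing, here's roughly how I picture that new function from the description -- all names and tags are invented, and this is obviously not your actual code:

    /* My reading of the new function: it is handed A's command request,
       waits for the previous command send to finish, posts the next
       synchronous-mode send, and waits again before returning. */
    void send_new_commands(int *cmds, int ncmds, int b_rank,
                           MPI_Request *cmd_req)
    {
        for (int i = 0; i < ncmds; i++) {
            /* Make sure the previous command send has completed. */
            MPI_Wait(cmd_req, MPI_STATUS_IGNORE);

            /* Post the next command (MPI_Ssend was also tried here). */
            MPI_Issend(&cmds[i], 1, MPI_INT, b_rank, CMD_TAG,
                       MPI_COMM_WORLD, cmd_req);
        }

        /* The extra wait added at the end of the function. */
        MPI_Wait(cmd_req, MPI_STATUS_IGNORE);
    }

Note that because these are synchronous-mode sends, each MPI_Wait here can only return once ‘B’ has actually posted the matching receive.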



It's pretty hard to say without looking at your code.

But one warning is that depending on your network type, progress on MPI message passing may not occur unless you are in MPI function calls. So if you MPI_Isend (or MPI_Issend or any other non-blocking call), the message may or may not go out at that instant (or perhaps only the first part of it goes out at that instant). It may require another call into OMPI's progression engine to continue sending the message. Hence, on the receiver, it may not look like messages have arrived, but only because they haven't *fully* arrived yet (because the sender hasn't finished sending them yet).
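
One common way to work around that, if you can't block in MPI right away, is to poke the library periodically with MPI_Test so it gets a chance to keep pushing the pending send along -- something like this (sketch only; do_other_work is hypothetical):

    int done = 0;
    while (!done) {
        /* Each MPI_Test call gives Open MPI a chance to make progress
           on the outstanding MPI_Issend (and anything else pending). */
        MPI_Test(&cmd_req, &done, MPI_STATUS_IGNORE);
        if (!done) {
            do_other_work();   /* hypothetical application work */
        }
    }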

That being said, I assume that your A process will block in an MPI_WAITANY, or somesuch, waiting for replies from the B process(es). Blocking in MPI_WAIT* will trip OMPI's progression engine such that whatever sends/receives are pending will get progressed as they can.
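
I.e., something along these lines on the A side (names invented):

    int which;
    /* While A is blocked here waiting for any reply from B, Open MPI's
       progress engine also finishes off whatever sends are still pending. */
    MPI_Waitany(num_reply_reqs, reply_reqs, &which, MPI_STATUS_IGNORE);
    handle_reply(which);       /* hypothetical handler */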

One clarifying question: why are you using synchronous sends?

--
Jeff Squyres
Cisco Systems

