On Jan 23, 2009, at 2:36 PM, Hartzman, Leslie D (MS) wrote:
I’m trying to modify some code that is involved in point-to-point
communications. Process A has a one-way mode of communication with
Process B. ‘A’ checks to see if its rank is zero and if so will send
a “command” to ‘B’ (MPI_Issend) about what kind of data is going to
be coming next. After sending the command to ‘B’, ‘A’ then issues an
Issend, sending a block of data to ‘B’.
Process ‘B’ sets up a number of request instances via MPI_Recv_init
and then issues an MPI_Startall on the requests. ‘B’ sits in a
“while (1)” loop, where the basic processing is a switch statement
based on the content of the command being sent from ‘A’. At the top
of the loop, ‘B’ sits at an MPI_Wait until a command comes in. Then
at each case in the switch, ‘B’ sits in an MPI_Waitall to make sure
that all of the ‘A’ processes have sent their data. ‘B’ then processes the received
data, issues an MPI_Startall on the receive requests instances,
exits the switch statement and then issues an MPI_Start on the ‘B’
command request so it can go back to waiting at the top of the loop.
In the original process ‘A’ code, prior to sending out a command,
‘A’ will issue an MPI_Wait to make sure that the command request
instance is free.
I'm not quite sure I understand that statement. Can't you just
compare the request to MPI_REQUEST_NULL? From your description, it
sounds like if you get to this point and the request is not
REQUEST_NULL, there's something else wrong. However, this may simply
be a side-effect from the short description of complex code...?
After that wait, ‘A’ sends out the command, followed by the data. So I’ve
taken this infrastructure and have tried to add a new command from
within a function called in ‘A’. The function is passed the command
request instance to be used for the MPI_Wait. I check the status of
this MPI_Wait, and all is good. I then issue my own MPI_Issend (have
also tried MPI_Ssend) to process ‘B’. The status coming back from
the send is good. At the end of this function I added in another
MPI_Wait because this function sends several commands from within a
loop. None of the commands are received by ‘B’ – at least not at the
beginning. After process ‘A’ goes through an outer loop a few times
(each time calling my new function with the MPI calls in it),
process ‘B’ suddenly gets some of the commands for one pass through
the function. After that it never comes back from the MPI_Wait at
the end of the inner function.
It's pretty hard to say without looking at your code.
But one warning is that depending on your network type, progress on
MPI message passing may not occur unless you are in MPI function
calls. So if you MPI_Isend (or MPI_Issend or any other non-blocking
call), the message may or may not go out at that instant (or perhaps
only the first part of it goes out at that instant). It may require
another call into OMPI's progression engine to continue sending the
message. Hence, on the receiver, it may not look like messages have
arrived, but only because they haven't *fully* arrived yet (because
the sender hasn't finished sending them yet).
That being said, I assume that your A process will block in an
MPI_WAITANY or some such, waiting for replies from the B process(es).
Blocking in MPI_WAIT* will trip OMPI's progression engine such that
whatever sends/receives are pending will get progressed as they can.
One clarifying question: why are you using synchronous sends?
--
Jeff Squyres
Cisco Systems