Eugene Loh wrote:
Shaun Jackman wrote:
Eugene Loh wrote:
Shaun Jackman wrote:

For my MPI application, each process reads a file and, for each line, sends a message (MPI_Send) to one of the other processes, determined by the contents of that line. Each process posts a single MPI_Irecv and uses MPI_Request_get_status to test for a received message. If a message has been received, it processes the message and posts a new MPI_Irecv. I believe this situation is not safe and prone to deadlock, since MPI_Send may block. The receiver would need to post as many MPI_Irecv calls as there are messages it expects to receive, but it does not know in advance how many messages to expect from the other processes. How is this situation usually handled in an MPI application where the number of messages to receive is unknown?
...

Each process posts an MPI_Irecv to listen for in-coming messages.

Each process enters a loop in which it reads its file and sends out messages. Within this loop, you also loop on MPI_Test to see if any message has arrived. If so, process it, post another MPI_Irecv(), and keep polling. (I'd use MPI_Test rather than MPI_Request_get_status since you'll have to call something like MPI_Test anyhow to complete the receive.)
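Roughly, something like the following, just as a sketch. process_message() stands in for whatever your application does with an arriving message, and MSG_LEN and the tag values are arbitrary choices; the data message itself still goes out with a blocking MPI_Send here.

#include <mpi.h>

#define MSG_LEN  256
#define TAG_DATA 1
#define TAG_DONE 2   /* "I've finished sending" marker, used further down */

void process_message(const char *msg, const MPI_Status *st); /* your handler */

/* Complete anything that has already arrived and immediately repost the
 * receive so a slot is always available.  Returns the number of TAG_DONE
 * messages consumed while draining. */
static int drain(char *inbuf, MPI_Request *recv_req)
{
    int done = 0, flag = 1;
    while (flag) {
        MPI_Status status;
        MPI_Test(recv_req, &flag, &status);
        if (flag) {
            if (status.MPI_TAG == TAG_DONE)
                done++;
            else
                process_message(inbuf, &status);
            MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      MPI_COMM_WORLD, recv_req);
        }
    }
    return done;
}

In the send phase you'd post the initial MPI_Irecv with MPI_ANY_TAG, then call drain() right before each MPI_Send of a line-derived message, accumulating the DONE count for later.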

Once you've posted all your sends, send out a special message to indicate you're finished. I'm thinking of some sort of tree fan-in/fan-out barrier so that everyone will know when everyone is finished.

Keep polling on MPI_Test, processing further receives or advancing your fan-in/fan-out barrier.
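To keep the sketch short, here's a flat all-to-all "done" notification rather than a tree fan-in/fan-out (same idea, less scalable). It continues the drain() sketch above and assumes the outstanding receive was posted with MPI_ANY_TAG, so MPI's non-overtaking rule guarantees a rank's DONE message can't be matched before that rank's earlier data messages.

void finish_phase(char *inbuf, MPI_Request *recv_req,
                  int done_count, int rank, int nprocs)
{
    char dummy = 0;

    /* Announce that this rank has posted all of its data sends. */
    for (int p = 0; p < nprocs; p++) {
        if (p == rank)
            continue;
        done_count += drain(inbuf, recv_req);   /* keep the slot open */
        MPI_Send(&dummy, 0, MPI_CHAR, p, TAG_DONE, MPI_COMM_WORLD);
    }

    /* Keep draining until every other rank has announced completion. */
    while (done_count < nprocs - 1) {
        MPI_Status status;
        MPI_Wait(recv_req, &status);
        if (status.MPI_TAG == TAG_DONE)
            done_count++;
        else
            process_message(inbuf, &status);
        MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                  MPI_COMM_WORLD, recv_req);
    }

    /* Nothing more can arrive; retire the one still-outstanding receive. */
    MPI_Cancel(recv_req);
    MPI_Wait(recv_req, MPI_STATUS_IGNORE);
}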

So, the key ingredients are:

*) keep polling on MPI_Test and reposting MPI_Irecv calls to drain in-coming messages while you're still in your "send" phase
*) have another mechanism for processes to notify one another when they've finished their send phases
Hi Eugene,

Very astute. You've pretty much exactly described how it works now, particularly the loop around MPI_Test and MPI_Irecv to drain incoming messages. So, here's my worry, which I'll demonstrate with an example. We have four processes. Each calls MPI_Irecv once. Each reads one line of its file. Each sends one message with MPI_Send to some other process based on the line that it has read, and then goes into the MPI_Test/MPI_Irecv loop.

The events fall out in this order:
2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 receives the message from 2, consuming its MPI_Irecv
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
and now processes 0 and 1 are deadlocked.

When I say `receives' above, I mean that Open MPI has received the message and copied it into the buffer passed to the MPI_Irecv call, but the application hasn't yet called MPI_Test. The next step would be for all the processes to call MPI_Test, but 0 and 1 are already deadlocked.

I don't get it. Processes should drain aggressively. So, if 0 receives a message, it should immediately post the next MPI_Irecv. Before 0 posts a send, it should MPI_Test (and post the next MPI_Irecv if the test received a message).

Further, you could convert to MPI_Isend.
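For instance (just a sketch, with MAX_PENDING and MSG_LEN picked arbitrarily): each data message goes out with MPI_Isend against its own buffer, which has to stay alive until the send completes, so a small bookkeeping table reclaims completed slots.

#include <mpi.h>
#include <string.h>

#define MSG_LEN     256
#define MAX_PENDING 64
#define TAG_DATA    1

static char        send_buf[MAX_PENDING][MSG_LEN];
static MPI_Request send_req[MAX_PENDING];
static int         npending = 0;

/* Reclaim the slots of any sends that have completed. */
static void reap_sends(void)
{
    int i = 0;
    while (i < npending) {
        int flag;
        MPI_Test(&send_req[i], &flag, MPI_STATUS_IGNORE);
        if (flag) {
            npending--;
            if (i != npending) {   /* move the last entry into the freed slot */
                memcpy(send_buf[i], send_buf[npending], MSG_LEN);
                send_req[i] = send_req[npending];
            }
        } else {
            i++;
        }
    }
}

/* Non-blocking replacement for the blocking MPI_Send of a line's message. */
void post_send(const char *msg, int dest)
{
    /* If the table is full, let some sends complete first (and keep
     * draining receives here as well, as in the earlier sketch). */
    while (npending == MAX_PENDING)
        reap_sends();
    memcpy(send_buf[npending], msg, MSG_LEN);
    MPI_Isend(send_buf[npending], MSG_LEN, MPI_CHAR, dest, TAG_DATA,
              MPI_COMM_WORLD, &send_req[npending]);
    npending++;
}

Before the "finished sending" notification you'd complete the leftovers with MPI_Waitall(npending, send_req, MPI_STATUSES_IGNORE). The point is that no rank ever sits inside a blocking send while its one receive slot is already consumed, which is the cycle that deadlocks 0 and 1 in your example.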

But maybe I'm missing something.

Hi Eugene,

Before posting a send, the process can call MPI_Test to check for a received message, but there's a race condition here: the message can arrive after MPI_Test returns false and before the process calls MPI_Send. I've added the MPI_Test calls to my example scenario:

2 calls MPI_Test. No message is waiting, so 2 decides to send.
2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 calls MPI_Test. No message is waiting, so 3 decides to send.
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 calls MPI_Test. No message is waiting, so 0 decides to send.
0 receives the message from 2, consuming its MPI_Irecv
1 calls MPI_Test. No message is waiting, so 1 decides to send.
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
and now processes 0 and 1 are deadlocked.

Cheers,
Shaun
