Hi,

I'm trying to pass serializable, arbitrarily sized objects around using MPI and non-blocking communication. The problem I'm facing is what to do at the receiving end when expecting an object of unknown size, without blocking while waiting for it.

When using blocking message passing, I have simply solved the problem by first sending a small, fixed-size header containing the size of the rest of the data, which is sent in the following MPI message. With non-blocking message passing, this doesn't seem to be such a good idea, since we can't post the main data transfer until we have received the message header. That seems to take away most of the advantages of non-blocking I/O in the first place.

I've been thinking about solving this using MPI_Probe / MPI_Iprobe, but I'm worried about performance.

Question 1: Will MPI_Probe or the underlying MPI implementation actually receive the full message data (assuming a reasonably sized message, say less than 10 MB) before MPI_Probe returns? Or will there be a significant data transfer delay (for large messages) when calling MPI_Recv after a successful MPI_Probe?

What I want is something like this:

1) Post one or several non-blocking, variable-sized message receives.
2) Do other, non-MPI work, while any incoming messages are fully received into buffers on the local machine.
3) Complete the receives posted in 1). I don't want to wait here unnecessarily for data transfers that could have taken place during 2).

Problems: I can't post non-blocking MPI_Irecv() calls in 1), because I don't know the sizes of the incoming messages. If I simply do nothing in 1) and call MPI_Probe in 3), I'm worried that I won't get good compute/transfer overlap, because the messages won't actually be received locally until I post a Probe or Recv in 3).

Question 2: How can I achieve the communication sequence described in 1), 2), 3) above, with overlapping data transfer and local computation during 2)?

Question 3: A temporary kludge solution to the problem above might be to allocate a temporary receive buffer of some arbitrary, constant maximum size BUFSIZE in 1) for each non-blocking receive operation, make sure messages sent are not larger than BUFSIZE, and post MPI_Irecv(buffer, BUFSIZE, ...) calls in 1).
I haven't been able to figure out whether it's actually correct and portable to receive less data than specified in the count argument to MPI_Irecv. What if the message sent from the other end is 10 bytes and BUFSIZE = count = 20? Would that be OK?

If anyone can shed any light on this, I'd be grateful. FYI, we're using a cluster of 2-8 core x86-64 machines running Linux, connected using ordinary 1 Gbit Ethernet.

Best regards,
Lars Andersson