Hi Thomas/Jacky

Maybe use MPI_Probe (and perhaps also MPI_Cancel) to probe the message size, and receive only those messages with size > 0? Anyway, I'm just code-guessing.
I hope it helps,
Gus Correa

On 05/01/2013 05:14 PM, Thomas Watson wrote:
Hi Gus,

Thanks for your suggestion! The problem with this two-phased data exchange is as follows. Each rank can have data blocks that will be exchanged with potentially all other ranks. So if a rank needs to tell all the other ranks which blocks to receive, phase one would require an all-to-all collective communication (e.g., MPI_Allgatherv). Because such collective communication is blocking in the current stable Open MPI (MPI-2), it would have a negative impact on the scalability of the application, especially when we have a large number of MPI ranks. This negative impact would not be compensated by the bandwidth saved :-)

What I really need is something like this: Isend sets count to 0 if a block is not dirty. On the receiving side, MPI_Waitall deallocates the corresponding Irecv request immediately and sets its handle to MPI_REQUEST_NULL, as if it had completed a normal Irecv. I am wondering if someone could confirm this behavior for me? I could run an experiment on this too...

Regards,
Jacky

On Wed, May 1, 2013 at 3:46 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Maybe start the data exchange by sending a (presumably short) list/array/index function of the dirty/not-dirty block status (say, 0 = not dirty, 1 = dirty), then put if conditionals before the Isend/Irecv so that only dirty blocks are exchanged?

I hope this helps,
Gus Correa

On 05/01/2013 01:28 PM, Thomas Watson wrote:

Hi,

I have a program where each MPI rank hosts a set of data blocks. After doing computation over *some of* its local data blocks, each MPI rank needs to exchange data with other ranks. Note that the computation may involve only a subset of the data blocks on a rank. The data exchange is achieved at each rank through Isend and Irecv, followed by a Waitall to complete the requests. Each pair of Isend and Irecv exchanges a corresponding pair of data blocks at different ranks. Right now, we do Isend/Irecv for EVERY block!
The idea is that because the computation at a rank may involve only a subset of blocks, we could mark those blocks as dirty during the computation. To reduce the data-exchange bandwidth, we could then exchange only those *dirty* pairs across ranks. The problem is: if a rank does not compute on a block 'm', and therefore does not call Isend for 'm', then the receiving rank must somehow know this and either a) not call Irecv for 'm' either, or b) let the Irecv for 'm' fail gracefully.

My questions are:

1. How will Irecv behave (actually, how will MPI_Waitall behave) if the corresponding Isend is missing?

2. If we still post an Isend for 'm', but really do not need to send any data for it, can we set a "flag" in the Isend so that MPI_Waitall on the receiving side will "cancel" the corresponding Irecv immediately? For example, I could set the count in Isend to 0, and on the receiving side, when MPI_Waitall sees a message with an empty payload, it reclaims the corresponding Irecv? In my code, the correspondence between a pair of Isend and Irecv is established by a matching TAG.

Thanks!
Jacky

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users