Hi Thomas/Jacky

Maybe use MPI_Probe (and perhaps also MPI_Cancel) to probe the message size, and receive only those messages with size > 0? Anyway, I'm just code-guessing.
I hope it helps,
Gus Correa

On 05/01/2013 05:14 PM, Thomas Watson wrote:
Hi Gus,

Thanks for your suggestion! The problem with this two-phased data exchange is as follows. Each rank can have data blocks that will be exchanged with potentially all other ranks. So if a rank needs to tell all the other ranks which blocks to receive, phase one would require an all-to-all collective communication (e.g., MPI_Allgatherv). Because such collective communication is blocking in the current stable Open MPI (MPI-2), it would have a negative impact on the scalability of the application, especially when we have a large number of MPI ranks. This negative impact would not be compensated by the bandwidth saved :-)

What I really need is something like this: Isend sets count to 0 if a block is not dirty. On the receiving side, MPI_Waitall deallocates the corresponding Irecv request immediately and sets its handle to MPI_REQUEST_NULL, as if it had completed a normal Irecv. I am wondering if someone could confirm this behavior for me? I could run an experiment on this too...

Regards,
Jacky

On Wed, May 1, 2013 at 3:46 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Maybe start the data exchange by sending a (presumably short) list/array/index function of the dirty/not-dirty block status (say, 0 = not dirty, 1 = dirty), then put if conditionals before the Isend/Irecv so that only dirty blocks are exchanged?

I hope this helps,
Gus Correa

On 05/01/2013 01:28 PM, Thomas Watson wrote:

Hi,

I have a program where each MPI rank hosts a set of data blocks. After doing computation over *some of* its local data blocks, each MPI rank needs to exchange data with other ranks. Note that the computation may involve only a subset of the data blocks on a rank. The data exchange is achieved at each rank through Isend and Irecv, followed by a Waitall to complete the requests. Each pair of Isend and Irecv exchanges a corresponding pair of data blocks at different ranks. Right now, we do Isend/Irecv for EVERY block!
The idea is that because the computation at a rank may involve only a subset of blocks, we could mark those blocks as dirty during the computation. To reduce the data-exchange bandwidth, we could then exchange only those *dirty* pairs across ranks. The problem is: if a rank does not compute on a block 'm', and therefore does not call Isend for 'm', then the receiving rank must somehow know this and either a) not call Irecv for 'm' either, or b) let the Irecv for 'm' fail gracefully.

My questions are:

1. How will Irecv behave (actually, how will MPI_Waitall behave) if the corresponding Isend is missing?

2. If we still post an Isend for 'm', but really do not need to send any data for it, can we set a "flag" in the Isend so that MPI_Waitall on the receiving side will "cancel" the corresponding Irecv immediately? For example, I could set the count in Isend to 0, and on the receiving side, when MPI_Waitall sees a message with an empty payload, it reclaims the corresponding Irecv? In my code, the correspondence between a pair of Isend and Irecv is established by a matching TAG.

Thanks!
Jacky

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users