I think Bill is right. Here is the description for MPI_Finalize:

"This routine cleans up all MPI state. Once this routine is called, no MPI routine (not even MPI_Init) may be called, except for MPI_Get_version, MPI_Initialized, and MPI_Finalized. Unless there has been a call to MPI_Abort, you must ensure that all pending communications involving a process are complete before the process calls MPI_Finalize. If the call returns, each process may either continue local computations or exit without participating in further communication with other processes. *At the moment when the last process calls MPI_Finalize, all pending sends must be matched by a receive, and all pending receives must be matched by a send.*"
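To make that last sentence concrete: every nonblocking operation has to be completed (waited on or tested) before MPI_Finalize, and a barrier keeps a fast rank from tearing down while its peer still has communication outstanding. A minimal sketch of that shutdown discipline, with two ranks exchanging one message each (my own illustration, not taken from the code in this thread):

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch: complete all pending requests, then synchronize, then
 * finalize -- the discipline the MPI_Finalize description requires.
 * Illustrative only; assumes exactly 2 ranks. */
int main(int argc, char **argv) {
    int myrank, buf = 0, mesg;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    int peer = 1 - myrank;
    mesg = myrank;

    /* One pending receive and one pending send per rank... */
    MPI_Irecv(&buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&mesg, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ...both completed before shutdown, so nothing is pending... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* ...and no rank reaches MPI_Finalize until every rank is here. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d received %d\n", myrank, buf);
    MPI_Finalize();
    return 0;
}
```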
So I believe what Bill is alluding to is this: after you called the second Isend, your receive side hadn't yet posted the second Irecv, so when MPI_Finalize was called on the send side, that message was thrown out. When your receive side does get to the second Irecv, it waits for a message that will never arrive.

On Tue, Feb 22, 2011 at 8:06 AM, Bill Rankin <bill.ran...@sas.com> wrote:
> Try putting an "MPI_Barrier()" call before your MPI_Finalize() [*]. I
> suspect that one of the programs (the sending side) is calling Finalize
> before the receiving side has processed the messages.
>
> -bill
>
> [*] pet peeve of mine: this should almost always be standard practice.
>
> > -----Original Message-----
> > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> > Behalf Of Xianglong Kong
> > Sent: Tuesday, February 22, 2011 10:27 AM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Beginner's question: why multiple sends or
> > receives don't work?
> >
> > Hi, thank you for the reply.
> >
> > However, using MPI_Waitall instead of MPI_Wait didn't solve the
> > problem. The code would hang at the MPI_Waitall. Also, I don't quite
> > understand why the code is inherently unsafe. Can the non-blocking
> > send or receive cause any deadlock?
> >
> > Thanks!
> >
> > Kong
> >
> > On Mon, Feb 21, 2011 at 2:32 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> > > It's because you're waiting on the receive request to complete before
> > > the send request. This likely works locally because the message
> > > transfer is through shared memory and is fast, but it's still an
> > > inherently unsafe way to block waiting for completion (i.e., the
> > > receive might not complete if the send does not complete).
> > >
> > > What you probably want to do is build an array of 2 requests and then
> > > issue a single MPI_Waitall() on both of them. This will allow MPI to
> > > progress both requests simultaneously.
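Inline note: applied to the hanging example quoted below, Jeff's array-of-requests suggestion would look roughly like this. This is my own untested sketch; it addresses the wait-ordering concern, though (as Kong reports above) it may not cure a hang whose real cause is the interconnect between the nodes:

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch of Jeff's suggestion: put both requests in one array and
 * complete them with a single MPI_Waitall(), so the library can
 * progress the two transfers together instead of serializing on
 * the first MPI_Wait().  Illustrative, untested. */
int main(int argc, char **argv) {
    int myrank;
    int tag1 = 10, tag2 = 11;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {                 /* receiving side */
        int buf1, buf2;
        MPI_Irecv(&buf1, 1, MPI_INT, 1, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&buf2, 1, MPI_INT, 1, tag2, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);   /* both receives complete here */
        printf("myrank=0, buf1=%d, buf2=%d\n", buf1, buf2);
    } else if (myrank == 1) {          /* sending side */
        int mesg1 = 1, mesg2 = 2;
        MPI_Isend(&mesg1, 1, MPI_INT, 0, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&mesg2, 1, MPI_INT, 0, tag2, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);   /* both sends complete here */
    }

    MPI_Barrier(MPI_COMM_WORLD);       /* per Bill: sync before finalize */
    MPI_Finalize();
    return 0;
}
```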
> > >
> > > On Feb 18, 2011, at 11:58 AM, Xianglong Kong wrote:
> > >
> > >> Hi, all,
> > >>
> > >> I'm an MPI newbie. I'm trying to connect the two desktops in my office
> > >> with a crossover cable and run a parallel code on them using MPI.
> > >>
> > >> Now the two nodes can ssh to each other without a password, and can
> > >> successfully run the MPI "Hello world" code. However, when I tried to
> > >> use multiple MPI non-blocking sends or receives, the job would hang.
> > >> The problem only shows up when the two processes are launched on
> > >> different nodes; the code runs successfully if the two processes are
> > >> launched on the same node. The code also runs successfully if there is
> > >> only one send and/or one receive in each process.
> > >>
> > >> Here is the code that runs successfully:
> > >>
> > >> #include <stdlib.h>
> > >> #include <stdio.h>
> > >> #include <string.h>
> > >> #include <mpi.h>
> > >>
> > >> int main(int argc, char** argv) {
> > >>
> > >>     int myrank, nprocs;
> > >>
> > >>     MPI_Init(&argc, &argv);
> > >>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> > >>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> > >>
> > >>     printf("Hello from processor %d of %d\n", myrank, nprocs);
> > >>
> > >>     MPI_Request reqs1, reqs2;
> > >>     MPI_Status stats1, stats2;
> > >>
> > >>     int tag1=10;
> > >>     int tag2=11;
> > >>
> > >>     int buf;
> > >>     int mesg;
> > >>     int source=1-myrank;
> > >>     int dest=1-myrank;
> > >>
> > >>     if(myrank==0)
> > >>     {
> > >>         mesg=1;
> > >>
> > >>         MPI_Irecv(&buf, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
> > >>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
> > >>     }
> > >>
> > >>     if(myrank==1)
> > >>     {
> > >>         mesg=2;
> > >>
> > >>         MPI_Irecv(&buf, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs1);
> > >>         MPI_Isend(&mesg, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs2);
> > >>     }
> > >>
> > >>     MPI_Wait(&reqs1, &stats1);
> > >>     printf("myrank=%d,received the message\n",myrank);
> > >>
> > >>     MPI_Wait(&reqs2, &stats2);
> > >>     printf("myrank=%d,sent the messages\n",myrank);
> > >>
> > >>     printf("myrank=%d, buf=%d\n",myrank, buf);
> > >>
> > >>     MPI_Finalize();
> > >>     return 0;
> > >> }
> > >>
> > >> And here is the code that hangs:
> > >>
> > >> #include <stdlib.h>
> > >> #include <stdio.h>
> > >> #include <string.h>
> > >> #include <mpi.h>
> > >>
> > >> int main(int argc, char** argv) {
> > >>
> > >>     int myrank, nprocs;
> > >>
> > >>     MPI_Init(&argc, &argv);
> > >>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> > >>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
> > >>
> > >>     printf("Hello from processor %d of %d\n", myrank, nprocs);
> > >>
> > >>     MPI_Request reqs1, reqs2;
> > >>     MPI_Status stats1, stats2;
> > >>
> > >>     int tag1=10;
> > >>     int tag2=11;
> > >>
> > >>     int source=1-myrank;
> > >>     int dest=1-myrank;
> > >>
> > >>     if(myrank==0)
> > >>     {
> > >>         int buf1, buf2;
> > >>
> > >>         MPI_Irecv(&buf1, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
> > >>         MPI_Irecv(&buf2, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs2);
> > >>
> > >>         MPI_Wait(&reqs1, &stats1);
> > >>         printf("received one message\n");
> > >>
> > >>         MPI_Wait(&reqs2, &stats2);
> > >>         printf("received two messages\n");
> > >>
> > >>         printf("myrank=%d, buf1=%d, buf2=%d\n",myrank, buf1, buf2);
> > >>     }
> > >>
> > >>     if(myrank==1)
> > >>     {
> > >>         int mesg1=1;
> > >>         int mesg2=2;
> > >>
> > >>         MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs1);
> > >>         MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
> > >>
> > >>         MPI_Wait(&reqs1, &stats1);
> > >>         printf("sent one message\n");
> > >>
> > >>         MPI_Wait(&reqs2, &stats2);
> > >>         printf("sent two messages\n");
> > >>     }
> > >>
> > >>     MPI_Finalize();
> > >>     return 0;
> > >> }
> > >>
> > >> And the output of the second (failing) code:
> > >> ***********************************************
> > >> Hello from processor 0 of 2
> > >>
> > >> Received one message
> > >>
> > >> Hello from processor 1 of 2
> > >>
> > >> Sent one message
> > >> *******************************************************
> > >>
> > >> Can anyone help to point out why the second code doesn't work?
> > >>
> > >> Thanks!
> > >>
> > >> Kong
> > >>
> > >> _______________________________________________
> > >> users mailing list
> > >> us...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > --
> > Xianglong Kong
> > Department of Mechanical Engineering
> > University of Rochester
> > Phone: (585)520-4412
> > MSN: dinosaur8...@hotmail.com

--
David Zhang
University of California, San Diego