On Feb 22, 2011, at 11:06 AM, Bill Rankin wrote:

> Try putting an "MPI_Barrier()" call before your MPI_Finalize() [*].  I 
> suspect that one of the programs (the sending side) is calling Finalize 
> before the receiving side has processed the messages. 

FWIW: I have rarely seen this be the issue.

MPI does not guarantee point-to-point progress while you are in a collective.
Some implementations make that progress anyway; others do not (e.g., some of
OMPI's transports will; others will not).

In short, a program that does not ensure that all of its outstanding
requests have completed before calling MPI_Finalize() is erroneous.
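
A minimal sketch of the safe pattern (variable names here are illustrative,
not taken from your code; it assumes exactly 2 ranks exchanging one int):
collect the requests into an array and complete them all before Finalize:

```c
#include <mpi.h>

/* Sketch: complete ALL outstanding requests before MPI_Finalize().
   Names (reqs, sendbuf, recvbuf) are illustrative. Run with 2 ranks. */
int main(int argc, char **argv) {
    int rank, peer, sendbuf = 42, recvbuf = 0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                       /* assumes exactly 2 ranks */

    MPI_Irecv(&recvbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Wait on both requests at once so MPI can progress them together,
       and so nothing is still in flight when we finalize. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
```

With nothing in flight at Finalize time, no MPI_Barrier() is needed.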

Also, I first read your email on a phone and did not notice that you had *2* 
sets of source code.  Sorry for the confusion.  I just copied your 2nd code to 
my test cluster and it runs fine for me across multiple nodes -- it does not 
hang.  The order of waits seems correct to me.  
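
For reference, the MPI_Waitall() variant suggested earlier in the thread
would look like this (a sketch using the same tags and buffers as your
second program, with both requests completed in a single call):

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch of the MPI_Waitall() variant of the second program:
   same tags and buffers, but both requests completed in one call,
   which lets MPI progress them simultaneously. Run with 2 ranks. */
int main(int argc, char **argv) {
    int myrank;
    MPI_Request reqs[2];
    MPI_Status stats[2];
    int tag1 = 10, tag2 = 11;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        int buf1, buf2;
        MPI_Irecv(&buf1, 1, MPI_INT, 1, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&buf2, 1, MPI_INT, 1, tag2, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);
        printf("myrank=%d, buf1=%d, buf2=%d\n", myrank, buf1, buf2);
    } else if (myrank == 1) {
        int mesg1 = 1, mesg2 = 2;
        MPI_Isend(&mesg1, 1, MPI_INT, 0, tag1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&mesg2, 1, MPI_INT, 0, tag2, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);
    }

    MPI_Finalize();
    return 0;
}
```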



> -bill
> 
> [*] A pet peeve of mine: this should almost always be standard practice.
> 
> 
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Xianglong Kong
>> Sent: Tuesday, February 22, 2011 10:27 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Beginner's question: why multiple sends or
>> receives don't work?
>> 
>> Hi, thank you for the reply.
>> 
>> However, using MPI_Waitall instead of MPI_Wait didn't solve the
>> problem; the code still hangs at the MPI_Waitall. Also, I don't
>> quite understand why the code is inherently unsafe. Can non-blocking
>> sends or receives cause a deadlock?
>> 
>> Thanks!
>> 
>> Kong
>> 
>> On Mon, Feb 21, 2011 at 2:32 PM, Jeff Squyres <jsquy...@cisco.com>
>> wrote:
>>> It's because you're waiting on the receive request to complete before
>>> the send request.  This likely works locally because the message
>>> transfer is through shared memory and is fast, but it's still an
>>> inherently unsafe way to block waiting for completion (i.e., the
>>> receive might not complete if the send does not complete).
>>> 
>>> What you probably want to do is build an array of 2 requests and then
>>> issue a single MPI_Waitall() on both of them.  This will allow MPI to
>>> progress both requests simultaneously.
>>> 
>>> 
>>> On Feb 18, 2011, at 11:58 AM, Xianglong Kong wrote:
>>> 
>>>> Hi, all,
>>>> 
>>>> I'm an MPI newbie. I'm trying to connect the two desktops in my
>>>> office with a crossover cable and run a parallel code on them
>>>> using MPI.
>>>> 
>>>> Now the two nodes can ssh to each other without a password, and can
>>>> successfully run the MPI "Hello world" code. However, when I try to
>>>> use multiple MPI non-blocking sends or receives, the job hangs. The
>>>> problem only shows up when the two processes are launched on
>>>> different nodes; the code runs successfully if both processes are
>>>> launched on the same node, and also if there is only one send
>>>> and/or one receive in each process.
>>>> 
>>>> Here is the code that can run successfully:
>>>> 
>>>> #include <stdlib.h>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <mpi.h>
>>>> 
>>>> int main(int argc, char** argv) {
>>>> 
>>>>       int myrank, nprocs;
>>>> 
>>>>       MPI_Init(&argc, &argv);
>>>>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>       MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>>> 
>>>>       printf("Hello from processor %d of %d\n", myrank, nprocs);
>>>> 
>>>>       MPI_Request reqs1, reqs2;
>>>>       MPI_Status stats1, stats2;
>>>> 
>>>>       int tag1=10;
>>>>       int tag2=11;
>>>> 
>>>>       int buf;
>>>>       int mesg;
>>>>       int source=1-myrank;
>>>>       int dest=1-myrank;
>>>> 
>>>>       if(myrank==0)
>>>>       {
>>>>               mesg=1;
>>>> 
>>>>               MPI_Irecv(&buf, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>>>               MPI_Isend(&mesg, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>>> 
>>>> 
>>>>       }
>>>> 
>>>>       if(myrank==1)
>>>>       {
>>>>               mesg=2;
>>>> 
>>>>               MPI_Irecv(&buf, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs1);
>>>>               MPI_Isend(&mesg, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs2);
>>>>       }
>>>> 
>>>>       MPI_Wait(&reqs1, &stats1);
>>>>       printf("myrank=%d,received the message\n",myrank);
>>>> 
>>>>       MPI_Wait(&reqs2, &stats2);
>>>>       printf("myrank=%d,sent the messages\n",myrank);
>>>> 
>>>>       printf("myrank=%d, buf=%d\n",myrank, buf);
>>>> 
>>>>       MPI_Finalize();
>>>>       return 0;
>>>> }
>>>> 
>>>> And here is the code that hangs:
>>>> 
>>>> #include <stdlib.h>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <mpi.h>
>>>> 
>>>> int main(int argc, char** argv) {
>>>> 
>>>>       int myrank, nprocs;
>>>> 
>>>>       MPI_Init(&argc, &argv);
>>>>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>       MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>>>> 
>>>>       printf("Hello from processor %d of %d\n", myrank, nprocs);
>>>> 
>>>>       MPI_Request reqs1, reqs2;
>>>>       MPI_Status stats1, stats2;
>>>> 
>>>>       int tag1=10;
>>>>       int tag2=11;
>>>> 
>>>>       int source=1-myrank;
>>>>       int dest=1-myrank;
>>>> 
>>>>       if(myrank==0)
>>>>       {
>>>>               int buf1, buf2;
>>>> 
>>>>               MPI_Irecv(&buf1, 1, MPI_INT, source, tag1, MPI_COMM_WORLD, &reqs1);
>>>>               MPI_Irecv(&buf2, 1, MPI_INT, source, tag2, MPI_COMM_WORLD, &reqs2);
>>>> 
>>>>               MPI_Wait(&reqs1, &stats1);
>>>>               printf("received one message\n");
>>>> 
>>>>               MPI_Wait(&reqs2, &stats2);
>>>>               printf("received two messages\n");
>>>> 
>>>>               printf("myrank=%d, buf1=%d, buf2=%d\n", myrank, buf1, buf2);
>>>>       }
>>>> 
>>>>       if(myrank==1)
>>>>       {
>>>>               int mesg1=1;
>>>>               int mesg2=2;
>>>> 
>>>>               MPI_Isend(&mesg1, 1, MPI_INT, dest, tag1, MPI_COMM_WORLD, &reqs1);
>>>>               MPI_Isend(&mesg2, 1, MPI_INT, dest, tag2, MPI_COMM_WORLD, &reqs2);
>>>> 
>>>>               MPI_Wait(&reqs1, &stats1);
>>>>               printf("sent one message\n");
>>>> 
>>>>               MPI_Wait(&reqs2, &stats2);
>>>>               printf("sent two messages\n");
>>>>       }
>>>> 
>>>>       MPI_Finalize();
>>>>       return 0;
>>>> }
>>>> 
>>>> And the output of the second failed code:
>>>> ***********************************************
>>>> Hello from processor 0 of 2
>>>> 
>>>> received one message
>>>> 
>>>> Hello from processor 1 of 2
>>>> 
>>>> sent one message
>>>> *******************************************************
>>>> 
>>>> Can anyone help to point out why the second code didn't work?
>>>> 
>>>> Thanks!
>>>> 
>>>> Kong
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Xianglong Kong
>> Department of Mechanical Engineering
>> University of Rochester
>> Phone: (585)520-4412
>> MSN: dinosaur8...@hotmail.com
>> 
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

