Actually, I didn't read your message closely enough; sorry. If you're getting a message about an IP address that is unknown to you, this suggests that there might be something wonky in your network setup.
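
One way to track that address down is the C sketch below. It is only an illustration and not part of Open MPI, and the function names list_ipv4_interfaces() and has_ipv4_addr() are made up for this example. It uses the standard getifaddrs() call to list the IPv4 addresses configured on the host it runs on. Compiled and run on each of A, B and C, it should show whether 192.168.122.1 belongs to a local interface on one or more of the nodes.

```c
#define _DEFAULT_SOURCE /* exposes getifaddrs() under strict -std= modes (glibc) */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper (not an Open MPI function): print each IPv4 address
 * configured on this host as "interface address".
 * Returns the number of addresses printed, or -1 if getifaddrs() fails. */
int list_ipv4_interfaces(FILE *out)
{
    struct ifaddrs *ifs;
    int count = 0;

    if (getifaddrs(&ifs) != 0) {
        perror("getifaddrs");
        return -1;
    }
    for (struct ifaddrs *ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        char text[INET_ADDRSTRLEN];

        /* Some entries carry no address; skip those and non-IPv4 ones. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) != NULL) {
            fprintf(out, "%-12s %s\n", ifa->ifa_name, text);
            count++;
        }
    }
    freeifaddrs(ifs);
    return count;
}

/* Hypothetical helper: return 1 if some local interface has the IPv4
 * address `dotted` (e.g. "192.168.122.1"), 0 if none does, and -1 if
 * the string is not a valid IPv4 address or getifaddrs() fails. */
int has_ipv4_addr(const char *dotted)
{
    struct in_addr want;
    struct ifaddrs *ifs;
    int found = 0;

    if (inet_pton(AF_INET, dotted, &want) != 1)
        return -1;
    if (getifaddrs(&ifs) != 0)
        return -1;
    for (struct ifaddrs *ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            found = 1;
            break;
        }
    }
    freeifaddrs(ifs);
    return found;
}
```

Call these from a small main(), e.g. list_ipv4_interfaces(stdout) followed by has_ipv4_addr("192.168.122.1"). If the address does turn up on an interface that is not meant to carry MPI traffic, Open MPI's btl_tcp_if_include / btl_tcp_if_exclude MCA parameters are the usual way to keep the TCP transport off that interface. (As one common example, 192.168.122.1 is the default address of the libvirt virtual bridge, virbr0, on many Linux installs.) Check the FAQ for the exact syntax in your version.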
Can you send all the information listed here:

    http://www.open-mpi.org/community/help/


On Sep 25, 2012, at 11:54 AM, Jeff Squyres wrote:

> Have you disabled firewalls on your nodes (e.g., iptables)?
>
> On Sep 25, 2012, at 11:08 AM, Richard wrote:
>
>> Sometimes the following message appears when I run the ring program, but
>> not always. I do not know this IP address, 192.168.122.1; it is not in my
>> list of hosts.
>>
>> [[53402,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.122.1 failed: Connection refused (111)
>>
>> At 2012-09-25 16:53:50, Richard <codemon...@163.com> wrote:
>>
>> If I run the ring program, the first round of passes is fine, but the
>> second round blocks at some node. Here is the output:
>>
>> Process 0 sending 10 to 1, tag 201 (3 processes in ring)
>> Process 0 sent to 1
>> rank 1, message 10,start===========
>> rank 1, message 10,end-------------
>> rank 2, message 10,start===========
>> Process 0 decremented value: 9
>> rank 0, message 9,start===========
>> rank 0, message 9,end-------------
>> rank 2, message 10,end-------------
>> rank 1, message 9,start===========
>>
>> I have added some printf statements to ring_c.c as follows:
>>
>>     printf("rank %d, message %d,start===========\n", rank, message);
>>     MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
>>     printf("rank %d, message %d,end-------------\n", rank, message);
>>
>> At 2012-09-25 16:30:01, Richard <codemon...@163.com> wrote:
>>
>> Hi Jody,
>> Thanks for your suggestion; you are right. If I use the ring example, the
>> same thing happens. I have put in a printf statement, and it seems that
>> all three processes have reached the line calling "PMPI_Allreduce". Any
>> further suggestions?
>>
>> Thanks.
>> Richard
>>
>> Date: Tue, 25 Sep 2012 09:43:09 +0200
>> From: jody <jody....@gmail.com>
>> Subject: Re: [OMPI users] mpi job is blocked
>> To: Open MPI Users <us...@open-mpi.org>
>>
>> Hi Richard
>>
>> When a collective call hangs, this usually means that one (or more)
>> processes did not reach this command.
>> Are you sure that all processes reach the allreduce statement?
>>
>> If something like this happens to me, I insert print statements just
>> before the MPI call so I can see which processes made it to this point
>> and which ones did not.
>>
>> Hope this helps a bit
>> Jody
>>
>> On Tue, Sep 25, 2012 at 8:20 AM, Richard <codemon...@163.com> wrote:
>>> I have 3 computers with the same Linux system. I have set up the MPI
>>> cluster based on ssh connections.
>>> I have tested a very simple MPI program, and it works on the cluster.
>>>
>>> To make my story clear, I name the three computers A, B and C.
>>>
>>> 1) If I run the job with 2 processes on A and B, it works.
>>> 2) If I run the job with 3 processes on A, B and C, it is blocked.
>>> 3) If I run the job with 2 processes on A and C, it works.
>>> 4) If I run the job with all 3 processes on A, it works.
>>>
>>> Using gdb, I found the line at which it is blocked:
>>>
>>> #7 0x00002ad8a283043e in PMPI_Allreduce (sendbuf=0x7fff09c7c578,
>>> recvbuf=0x7fff09c7c570, count=1, datatype=0x627180, op=0x627780,
>>> comm=0x627380)
>>> at pallreduce.c:105
>>> 105 err = comm->c_coll.coll_allreduce(sendbuf, recvbuf, count,
>>>
>>> It seems that there is a communication problem between some computers,
>>> but the above series of tests cannot tell me exactly what it is. Can
>>> anyone help me? Thanks.
>>>
>>> Richard
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/


--
Jeff Squyres
jsquy...@cisco.com
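
Jody's suggestion in the quoted thread (print statements just before the MPI call) tends to give clearer answers if every line says which machine it came from and is flushed immediately. Below is a minimal sketch, assuming you paste it into ring_c.c or your own program. trace_point() and format_trace() are hypothetical helper names, not Open MPI functions.

```c
#define _POSIX_C_SOURCE 200112L /* for gethostname() under strict -std= modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical debugging helper (not part of Open MPI): format one trace
 * line as "[host rank N] label" into buf. Returns snprintf()'s result,
 * i.e. the length the full line needs. */
int format_trace(char *buf, size_t size, const char *host, int rank,
                 const char *label)
{
    return snprintf(buf, size, "[%s rank %d] %s", host, rank, label);
}

/* Hypothetical debugging helper: print a trace line tagged with this
 * machine's hostname and the given MPI rank, then flush stdout. Depending
 * on how the launcher connects it, stdout may be block-buffered rather
 * than line-buffered; the explicit flush keeps a line printed just before
 * a hanging call from sitting unseen in the buffer. */
void trace_point(int rank, const char *label)
{
    char host[256];
    char line[512];

    if (gethostname(host, sizeof host) != 0)
        strcpy(host, "unknown-host");
    host[sizeof host - 1] = '\0'; /* a truncated name may lack a NUL */
    format_trace(line, sizeof line, host, rank, label);
    printf("%s\n", line);
    fflush(stdout);
}

/* Possible placement inside ring_c.c (requires <mpi.h> there):
 *
 *     trace_point(rank, "before MPI_Send");
 *     MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
 *     trace_point(rank, "after MPI_Send");
 */
```

Output from different ranks is forwarded independently, so lines from different processes are not guaranteed to arrive in the order they were printed. The host and rank tag on each line is therefore more trustworthy than its position in the output. The same call placed just before MPI_Allreduce in the original program shows which hosts actually reach the collective.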