Actually, I didn't read your message closely enough -- sorry.

If you're getting a message about an IP address that is unknown to you, this 
suggests that there might be something wonky in your network setup.
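
As a rough way to check which node, if any, owns that address, here is a
minimal C sketch. It assumes a Linux/POSIX system that provides getifaddrs();
the function name find_ipv4_interface is made up for illustration and is not
part of Open MPI.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Open MPI): look for a local interface
 * that carries the IPv4 address `addr`.
 * Returns 1 and copies the interface name into `name` if one is found,
 * 0 if no interface has that address, and -1 if `addr` is not a valid
 * dotted-quad string or the interface list cannot be read. */
int find_ipv4_interface(const char *addr, char *name, size_t len)
{
    struct in_addr wanted;
    struct ifaddrs *list = NULL;
    int found = 0;

    assert(addr != NULL && name != NULL && len > 0);

    if (inet_pton(AF_INET, addr, &wanted) != 1)
        return -1;
    if (getifaddrs(&list) != 0)
        return -1;

    for (struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip interfaces with no address or a non-IPv4 address. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == wanted.s_addr) {
            snprintf(name, len, "%s", ifa->ifa_name);
            found = 1;
            break;
        }
    }

    freeifaddrs(list);
    return found;
}
```

Calling find_ipv4_interface("192.168.122.1", name, sizeof name) on each of the
three nodes, and printing name whenever it returns 1, would show whether one of
them has an interface with that address that the other nodes cannot reach.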

Can you send all the information listed here:

    http://www.open-mpi.org/community/help/


On Sep 25, 2012, at 11:54 AM, Jeff Squyres wrote:

> Have you disabled firewalls on your nodes (e.g., iptables)?
> 
> On Sep 25, 2012, at 11:08 AM, Richard wrote:
> 
>> Sometimes the following message appears when I run the ring program, but 
>> not always.
>> I do not recognize the IP address 192.168.122.1; it is not in my list of hosts.
>> 
>> 
>> [[53402,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] 
>> connect() to 192.168.122.1 failed: Connection refused (111)
>> 
>> 
>> 
>> 
>> 
>> At 2012-09-25 16:53:50,Richard <codemon...@163.com> wrote:
>> 
>> If I try the ring program, the first pass around the ring is fine, but the 
>> second pass blocks at some node.
>> Here is the output:
>> 
>> Process 0 sending 10 to 1, tag 201 (3 processes in ring)
>> Process 0 sent to 1
>> rank 1, message 10,start===========
>> rank 1, message 10,end-------------
>> rank 2, message 10,start===========
>> Process 0 decremented value: 9
>> rank 0, message 9,start===========
>> rank 0, message 9,end-------------
>> rank 2, message 10,end-------------
>> rank 1, message 9,start===========
>> 
>> I added some printf statements around the MPI_Send call in ring_c.c:
>>     printf("rank %d, message %d,start===========\n", rank, message);
>>     MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
>>     printf("rank %d, message %d,end-------------\n", rank, message);
>> 
>> 
>> 
>> At 2012-09-25 16:30:01,Richard <codemon...@163.com> wrote:
>> Hi Jody,
>> Thanks for your suggestion; you are right. If I use the ring example, the 
>> same thing happens.
>> I added a printf statement, and it seems that all three processes have 
>> reached the line calling "PMPI_Allreduce". Any further suggestions?
>> 
>> Thanks.
>> Richard
>> 
>> 
>> 
>> Message: 12
>> Date: Tue, 25 Sep 2012 09:43:09 +0200
>> From: jody <jody....@gmail.com>
>> Subject: Re: [OMPI users] mpi job is blocked
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <cakbzmgfl0txdyu82hksohrwh34cbpwbkmrkwc5dcdbt7a7w...@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>> 
>> Hi Richard
>> 
>> When a collective call hangs, this usually means that one (or more)
>> processes did not reach this command.
>> Are you sure that all processes reach the allreduce statement?
>> 
>> If something like this happens to me, I insert print statements just
>> before the MPI call so I can see which processes made
>> it to this point and which ones did not.
>> 
>> Hope this helps a bit
>>  Jody
>> 
>> On Tue, Sep 25, 2012 at 8:20 AM, Richard <codemon...@163.com> wrote:
>>> I have 3 computers with the same Linux system. I have set up the MPI cluster
>>> based on ssh connections.
>>> I have tested a very simple MPI program, and it works on the cluster.
>>> 
>>> To make my story clear, I will call the three computers A, B and C.
>>> 
>>> 1) If I run the job with 2 processes on A and B, it works.
>>> 2) If I run the job with 3 processes on A, B and C, it is blocked.
>>> 3) If I run the job with 2 processes on A and C, it works.
>>> 4) If I run the job with all the 3 processes on A, it works.
>>> 
>>> Using gdb, I found the line at which it is blocked:
>>> 
>>> #7  0x00002ad8a283043e in PMPI_Allreduce (sendbuf=0x7fff09c7c578,
>>> recvbuf=0x7fff09c7c570, count=1, datatype=0x627180, op=0x627780,
>>> comm=0x627380)
>>>    at pallreduce.c:105
>>> 105         err = comm->c_coll.coll_allreduce(sendbuf, recvbuf, count,
>>> 
>>> It seems that there is a communication problem between some computers. But
>>> the above series of tests cannot tell me what
>>> exactly it is. Can anyone help me? Thanks.
>>> 
>>> Richard
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

