My apologies for not changing the subject to something suitable just then.

Thank you for that. I have not yet been able to get the IT department to help me 
with disabling the firewalls, but hopefully that is the problem. Sorry for the 
late response; I was hoping the IT department would be faster.

Robertson

Message: 2
List-Post: users@lists.open-mpi.org
Date: Fri, 6 Feb 2009 17:27:34 -0500
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <8ba0e4a5-fa7c-430b-8731-231ed6e67...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Open MPI requires that there be no TCP firewall between hosts that are  
used in a single parallel job -- it uses random TCP ports between peers.
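If disabling the firewall outright is not an option, a possible middle ground is to pin Open MPI's TCP traffic to a fixed port range and ask IT to open only that range. This is a sketch: the parameter names below are the ones documented for the 1.3-era TCP BTL (`btl_tcp_port_min_v4`, `btl_tcp_port_range_v4`), and should be confirmed against your own installation with `ompi_info` before relying on them.

```shell
# Confirm the TCP BTL's tunable parameter names on your installation
ompi_info --param btl tcp

# Restrict MPI point-to-point traffic to ports 10000-10099
# (node1/node2 and the port range are placeholders)
mpirun -np 4 -host node1,node2 \
    -mca btl tcp,self \
    -mca btl_tcp_port_min_v4 10000 \
    -mca btl_tcp_port_range_v4 100 \
    my_program

# Then open that range on each node's firewall, e.g.:
#   iptables -A INPUT -p tcp --dport 10000:10099 -j ACCEPT
```

Note that mpirun's own out-of-band channel also opens TCP connections between the nodes, so that traffic must be reachable as well; check `ompi_info --param oob tcp` for the corresponding parameters.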


On Feb 5, 2009, at 2:39 AM, Robertson Burgess wrote:

> I have checked with IT. It is TCP. I have been told that there's a  
> firewall on the nodes. Should I open some ports on the firewall, and  
> if so, which ones?
>
> Robertson
>
>>>> Robertson Burgess 5/02/2009 5:09 pm >>>
> Thank you for your help.
> I tried the command
> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
> but still got the same result.
>
> I'm fairly sure that the communication between the nodes is TCP, but  
> I'm not certain; I've emailed IT support to ask them, but have yet  
> to hear back from them.
> Other than that, I'm running the latest release of OMPI (1.3), and I  
> installed it on both nodes. And yes, they are in the same absolute  
> paths.
> My configuration was very standard:
>
> shell$ gunzip -c openmpi-1.3.tar.gz | tar xf -
> shell$ cd openmpi-1.3
> shell$ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/home/bburgess/bin/bin
> shell$ make all install
>
> Again, thank you for your help; I'll have to investigate whether my  
> assumption that my connections are TCP is correct. When I was  
> setting it up at first, before I'd configured the nodes to log  
> into each other without a password, I did get the message
>
> user@node.newcastle.edu.au's password:
>
> in my log files, so it did at least seem to be reaching the other  
> node. Does that mean that my connections are working, or could there  
> be more to it than that?
>
> Robertson Burgess
>
>
> Message: 2
> Date: Wed, 4 Feb 2009 15:37:44 +0200
> From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
> Subject: Re: [OMPI users] OpenMPI hangs across multiple nodes.
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
>       <453d39990902040537o45137abbh2f12db423d971...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> What kind of communication do you have between the nodes: tcp or openib  
> (IB/iWARP)?
> You can try
>
> mpirun -np 4 -host node1,node2 -mca btl tcp,self random
>
>
>
> On Wed, Feb 4, 2009 at 1:21 AM, Ralph Castain <r...@lanl.gov> wrote:
>> Could you tell us which version of OpenMPI you are using, and how  
>> it was
>> configured?
>>
>> Did you install the OMPI libraries and binaries on both nodes? Are  
>> they in
>> the same absolute path locations?
>>
>> Thanks
>> Ralph
>>
>>
>> On Feb 3, 2009, at 3:46 PM, Robertson Burgess wrote:
>>
>>> Dear users,
>>> I am quite new to OpenMPI, I have compiled it on two nodes, each  
>>> node with
>>> 8 CPU cores. The two nodes are identical. The code I am using  
>>> works in
>>> parallel across the 8 cores on a single node. However, whenever I  
>>> try to run
>>> across both nodes, OpenMPI simply hangs. There is no output  
>>> whatsoever; when I run it in the background, outputting to a log  
>>> file, the log file is always empty. The cores do not appear to be  
>>> doing anything at all, either  
>>> on the
>>> host node or on the remote node. This happens whether I am running  
>>> my code,
>>> or even when I tell it to run a program that doesn't exist, for  
>>> instance
>>>
>>> mpirun -np 4 -host node1,node2 random
>>>
>>> Simply results in the terminal hanging, so all I can do is close the
>>> terminal and open up a new one.
>>>
>>> mpirun -np 4 -host node1,node2 random >& log.log &
>>>
>>> simply produces an empty log.log file
>>>
>>> I am running Redhat Linux on the systems, and compiled OpenMPI  
>>> with the
>>> Intel Compilers 10.1. As I've said, it works fine on one node. I  
>>> have set up
>>> both nodes such that they can log into each other via ssh without  
>>> the need
>>> for a password, and I have altered my .bashrc file so the PATH and
>>> LD_LIBRARY_PATH include the appropriate folders.
>>> I have looked through the FAQ and mailing lists, but I was unable  
>>> to find
>>> anything that really matched my problem. Any help would be greatly
>>> appreciated.
>>>
>>> Sincerely,
>>> Robertson Burgess
>>> University of Newcastle
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users 
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/users 
>>
>
> **************************************
> _______________________________________________
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users 


-- 
Jeff Squyres
Cisco Systems



------------------------------

