On 13.11.2014 at 00:34, Ralph Castain wrote:

>> On Nov 12, 2014, at 2:45 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> 
>> On 12.11.2014 at 17:27, Reuti wrote:
>> 
>>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>> 
>>>> Another thing you can do is (a) ensure you built with --enable-debug, and 
>>>> then (b) run it with -mca oob_base_verbose 100 (without the 
>>>> tcp_if_include option) so we can watch the connection handshake and see 
>>>> what it is doing. The --hetero-nodes option will have no effect here and 
>>>> can be ignored.
>>> 
>>> Done. It really does try to connect to the outside interface of the 
>>> headnode. But firewall or not, the nodes have no way to reach 
>>> 137.248.0.0: they have no gateway to this network at all.
>> 
>> I have to retract this. The nodes think there is a gateway although there 
>> isn't one. When I remove the gateway entry from the routing table by hand, 
>> the job starts up instantly too.
>> 
>> While I can do this on my own cluster, I still see the 30-second delay on a 
>> cluster where I'm not root, although there it may be caused by the firewall. 
>> The gateway on that cluster does indeed lead to the outside world.
>> 
>> Personally, I find it a little too aggressive to try all interfaces. If you 
>> don't check this carefully beforehand and start a long-running application, 
>> you might not even notice the delay during startup.
> 
> Agreed - do you have any suggestions on how we should choose the order in 
> which to try them? I haven’t been able to come up with anything yet. Jeff has 
> some fancy algo in his usnic BTL that we are going to discuss after SC that 
> I’m hoping will help, but I’d be open to doing something better in the 
> interim for 1.8.4

Plain `mpiexec` should just use the interface specified in the hostfile, 
whether the hostfile is handcrafted or prepared by a queuing system.
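
To illustrate the idea of sticking to the interface named in the hostfile, here is a minimal Python sketch (not Open MPI code). It assumes the hostfile name has already been resolved to an address and that the node's interfaces are known; the function name `pick_interface`, the interface names, and all addresses are hypothetical. It only checks whether a target address lies in a directly attached subnet, so an address on the outside network is skipped instead of being tried through a gateway.

```python
import ipaddress


def pick_interface(target_ip, local_interfaces):
    """Return the name of the local interface whose subnet contains
    target_ip, or None if the target is not on any attached subnet."""
    target = ipaddress.ip_address(target_ip)
    for name, cidr in local_interfaces.items():
        # ip_interface("192.168.1.5/24").network is the 192.168.1.0/24 subnet
        if target in ipaddress.ip_interface(cidr).network:
            return name
    return None


# Hypothetical compute node with a single private interface.
node_interfaces = {"eth0": "192.168.1.5/24"}

# Hypothetical headnode addresses: internal (as listed in the hostfile)
# and external (on the outside network).
print(pick_interface("192.168.1.1", node_interfaces))
print(pick_interface("137.248.1.1", node_interfaces))
```

With these assumed addresses, the internal headnode address maps to `eth0`, while the external one maps to no interface and would not be attempted at all.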


Option: could a single entry for a machine in the hostfile contain a list of 
interfaces? I mean something like:

node01,node01-extra-eth1,node01-extra-eth2 slots=4

or

node01* slots=4

That is: use exactly the listed interfaces in the first form, or try to find 
all available interfaces on/between the machines in the second.

If all interfaces have the same name, it's up to the admin to correct this.
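
A rough Python sketch of how such an entry might be parsed, purely to make the proposal concrete. This syntax is only the suggestion above, not something Open MPI's hostfile parser is known to support, and the function name `parse_host_entry` is hypothetical.

```python
def parse_host_entry(line):
    """Parse '<name>[,<name>...] [key=value ...]' into
    (names, wildcard, options).

    A single name with a trailing '*' sets wildcard=True, meaning
    "try to find all available interfaces of this machine".
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty hostfile entry")

    names = fields[0].split(",")
    wildcard = False
    if len(names) == 1 and names[0].endswith("*"):
        names = [names[0][:-1]]  # strip the trailing '*'
        wildcard = True

    options = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        options[key] = value  # values stay strings, e.g. {"slots": "4"}
    return names, wildcard, options


print(parse_host_entry("node01,node01-extra-eth1,node01-extra-eth2 slots=4"))
print(parse_host_entry("node01* slots=4"))
```

The first form yields an explicit list of names to use; the second yields one base name plus a flag asking the launcher to discover the interfaces itself.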

-- Reuti


>> -- Reuti
>> 
>> 
>>> It tries this regardless of whether the internal or external name of the 
>>> headnode is given in the machinefile. I then hit ^C. I attached the output 
>>> of Open MPI 1.8.1 for this setup too.
>>> 
>>> -- Reuti
>>> 
>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/11/25777.php
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25781.php
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25782.php
> 
