Gus,

Am 13.11.2014 um 02:59 schrieb Gus Correa:

> On 11/12/2014 05:45 PM, Reuti wrote:
>> Am 12.11.2014 um 17:27 schrieb Reuti:
>> 
>>> Am 11.11.2014 um 02:25 schrieb Ralph Castain:
>>> 
>>>> Another thing you can do is (a) ensure you built with --enable-debug,
>> and then (b) run it with -mca oob_base_verbose 100
>> (without the tcp_if_include option) so we can watch
>> the connection handshake and see what it is doing.
>> The --hetero-nodes will have no effect here and can be ignored.
>>> 
>>> Done. It really tries to connect to the outside
>> interface of the headnode. But whether there is a firewall or not:
>> the nodes have no clue how to reach 137.248.0.0 -
>> they have no gateway to this network at all.
>> 
>> I have to take that back:
>> they think that there is a gateway although there isn't one.
>> When I remove the gateway entry from the routing
>> table by hand, it starts up instantly too.
>> 
>> While I can do this on my own cluster, I still see the
>> 30-second delay on a cluster where I'm not root,
>> though this may be due to the firewall there.
>> The gateway on this cluster does indeed lead
>> to the outside world.
>> 
>> Personally I find this behavior of using all interfaces
>> a little too aggressive. If you don't check this carefully
>> beforehand and start a long-running application, one might
>> not even notice the delay during startup.
>> 
>> -- Reuti
>> 
> 
> Hi Reuti
> 
> You could use the mca parameter file
> (say, $prefix/etc/openmpi-mca-params.conf) to configure cluster-wide
> the oob (and btl) interfaces to be used.
> The users can still override your choices if they want.
> 
> Just put a line like this in openmpi-mca-params.conf :
> oob_tcp_if_include=192.168.154.0/26
> 
> (and similar for btl_tcp_if_include, btl_openib_if_include).
> 
> Get the full list from "ompi_info --all | grep if_include".
> 
> See these FAQ:
> 
> http://www.open-mpi.org/faq/?category=tcp#tcp-selection
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
> 
> Compute nodes tend to be multi-homed, so what criterion would OMPI use
> to select one interface among many,

My compute nodes have two interfaces: one for MPI (and the light ssh/SGE 
traffic to start processes somewhere) and one for NFS to transfer files from/to 
the file server. So: Open MPI may use both interfaces without telling me 
anything about it? How will it split the traffic? 50%/50%? When there is 
heavy file transfer on the NFS interface: might it hurt Open MPI's 
communication, or will it balance the usage on the fly?

When I prepare a machinefile with the names of the interfaces (or get the names 
from SGE's PE_HOSTFILE), it should use just these (except for native IB), and not 
look around for other paths to the other machine(s) (IMO). For this reason, 
different interfaces have different names in my setup: "node01" is just eth0 
and distinct from "node01-nfs" for eth1.
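
To make this concrete (the hostnames and the subnet below are only 
placeholders from my setup, not a general recipe), pinning Open MPI to the 
MPI network explicitly would look roughly like this:

  # machinefile: only the eth0 names, i.e. the MPI network
  node01 slots=4
  node02 slots=4

  # keep both the out-of-band channel and the TCP BTL on that subnet
  mpiexec --machinefile machinefile \
          --mca oob_tcp_if_include 192.168.154.0/26 \
          --mca btl_tcp_if_include 192.168.154.0/26 \
          ./my_mpi_app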


> not knowing beforehand what exists in a particular computer?
> There would be a risk of making a bad choice.
> The current approach gives you everything, and you
> pick/select/restrict what you want to fit your needs,
> with mca parameters (which can be set in several
> ways and with various scopes).
> 
> I don't think this is bad.
> However, I am biased about this.
> I like and use the openmpi-mca-params.conf file
> to set up sensible defaults.
> At least I think they are sensible. :)

I see that this can be prepared for all users this way. Whenever they use my 
installed version it will work - on some other clusters where I'm not root I may 
have to investigate what to enter there, but it can certainly be done.
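
As a rough sketch, such a cluster-wide defaults file could simply contain 
(the subnet is specific to my setup and only an example):

  # $prefix/etc/openmpi-mca-params.conf
  # restrict the out-of-band channel and the TCP BTL to the MPI network
  oob_tcp_if_include = 192.168.154.0/26
  btl_tcp_if_include = 192.168.154.0/26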

BUT: it is probably a rare situation for a quantum chemistry group to have a 
sysadmin of their own taking care of the clusters and of the well-behaved 
operation of the installed software, be it applications or libraries. Often, 
in the other groups, some PhD student gets a side project: please install 
software XY for the group. They are chemists and want to get the software 
running - they are no experts on Open MPI*. They don't care about a tight 
integration or using the correct interfaces, as long as the application delivers 
the results in the end. For example: ORCA**. Users of this software need to 
install a shared-library build of Open MPI in a specific version. I 
see in the ORCA*** forum that many struggle to compile a shared-library 
version of Open MPI and to make it accessible during execution, i.e. how to set 
LD_LIBRARY_PATH so that it's known on the slaves. The cluster admins are in 
another department and sometimes refuse to make any special arrangements for a 
single group. And as ORCA calls `mpiexec` several times during one job, the 
delay could occur several times.
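
For reference, the recipe such a user ends up with looks roughly like the 
following (version and paths are only placeholders, not a recommendation):

  # build a shared-library Open MPI 1.6.5 in the user's home directory
  ./configure --prefix=$HOME/openmpi-1.6.5 --enable-shared
  make all install

  # make it known on the local node *and* on the slaves, e.g. via
  # ~/.bashrc so that non-interactive ssh logins pick it up as well
  export PATH=$HOME/openmpi-1.6.5/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/openmpi-1.6.5/lib:$LD_LIBRARY_PATH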

On some other clusters that we have access to, the admins provide Open MPI 
installations accessible via `modules` - but often not for the required 
combination of Open MPI version and compiler type and version. If a software 
vendor suggests using compiler X in version Y, it's best to follow that 
advice, as it will cause fewer issues that might need to be investigated - 
e.g. numerical variations, since different compilers optimize in different 
ways. Hence you end up compiling the necessary Open MPI on your own again, 
and once more setting sensible defaults as you lay out above.
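
In practice that means something along these lines (module name and 
compiler versions are only illustrative):

  # check what the admins provide ...
  module avail openmpi

  # ... and if the needed combination is missing, build Open MPI yourself
  # with the compiler the software vendor prescribes
  ./configure CC=gcc-4.7 CXX=g++-4.7 FC=gfortran-4.7 \
              --prefix=$HOME/openmpi-1.6.5-gcc4.7
  make all install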

Continued in 2nd email...

-- Reuti

*) Sure, there are exceptions and experts too - I don't intend to offend anyone 
by this statement. But I speak of the QC groups I have had contact with in 
the last couple of years.

**) http://www.cec.mpg.de/forum/portal.php

***) The current ORCA needs 1.6.5, but this may change at some point in the future.



> Cheers,
> Gus Correa
> 
>> 
>>> It tries to do so regardless of whether the internal or external name of the headnode
>>> is given in the machinefile - I hit ^C then.
>>> I attached the output of Open MPI 1.8.1 for this setup too.
>>> 
>>> -- Reuti
>>> 
>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>
