A... thanks Gilles. That makes sense. I was stuck thinking there was
an ssh problem on rank 0; it never occurred to me mpirun was doing
something clever there and that those ssh errors were from a different
instance altogether.
It's no problem to put my private key on all instances - I'll go
Adam,
by default, when more than 64 hosts are involved, mpirun uses a tree
spawn in order to remote launch the orted daemons.
That means you have two options here:
- allow all compute nodes to ssh into each other (e.g. the ssh public key
of *all* the nodes should be in the authorized_keys of *all* the nodes)
-
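The first option could be set up along these lines (a sketch, not a recommendation on key management policy; it assumes a single shared key pair is acceptable for the cluster's internal network, that passwordless ssh already works from the head node, and that "hosts.txt" is a hypothetical file listing one node per line):

```shell
# Sketch: install one shared key pair on every node so any node can ssh
# to any other (as the tree spawn requires). All names here are examples.
ssh-keygen -t ed25519 -N "" -f cluster_key     # cluster-wide key pair
while read -r host; do
  scp cluster_key "$host":~/.ssh/id_ed25519                       # private key on each node
  ssh "$host" 'cat >> ~/.ssh/authorized_keys' < cluster_key.pub   # authorize it on each node
done < hosts.txt
```

Whether a shared key or per-node keys is appropriate depends on your site's security policy; the mechanics are the same either way.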
I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the
default ssh-based launcher, where I have my private ssh key on rank 0 and
the associated public key on all ranks. I create a hosts file with a list
of unique IPs, with the host that I'm running mpirun from on the first
line, a
William,
On a typical HPC cluster, the internal interface is not protected by
the firewall.
If this is eth0, then you can
mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...
If only a small range of ports is available, then you will also need to use the
oob_tcp_dynamic_ipv4_po
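Putting those suggestions together, a firewall-constrained launch might look like the sketch below. The parameter names oob_tcp_dynamic_ipv4_ports, btl_tcp_port_min_v4, and btl_tcp_port_range_v4 are my assumption of what is being referred to here, and the port range 10000-10100 is a placeholder for whatever range the firewall actually leaves open:

```shell
# Sketch: pin both the out-of-band (oob) and MPI point-to-point (btl)
# TCP traffic to eth0, and confine both to ports 10000-10100 so only
# that range needs to be open in the firewall. Hostfile name, port
# numbers, and application name are all examples.
mpirun --mca oob_tcp_if_include eth0 \
       --mca btl_tcp_if_include eth0 \
       --mca oob_tcp_dynamic_ipv4_ports 10000-10100 \
       --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       -np 2 --hostfile hosts ./my_mpi_app
```

Check the exact parameter names for your Open MPI version with `ompi_info --param oob tcp --level 9` and `ompi_info --param btl tcp --level 9`, since MCA parameter names have shifted between releases.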
Thanks, George. My sysadmin now says he is pretty sure it is the firewall,
but that "isn't going to change" so we need to find a solution.
On 9 February 2018 at 16:58, George Bosilca wrote:
> What are the settings of the firewall on your 2 nodes ?
>
> George.
>
>
>
> On Fri, Feb 9, 2018 at 3: