Ah... thanks Gilles. That makes sense. I was stuck thinking there was
an ssh problem on rank 0; it never occurred to me mpirun was doing
something clever there and that those ssh errors were from a different
instance altogether.
It's no problem to put my private key on all instances - I'll go
Adam,
by default, when more than 64 hosts are involved, mpirun uses a tree
spawn in order to remote launch the orted daemons.
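
To illustrate why a tree spawn makes ssh errors show up on hosts other than the one running mpirun, here is a minimal Python sketch. It assumes a generic k-ary layout with a fanout of 64, chosen only to mirror the threshold mentioned above. The function name, the host names and the layout are all hypothetical, and this is not Open MPI's actual routing code.

```python
def tree_spawn_plan(hosts, fanout):
    """Map each launching host to the hosts it would start over ssh.

    Generic k-ary layout for illustration only (an assumption, not
    Open MPI's real topology): the host at index i launches the hosts
    at indices i*fanout + 1 through i*fanout + fanout.
    """
    plan = {}
    for i, host in enumerate(hosts):
        first = i * fanout + 1
        children = hosts[first:first + fanout]
        if children:
            plan[host] = children
    return plan


# Hypothetical host names; the first entry stands in for the mpirun host.
hosts = ["node%03d" % n for n in range(100)]
plan = tree_spawn_plan(hosts, fanout=64)

for launcher, children in plan.items():
    print(launcher, "launches", len(children), "hosts")
```

In this sketch the second host also appears as a launcher, so it needs working outbound ssh to the hosts it starts, not just the mpirun host.
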
That means you have two options here:
- allow all compute nodes to ssh to each other (e.g. the ssh public key
of *all* the nodes should be in *all* the authorized_keys files)
-
I'm running Open MPI 2.1.0, built from source, on RHEL 7. I'm using the
default ssh-based launcher, where I have my private ssh key on rank 0 and
the associated public key on all ranks. I create a hosts file with a list
of unique IPs, with the host that I'm running mpirun from on the first
line, a