Adam,

By default, when more than 64 hosts are involved, mpirun uses a tree spawn to remotely launch the orted daemons.
That means you have two options here:

- allow all compute nodes to ssh to each other (e.g. each node has a
  private key, and the ssh public keys of *all* the nodes are in *all*
  the authorized_keys files)
- do not use a tree spawn (e.g. mpirun --mca plm_rsh_no_tree_spawn true ...)

I recommend the first option, otherwise mpirun would fork&exec a large
number of ssh processes and hence use quite a lot of resources on the
node running mpirun.

Cheers,

Gilles

On Tue, Feb 13, 2018 at 8:23 AM, Adam Sylvester <op8...@gmail.com> wrote:
> I'm running OpenMPI 2.1.0, built from source, on RHEL 7. I'm using the
> default ssh-based launcher, where I have my private ssh key on rank 0
> and the associated public key on all ranks. I create a hosts file with
> a list of unique IPs, with the host that I'm running mpirun from on the
> first line, and run this command:
>
> mpirun -N 1 --bind-to none --hostfile hosts.txt hostname
>
> This works fine up to 64 machines. At 65 or greater, I get ssh errors.
> Frequently
>
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
>
> though today another user got
>
> Host key verification failed.
>
> I have confirmed I can successfully manually ssh into these instances.
> I've also written a loop in bash which will background an ssh sleep
> command to > 64 instances and this succeeds.
>
> From what I can tell, the /etc/ssh/ssh*config settings that limit ssh
> connections have to do with inbound, not outbound limits, and I can
> prove by running straight ssh commands that I'm not hitting a limit.
>
> Is there something wrong with my mpirun syntax (I've run this way
> thousands of times without issues with fewer than 64 hosts, and I know
> MPI is frequently used on orders of magnitude more hosts than this)?
> Or is this a known bug that's addressed in a later MPI release?
>
> Thanks for the help.
> -Adam
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
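
For anyone who cannot set up node-to-node ssh right away, here is a minimal sketch of how the second option from the reply could be scripted. It only adds plm_rsh_no_tree_spawn when the hostfile lists more than 64 unique hosts, which is the threshold mentioned in the reply. The function name build_mpirun_cmd and the one-host-per-line hostfile layout are assumptions made for illustration, not part of Open MPI. The function prints the command instead of running it, so it can be inspected first.

```shell
# Hypothetical helper (not part of Open MPI): print an mpirun command line
# for the given hostfile, disabling tree spawn only when needed.
# Assumption: the hostfile has one host per line; blank lines are ignored.
TREE_SPAWN_THRESHOLD=64   # threshold taken from the reply above

build_mpirun_cmd() {
    hostfile=$1
    shift
    # Count unique, non-empty host entries.
    nhosts=$(grep -v '^[[:space:]]*$' "$hostfile" | sort -u | wc -l | tr -d ' ')
    if [ "$nhosts" -gt "$TREE_SPAWN_THRESHOLD" ]; then
        # Second option from the reply: launch every orted directly
        # from the node running mpirun.
        echo "mpirun --mca plm_rsh_no_tree_spawn true --hostfile $hostfile $*"
    else
        echo "mpirun --hostfile $hostfile $*"
    fi
}
```

Usage would look like `build_mpirun_cmd hosts.txt -N 1 --bind-to none hostname`. As the reply notes, this trades the ssh setup work for extra load on the node running mpirun, so it is more of a stopgap than a replacement for the first option.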