(Correction: I mixed up the outputs of the first two examples in my first mail, so it fails on the first one.)
ubuntu@node0:~$ mpirun --leave-session-attached -mca plm_base_verbose 5 -np 4 -host node0,node1,node2,node3 hostname
[node0:01486] mca:base:select:( plm) Querying component [slurm]
[node0:01486] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[node0:01486] mca:base:select:( plm) Querying component [rsh]
[node0:01486] mca:base:select:( plm) Query of component [rsh] set priority to 10
[node0:01486] mca:base:select:( plm) Selected component [rsh]
[node2:26962] mca:base:select:( plm) Querying component [rsh]
[node2:26962] mca:base:select:( plm) Query of component [rsh] set priority to 10
[node2:26962] mca:base:select:( plm) Selected component [rsh]
[node1:11477] mca:base:select:( plm) Querying component [rsh]
[node1:11477] mca:base:select:( plm) Query of component [rsh] set priority to 10
[node1:11477] mca:base:select:( plm) Selected component [rsh]
Host key verification failed.

ubuntu@node0:~$ mpirun -mca plm_rsh_no_tree_spawn 1 -np 4 -host node0,node1,node2,node3 hostname
node0
node1
node2
node3

So it definitely looks like a problem with the tree spawn. Any clue how I could proceed?

/Christoffer

2013/11/11 Ralph Castain <r...@open-mpi.org>

> Add --enable-debug to your configure and run it with the following
> additional options
>
> --leave-session-attached -mca plm_base_verbose 5
>
> Let's see where it fails during the launch phase. Offhand, the only thing
> that message means to me is that the ssh keys are botched on at least one
> node. Keep in mind that we use a tree-based launch, and so when you have
> more than two nodes, one or more of the intermediate nodes are executing an
> ssh.
>
> One way to see if that's the problem is to launch without the tree spawn:
> add
>
> -mca plm_rsh_no_tree_spawn 1
>
> to your cmd line and see if it works.
>
>
> On Nov 10, 2013, at 9:24 AM, Christoffer Hamberg <christoffer.hamb...@gmail.com> wrote:
>
> Hi,
>
> I'm having some strange problems running Open MPI(1.9a1r29559) with Java
> bindings on a Calxeda highbank ARM Server running Ubuntu 12.10 (GNU/Linux
> 3.5.0-43-highbank armv7l).
>
> The problem arises when I try to run a job on more than 3 nodes (I have a
> total of 8).
> Note: It's the same error for any of the node[0-7].
>
> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2 hostname
> Host key verification failed.
>
> ubuntu@node0:~$ mpirun -np 4 -host node0,node1,node2,node3 hostname
> node0
> node0
> node1
> node2
>
> and not running the job on the current node also gives Host key
> verification failed for only 3 nodes.
>
> ubuntu@node0:~$ mpirun -np 4 -host node1,node3,node5 hostname
> Host key verification failed.
>
> But not on 2 nodes:
> ubuntu@node0:~$ mpirun -np 4 -host node1,node3 hostname
> node1
> node1
> node3
> node3
>
> I've configured it with the following:
> ./configure --prefix=/opt/openmpi-1.9-java --without-openib
> --enable-static --with-threads=posix --enable-mpi-thread-multiple
> --enable-mpi-java --with-jdk-bindir=/usr/lib/jvm/java-7-openjdk-armhf/bin
> --with-jdk-headers=/usr/lib/jvm/java-7-openjdk-armhf/include
>
> I have Open MPI 1.6.5 (without Java-binding) installed and it runs without
> any problems on all nodes, so there should be no problem with SSH that the
> error points to.
>
> Any ideas?
>
> Regards,
> Christoffer
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
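A minimal sketch for testing Ralph's hypothesis directly, assuming passwordless ssh is supposed to work between every pair of nodes. With the tree-based launch, intermediate nodes run ssh themselves, so every node (not just node0) needs valid host keys for the others in its known_hosts. The shell function below, `check_ssh_mesh`, is a hypothetical helper and not part of Open MPI. It goes from the local machine to each source node and from there to each destination node, using `BatchMode=yes` so that a missing or changed host key makes ssh fail instead of prompting.

```shell
#!/bin/sh
# check_ssh_mesh: hypothetical helper (not part of Open MPI).
# For every ordered pair (src, dst) of the given nodes, ssh from the local
# machine to src, then from src to dst, and run `true` there.
# BatchMode=yes disables interactive prompts, so an unknown or changed host
# key makes ssh fail ("Host key verification failed.") instead of asking.
check_ssh_mesh() {
    for src in "$@"; do
        for dst in "$@"; do
            # -n: read stdin from /dev/null so ssh never waits for input.
            # A failure of either hop (local->src or src->dst) is reported
            # as a failure of the pair.
            if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$src" \
                   ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$dst" true \
                   >/dev/null 2>&1; then
                echo "ok:     $src -> $dst"
            else
                echo "FAILED: $src -> $dst"
            fi
        done
    done
}
```

Run it from node0 as `check_ssh_mesh node0 node1 node2 node3` (after sourcing the file). Each `FAILED` line names a source node whose `~/.ssh/known_hosts` (or key setup) probably needs fixing for that destination. Because the first hop from the local machine can also fail, check the `node0 -> ...` lines first.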