I just compiled openmpi-2.0.0 myself, and it looks like a regression: it uses `ssh` even though it's running under SGE. Also, `mpicc` only succeeded when I supplied `-ldl` explicitly, which wasn't necessary in former versions.
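To rule out a build that simply lacks SGE support, one can look for the gridengine component in the `ompi_info` output. The following is only a minimal sketch: `has_gridengine` is a hypothetical helper name, and the pattern assumes the usual `MCA ras: gridengine ...` line that `ompi_info` prints for this component.

```shell
#!/bin/sh
# Sketch: report whether an Open MPI build lists the SGE ("gridengine") component.
# has_gridengine is a hypothetical helper; it reads `ompi_info` output on stdin
# and succeeds if a line like "MCA ras: gridengine (...)" is present.
has_gridengine() {
    grep -Eq 'MCA[[:space:]]+ras:[[:space:]]+gridengine'
}

# Only query the installation if ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_gridengine; then
        echo "gridengine component found in this build"
    else
        echo "no gridengine component found; the build may need --with-sge"
    fi
fi
```

A positive result only excludes a build without SGE support. It does not show which launcher is used at run time, so it cannot rule out the `ssh` behaviour described above.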
I'll look into it. For now I think it's best to stay with 1.10.3. Note that versions after 1.6.5 do core binding (bad if several Open MPI jobs are running on one and the same node, as all of them will use cores from 0 upwards) and check the network topology. If the network is set up with dead routes/interfaces (which normally wouldn't matter), the startup of the parallel job may be delayed by one to two minutes (until the checks time out).

-- Reuti

> On 10.08.2016 at 21:15, Ulrich Hiller <[email protected]> wrote:
>
> Hello,
>
> My problem: How can i make gridengine not use ssh?
>
> Installed:
> openmpi-2.0.0 - configured with sge support.
> gridengine (son of gridengine) 8.1.9-1
>
> I have a simple openmpi program 'teste' which only gives "hello world"
> output.
> I start it with:
> qsub -pe orte 160 -V -j yes -cwd -S /bin/bash <<< "mpiexec -n 160 teste >> /home/ljohndoe/out.dat"
> on the master node.
> I get back the error:
>
> Host key verification failed.
> Host key verification failed.
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 192.168.117.6: 2: Too many authentication
> failures for johndoe
> Permission denied, please try again.
> Permission denied, please try again.
> Received disconnect from 192.168.117.5: 2: Too many authentication
> failures for johndoe
> [...]
>
> When i configure a passwordless ssh login to the execute nodes
> (exchanging the ssh key from master with 'ssh-copy-id'), it works like
> a charm. So it obviously uses an ssh connection to the execute nodes.
>
> the output of 'qconf -sconf' contains:
>
> login_shells                 sh,bash,ksh,csh,tcsh
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
>
> (as far as i read, this was the problem in a thread some time ago on this
> list. But i seem to have the correct values)
>
> So everything should be fine - or not?
>
> Also with
> qlogin -l 'h=exec01'
> and
> qrsh -l 'h=exec01'
> i can go without problems to the first node (called exec01), and i can
> also log in to all other execute nodes.
>
> Is there another 'switch' anywhere with which i can let qsub run _not_ over ssh?
>
> If it is of interest, the output of 'qconf -sp orte' is:
> pe_name            orte
> slots              9999999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
> qsort_args         NONE
>
> Also, i do not have any ssh lines in ~/.profile or ~/.bashrc
>
>
> Kind regards, ulrich
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
