Hello everyone,

I am having trouble with the Grid Engine integration of Open MPI. A job with only 4 processes runs fine. With more processes, mpirun sometimes fails to connect to the remote nodes: the qrsh calls fail.
I am attaching a job script and the error output. As you can see from the 'for' loop, I can connect to all nodes just fine; it is the qrsh executed by mpirun that fails. qrsh is configured to run ssh with Kerberos authentication (ssh -tt -o GSSAPIDelegateCredentials=no).

My versions are Open MPI 1.2.2, SGE 6.0u9, and RHEL5.

Any idea where the problem could be?

Regards,
Götz Waschk

-- 
AL I:40: Do what thou wilt shall be the whole of the Law.
Attachments:
openmpi.job
openmpi.job.e6205663
openmpi.job.o6205663