Reuti wrote:
> On 15.01.2009 at 16:20, Jeff Dusenberry wrote:
>> I'm trying to launch multiple xterms under Open MPI 1.2.8 and the SGE
>> job scheduler for the purpose of running a serial debugger. I'm
>> experiencing file-locking problems on the .Xauthority file.
>>
>> I tried to fix this by asking for a delay between successive launches,
>> to reduce the chance of contention for the lock:
>>
>> ~$ qrsh -pe mpi 4 -P CIS /share/apps/openmpi/bin/mpiexec --mca pls_rsh_debug 1 --mca pls_rsh_delay 5 xterm
>>
>> The 'pls_rsh_delay 5' parameter seems to have no effect. I tried
>> replacing 'pls_rsh_debug 1' with 'orte_debug 1', which gave me
>> additional debugging output, but didn't fix the file-locking problem.
>>
>> Sometimes the above command works and I get all 4 xterms, but more
>> often I get an error:
>>
>> /usr/bin/X11/xauth: error in locking authority file /export/home/duse/.Xauthority
>>
>> followed by
>>
>> X11 connection rejected because of wrong authentication.
>> xterm Xt error: Can't open display: localhost:11.0
>>
>> and one or more of the xterms fail to open.
>>
>> Am I missing something? Is there another debug flag I need to set?
>> Any suggestions for a better way to do this would be appreciated.
>
> You are right that it's neither Open MPI's nor SGE's fault, but a race
> condition in the SSH startup. You defined SSH with X11 forwarding in SGE
> (qconf -mconf), right? Then you first have an ssh connection from your
> workstation to the login machine, then one from the login machine to the
> node where mpiexec runs, and then one for each slave node (meaning an
> additional one on the machine where mpiexec is already running).

Yes, that's all correct. Clearly not very efficient, but I haven't had
any luck getting xauth or xhost to work more directly.

> Although it might be possible to give every started sshd a unique
> .Xauthority file, it's not straightforward to implement due to SGE's
> startup of the daemons, and you would need a sophisticated ~/.ssh/rc to
> create the file in a different location for each session and use it in
> the xterm started afterwards.

Thanks, that helped a lot, but I still can't quite get it to work. I do
want the xterms to run MPI jobs. I tried this sshrc script (modified
from the sshd man page):

    XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
    export XAUTHORITY
    if read proto cookie && [ -n "$DISPLAY" ]; then
        if [ `echo $DISPLAY | cut -c1-10` = 'localhost:' ]; then
            # X11UseLocalhost=yes
            echo add unix:`echo $DISPLAY | cut -c11-` $proto $cookie
        else
            # X11UseLocalhost=no
            echo add $DISPLAY $proto $cookie
        fi | xauth -q -
    fi

With this script I can create a unique .Xauthority for each process,
locally on each node, when I log in via ssh directly. Unfortunately, I
still have to provide another definition of XAUTHORITY somewhere in my
startup scripts: the one above is not visible outside the sshrc
execution.
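Because ~/.ssh/rc runs in its own shell, anything it exports is lost when it exits, so the session's shell has to rebuild the same path from the same input. A minimal sketch for a shell startup file such as ~/.bashrc, assuming a POSIX-compatible shell, that this file is actually read for the sessions in question, and that SSH_TTY is set (true for interactive ssh logins); set_session_xauthority is a hypothetical name:

```shell
# Hypothetical fragment for ~/.bashrc (or another startup file your shell reads).
# ~/.ssh/rc runs in a separate shell, so the XAUTHORITY it exports is lost;
# rebuild the same path here from the same input (SSH_TTY).
set_session_xauthority() {
    if [ -n "$SSH_TTY" ]; then
        # e.g. SSH_TTY=/dev/pts/3 -> /local/$USER/.Xauthority3,
        # matching the name the sshrc script above creates.
        XAUTHORITY=/local/$USER/.Xauthority${SSH_TTY##*/}
        export XAUTHORITY
    fi
    # Without a tty (SSH_TTY unset) XAUTHORITY is left unchanged.
}

set_session_xauthority
```

The path only matches if both places derive it identically, so any change to the naming in ~/.ssh/rc would have to be mirrored here.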

When I try to run this under qrsh/mpiexec, it behaves as if the SSH_TTY
environment variable were not set (is that due to SGE?), and we're back
to a race condition. Is there another variable I can use in the SGE/MPI
context? I also don't understand where I would define the XAUTHORITY
variable when running under mpiexec.
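SSH_TTY is only set when sshd allocates a pseudo-terminal, and commands launched through qrsh/mpiexec most likely run without one, which would explain the empty variable. OpenSSH's sshd also exports SSH_CONNECTION (client IP, client port, server IP, server port) whether or not a tty exists, and the client IP/port pair differs between concurrent connections into the same node. A sketch of a tty-independent suffix, assuming SSH_CONNECTION is visible both in ~/.ssh/rc and in the started command under this SGE setup (not verified here); xauth_suffix is a hypothetical helper name:

```shell
# Hypothetical helper: per-connection suffix for the XAUTHORITY file name that
# also works when no tty is allocated (e.g. sessions started via qrsh/mpiexec).
# Assumption: sshd exports SSH_CONNECTION ("client_ip client_port server_ip
# server_port") to both ~/.ssh/rc and the command it starts.
xauth_suffix() {
    if [ -n "$SSH_TTY" ]; then
        # Interactive login: keep a tty-based name (/dev/pts/3 -> 3).
        echo "${SSH_TTY##*/}"
    elif [ -n "$SSH_CONNECTION" ]; then
        # No tty: client IP plus client port differs between concurrent
        # connections into this node, e.g. "10.0.0.5_51234".
        echo "$SSH_CONNECTION" | cut -d' ' -f1,2 | tr ' ' '_'
    else
        # Not started by sshd: fall back to one shared name.
        echo "default"
    fi
}

# The function above and these two lines would need to appear in both
# ~/.ssh/rc and the shell startup file (e.g. by sourcing one shared file),
# so that both compute the same path for a given connection.
XAUTHORITY=/local/$USER/.Xauthority.$(xauth_suffix)
export XAUTHORITY
```

Files named this way accumulate in /local/$USER, so some cleanup (for example at the end of the SGE job) would probably be needed; that part is not shown.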

I'm not sure this is the best way to approach this. I was originally
hoping that mpiexec would have a way to introduce a delay between
successive launches, but that doesn't seem to work either.
Jeff