I have an OGS 2011.11p1 cluster in which the primary submit host is a separate
machine from the queue master. When I run qrsh from the submit host, it fails
with a commlib error (Levi-Montalcini01 is the queue master, Levi-Montalcini86
is a compute node):
$ qrsh -verbose
Your job 590725 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590725 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini86 ...
error: commlib error: local host name error (IP based host name resolving
"Levi-Montalcini01" doesn't match client host name from connect message
"Levi-Montalcini86")
$
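As I read the error, the address the connection arrived from resolves to
Levi-Montalcini01, while the connect message names Levi-Montalcini86. One quick
diagnostic is to check forward and reverse resolution for each host as seen from
the submit host. The sketch below does this. It is a hypothetical helper, not
part of the original report. It assumes a Linux host whose getent supports the
ahostsv4 database (glibc's does). The function names short_name, same_host and
check_host are made up for illustration. Comparing short names
case-insensitively is a simplification and may not be exactly how commlib
compares host names.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (not part of Grid Engine).
# getent queries the same NSS sources (/etc/hosts, DNS, ...) as the
# system resolver, so run this on the submit host to see its view.

# Print the unqualified (text before the first dot), lowercased host name.
short_name() {
    n="${1%%.*}"
    printf '%s\n' "$n" | tr '[:upper:]' '[:lower:]'
}

# Succeed if two names refer to the same short host name, ignoring
# any domain suffix and letter case (a simplifying assumption).
same_host() {
    [ "$(short_name "$1")" = "$(short_name "$2")" ]
}

# Forward-resolve HOST to its first IPv4 address, reverse-resolve that
# address, and report whether the two names agree.
# Returns 0 on a match, 1 on a mismatch, 2 if a lookup fails.
check_host() {
    host="$1"
    ip=$(getent ahostsv4 "$host" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "$host: forward lookup failed"
        return 2
    fi
    rev=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
    if [ -z "$rev" ]; then
        echo "$host -> $ip: reverse lookup failed"
        return 2
    fi
    if same_host "$host" "$rev"; then
        echo "$host -> $ip -> $rev: OK"
    else
        echo "$host -> $ip -> $rev: MISMATCH"
        return 1
    fi
}
```

For example, after sourcing it on the submit host, running check_host
Levi-Montalcini86 and then check_host Levi-Montalcini01 prints each name, its
IPv4 address and the reverse-resolved name, followed by OK or MISMATCH. This
only shows the submit host's resolver view. It does not show which source
address the compute node's connection actually uses.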
When I use qrsh from the queue master, it works fine:
$ qrsh -verbose
Your job 590750 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 590750 has been successfully scheduled.
Establishing builtin session to host Levi-Montalcini88 ...
Levi-Montalcini88|~>
During the failed attempt, I saw traffic from the compute node back to the
queue master, but no traffic to the submit host from either the queue master or
the compute node. Is qrsh from a separate submit host expected to work?

Thanks,
John
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users