Hi,
On 10.09.2013 at 17:00, Burian, John wrote:
> I have an OGS 2011.11p1 cluster. The primary submit host is a separate
> machine from the queue master. When I try to use qrsh from the submit node, I
> get a commlib error (Levi-Montalcini01 is the queue master, Levi-Montalcini86
> is a compute node):
>
> $ qrsh -verbose
> Your job 590725 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 590725 has been successfully scheduled.
> Establishing builtin session to host Levi-Montalcini86 ...
> error: commlib error: local host name error (IP based host name resolving
> "Levi-Montalcini01" doesn't match client host name from connect message
> "Levi-Montalcini86")
> $
>
> When I use qrsh from the queue master, it works fine:
>
> $ qrsh -verbose
> Your job 590750 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 590750 has been successfully scheduled.
> Establishing builtin session to host Levi-Montalcini88 ...
> Levi-Montalcini88|~>
>
> During the failed attempt, I see traffic from the compute node back to the
> queue master, but no traffic to the submit node from either the queue master
> or the compute node. Is qrsh from a separate submit node expected to work?
Yes, as long as there is a direct network connection between the submit host
and the exechost (or proper forwarding between them).
Do Levi-Montalcini01 and Levi-Montalcini86 resolve to the same IP address?
Why are there two different names?
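One way to check this is to do a forward lookup (name to address) and a reverse lookup (address back to name) for each host, on the submit host, the queue master and the exechost alike. Below is a minimal Python sketch that does this with the local resolver (/etc/hosts, DNS, whatever the machine is configured to use). The script name `check_hosts.py` is hypothetical, and the case-insensitive, domain-stripping name comparison is my own assumption. It is not necessarily how Grid Engine's commlib compares names.

```python
import socket
import sys


def short_name(host: str) -> str:
    # Lower-case and drop the domain part, e.g. "Node01.example.org" -> "node01".
    return host.split(".", 1)[0].lower()


def names_match(claimed: str, resolved: str) -> bool:
    # Assumed comparison: case-insensitive, ignoring any domain suffix.
    return short_name(claimed) == short_name(resolved)


def shared_addresses(resolved: dict[str, str]) -> dict[str, list[str]]:
    # Group host names by IPv4 address; keep only addresses used by >1 name.
    groups: dict[str, list[str]] = {}
    for name, addr in resolved.items():
        groups.setdefault(addr, []).append(name)
    return {addr: names for addr, names in groups.items() if len(names) > 1}


def round_trip(name: str) -> tuple[str, str]:
    # Forward lookup (name -> IPv4 address), then reverse lookup
    # (address -> primary host name as reported by the resolver).
    addr = socket.gethostbyname(name)
    canonical, _aliases, _addrs = socket.gethostbyaddr(addr)
    return addr, canonical


def main(names: list[str]) -> None:
    forward: dict[str, str] = {}
    for name in names:
        try:
            addr, canonical = round_trip(name)
        except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
            print(f"{name}: lookup failed ({exc})")
            continue
        forward[name] = addr
        status = "ok" if names_match(name, canonical) else "MISMATCH"
        print(f"{name} -> {addr} -> {canonical} [{status}]")
    # Report any address that more than one of the given names resolves to.
    for addr, group in shared_addresses(forward).items():
        print(f"{addr} is shared by: {', '.join(group)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `python3 check_hosts.py Levi-Montalcini01 Levi-Montalcini86`, run on each of the three machines. A MISMATCH line, a shared address, or different output on different machines would likely point at inconsistent /etc/hosts or DNS entries.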
-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users