Thanks, Chris and Reuti, for the responses, but I suspect my problem is here:
On 9/10/13 11:55 AM, "John Kloss" <[email protected]> wrote:

>Is your submit host multi-homed?

The submit host and the queue master are both multi-homed. The cluster has an 'internal' network. All the compute nodes sit on this network, and the submit host and queue master both have interfaces on it. Then there is the 'external' network, where the submit host, the queue master and the user desktop machines have interfaces. One design goal was that desktop machines could eventually be used as submit hosts, so the queue master has to function on both networks. The compute nodes on the 'internal' network communicate with the external network only through the queue master, which runs an IP Masquerade iptables rule.

Now that I've explained it, I see what the problem is: the submit host is communicating with the queue master over the external network. The queue master starts the interactive job on the compute node, which tries to contact the submit host at its external address and is therefore routed through the queue master and its iptables rule. qrsh sees a connection that claims to be from node 87 but carries the queue master's IP address.

I believe there are some changes I can make on the submit host to fix this specific problem, but I may still have trouble with the queue master functioning on both networks. I vaguely recall reading about a configuration option that tells OGS about allowable host aliases. Am I misremembering that?

Thanks,
John

>I have had issues where I had a
>multi-homed submit host, say, hostA, which connects to two networks
>via
>
>hostA-int -> "grid network"
>hostA-ext -> "gateway network"
>
>Where "gateway network" and "grid network" do not route because
>they're isolated from each other.
>
>And the hostname used by hostA to contact a compute node is hostA-ext.
>The compute node can't reach hostA-ext; it can only reach hostA-int.
>I had to change the hostname for hostA to hostA-int (under
>/etc/hostname or /etc/sysconfig/network or /etc/node, etc.) so that
>IP/hostname resolution matched for the "grid network".
>
>Or, perhaps your submit host local hostname does not match your domain
>name lookup mechanism (DNS, NIS, etc.). That is, your submit host
>thinks its name is hostA.localhost and DNS thinks it's
>hostA-submit.somenet.com.
>
>What do you get when you type from the submit host
>
>hostname
>
>vs.
>
>nslookup <submit_hostname>
>
>?
>
>Thanks.
>
> John.
>_______________________________________________
>users mailing list
>[email protected]
>https://gridengine.org/mailman/listinfo/users
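The "hostname vs. nslookup" comparison suggested in the quoted message can also be scripted. What follows is a minimal Python sketch, not part of OGS or anything from this thread. It compares the name a host reports for itself with what the system resolver returns for that name (forward lookup) and for each resulting address (reverse lookup). A name mismatch, or an address from the 'external' network where the grid expects the 'internal' one, is the kind of discrepancy discussed above. The helper name `check_host_identity` is hypothetical. It also assumes the system resolver (/etc/hosts, DNS, NIS, per nsswitch) is what matters, which is not exactly what `nslookup` queries.

```python
import socket


def check_host_identity(name=None):
    """Hypothetical helper: compare a host's self-reported name with resolver results.

    Roughly mirrors running `hostname` and then `nslookup <hostname>` by hand,
    but goes through the system resolver (which may consult /etc/hosts or NIS),
    whereas nslookup queries DNS directly.
    """
    local_name = name or socket.gethostname()
    report = {
        "hostname": local_name,
        "fqdn": socket.getfqdn(local_name),  # falls back to local_name on failure
        "addresses": [],
        "reverse": {},
        "error": None,
    }
    try:
        # Forward lookup: every IPv4 address the resolver associates with the name.
        _, _, addrs = socket.gethostbyname_ex(local_name)
        report["addresses"] = addrs
    except OSError as exc:  # socket.gaierror / socket.herror are OSError subclasses
        report["error"] = str(exc)
        return report
    for addr in report["addresses"]:
        try:
            # Reverse lookup: which name does each address map back to?
            report["reverse"][addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            report["reverse"][addr] = None
    return report


if __name__ == "__main__":
    info = check_host_identity()
    print("hostname :", info["hostname"])
    print("fqdn     :", info["fqdn"])
    if info["error"]:
        print("forward lookup failed:", info["error"])
    for addr, back in info["reverse"].items():
        print(f"{addr:15} -> {back}")
```

Run on the submit host and on a compute node, the two reports should agree on which address the submit host is known by. If the compute node only gets back the external address, its traffic would take the masqueraded route described above.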
