Hi,

On 10.09.2013 at 21:38, Burian, John wrote:
> Thanks Chris and Reuti for responses, but I suspect my problem is here:
>
> On 9/10/13 11:55 AM, "John Kloss" <[email protected]> wrote:
>
>> Is your submit host multi-homed?
>
> The submit host and the queue master are both multi homed. The cluster has
> a 'internal' network, all the compute nodes sit on this network, and the
> submit host and queue master both have interfaces on this network. Then
> there is the 'external' network, the submit host and queue master have
> interfaces on this network, as well as user desktop machines. One design
> goal was that desktop machines could eventually be used as submit hosts,
> so the queue master has to function on both networks.

It should be set up to run on the internal network only. (In case you want to use desktop machines which are only connected to the external network as submit hosts and need it for this purpose: do they have the /home mounted like the exechosts too?)

> The compute nodes on
> the 'internal' network only communicate to the external network through
> the queue master, which runs an IP Masquerade iptables rule.

What is the reason for the nodes to communicate with the outside world?

> Now that I've explained it, I see what the problem is: The submit host is
> communicating with the queue master over the external network; the queue
> master starts the interactive job on the compute node, which tries to
> contact the submit host at its external address, is getting routed
> through the queue master and the iptables rule. Qrsh sees a connection
> that claims to be from node 87, but which has the queue master's IP
> address.

Yep, you can easily change this by running the submit host's SGE access also on the internal network (and only there).
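To illustrate the mismatch described above, here is a minimal Python sketch of the kind of consistency check that fails when a connection is masqueraded. It is not how qrsh is implemented; the function name, the host names `node87` and `qmaster`, and all addresses are hypothetical and chosen only for illustration. The resolver is injectable so the idea can be tried without touching real DNS.

```python
import socket


def claimed_name_matches_source(claimed_host, source_ip,
                                resolve=socket.gethostbyname_ex):
    """Return True if source_ip is one of the addresses claimed_host resolves to."""
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        _name, _aliases, addresses = resolve(claimed_host)
    except OSError:
        # Lookup failures (socket.gaierror, socket.herror) count as a mismatch.
        return False
    return source_ip in addresses


def fake_resolve(host):
    """Stand-in resolver with made-up names and addresses (assumption)."""
    table = {
        "node87": ["10.0.0.87"],                # internal address only
        "qmaster": ["10.0.0.1", "192.0.2.1"],   # internal and external
    }
    if host not in table:
        raise socket.gaierror("unknown host: " + host)
    return host, [], table[host]


if __name__ == "__main__":
    # Direct connection over the internal network: name and address agree.
    print(claimed_name_matches_source("node87", "10.0.0.87", fake_resolve))
    # Masqueraded connection: the peer says "node87" but the packet carries
    # the queue master's external address, so the check fails.
    print(claimed_name_matches_source("node87", "192.0.2.1", fake_resolve))
```

With the submit host reachable on the internal network, the node's connection would arrive with its own address and a check like this would pass.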
Please have a look at `man host_aliases` and http://arc.liv.ac.uk/SGE/howto/multi_intrfcs.html

-- Reuti

> I believe there are some changes I can make on the submit host to fix this
> specific problem, but I think I may still have trouble with the queue
> master functioning on both networks. I vaguely recall reading about a
> configuration option that told OGS about allowable host aliases. Am I
> misremembering that?
>
> Thanks,
> John
>
>> I have had issues where I had a
>> multi-homed submit host, say, hostA, which connects to two networks
>> via
>>
>>   hostA-int -> "grid network"
>>   hostA-ext -> "gateway network"
>>
>> Where "gateway network" and "grid network" do not route because
>> they're isolated from each other.
>>
>> And the hostname used by hostA to contact a compute node is hostA-ext.
>> The compute node can't reach hostA-ext; it can only reach hostA-int.
>> I had to change the hostname for hostA to hostA-int (under
>> /etc/hostname or /etc/sysconfig/network or /etc/node, etc.) so that
>> IP/hostname resolution matched for the "grid network".
>>
>> Or, perhaps your submit host local hostname does not match your domain
>> name lookup mechanism (DNS, NIS, etc.). That is, your submit host
>> thinks its name is hostA.localhost and DNS thinks it's
>> hostA-submit.somenet.com.
>>
>> What do you get when you type from the submit host
>>
>>   hostname
>>
>> vs.
>>
>>   nslookup <submit_hostname>
>>
>> ?
>>
>> Thanks.
>>
>> John.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
