Thanks Mike for the answer. And BTW, we have a HOWTO related to this:

http://gridscheduler.sourceforge.net/howto/multi_intrfcs.html

Grid Engine is usually quite picky about name resolution. In the past, we
have received a few reports related to multi-NIC servers and 127.0.0.1 &
"localhost" resolution issues, and of course we started the work on
IPv6 a while ago - so there are a few things in the commlib (aka
Communication Library) that need to be enhanced.

The last major change we made to the commlib was in 2005 - when Ron
& I added poll(2) support for Linux & Solaris to support more than
1024 nodes (before that you could not have more than ~1000 nodes on a
Linux qmaster, and the workaround was to change the system include
file to extend a hard-coded system limit when you compile SGE - which
most people did not want to do even if they knew the hack). The poll(2)
support was reviewed & enhanced by Christian Reissmann at Sun (now on
the Oracle Grid Engine team - interesting, I sent Christian an email a
few days ago, and Andy & Christian are still at Oracle). In 2009,
Ionel emailed the dev list and wanted to add IPv6 support, and we (i.e.
Ionel, Christian, and I) exchanged a few emails related to the IPv6
support. Basically we know the structure of the commlib, and we will
get back to it - but for now, just use the method documented by Mike.
When we are done with the higher priority things, we will fix
non-critical issues that have known and clean workarounds.
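
For anyone applying Mike's workaround to a host other than the head
node, here is a minimal sketch of how one host_aliases line could be
built. It assumes the usual host_aliases layout, where the first name
on a line is the one Grid Engine should use and the remaining names
are aliases for it. make_alias_line is a hypothetical helper, not part
of Grid Engine, and the name mapping shown for the login node is only
an illustration based on the names in Joseph's report.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine): print one
# host_aliases line. The first argument is the name Grid Engine should
# use; any further arguments are aliases that should map to it.
make_alias_line() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_alias_line PRIMARY ALIAS [ALIAS...]" >&2
        return 1
    fi
    # "$*" joins all arguments with a single space (default IFS).
    printf '%s\n' "$*"
}

# Illustration only: map the public and short names of the login node
# to its private name. Check the mapping against your own site first.
make_alias_line login-1-1.local login-node.xxx.uci.edu login-1-1

# To install the line, append it to the file Mike mentions
# (assumes the default cell name "default"):
# make_alias_line login-1-1.local login-node.xxx.uci.edu login-1-1 \
#     >> "$SGE_ROOT/default/common/host_aliases"
```

Passing the names explicitly, instead of calling hostname inside the
helper, keeps it usable for multihomed hosts whose short host name
does not match their private name.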

To us, if something works for other mission critical systems like LSF
but doesn't in Grid Engine, then it is a bug. Those are on the list of
things that we will add in Open Grid Scheduler/Grid Engine eventually.

Rayson




On Wed, May 9, 2012 at 6:06 PM, Mike Hanby <[email protected]> wrote:
> I have no idea if this is the solution, but we had an issue with Rocks and 
> the head node where the daemon wouldn't start properly due to the private 
> interface being on eth0. It would spit out a message similar to what you 
> posted.
>
> The solution was to create the host_aliases file under default/common:
>
> echo "$(/bin/hostname -s).local $(/bin/hostname --fqdn) $(/bin/hostname -s)" > $SGE_ROOT/default/common/host_aliases
>
> Perhaps something similar needs to be done for the login node since it's 
> multihomed.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Joseph Farran
> Sent: Wednesday, May 09, 2012 4:10 PM
> To: [email protected] Users
> Subject: [gridengine users] Installing OGE on Rocks Login Node
>
> Hello.
>
> I have a cluster running Rocks 5.4.3 that I originally set up with 
> Torque/Maui.    I am testing Open Grid Scheduler using the ge2011.11.tar 
> distribution.
>
> I set up OGE on the master head node and was also able to set up 6 compute 
> nodes using "start_gui_installer" on the head node.    All 6 compute nodes 
> were set up without any issues.
>
> Everything works except that I cannot set up our login node.    The 
> login node has both private & public network interfaces.   I want to set up 
> our login node "login-node.xxx.uci.edu" as an execution and submit host.
>
> When I try to set up our Rocks login node using the private name of login-1-1, 
> it complains with:
>
>     The error message was:
>        error: commlib error: access denied (client IP resolved to host name 
> "login-1-1.local". This is not identical to clients host name   
> "login-node.xxx.uci.edu")
>     ERROR: unable to contact qmaster using port 6444 on host "headnode.local"
>
> So then I try installing OGE using the public name of  
> "login-node.xxx.uci.edu" and it also complains.   As soon as I enter 
> "login-node.xxx.uci.edu" the state column turns red with "Resolvable" and the 
> "Install" GUI button is greyed out so I cannot continue.
>
> Looks like OGE is confused about the actual fully qualified name of our login 
> node.   The FQDN is "login-node.xxx.uci.edu" but neither name seems to work.
>
> What is the correct way to get around this?
>
> Joseph
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
