Thanks Mike & Rayson.

I will investigate this.

Joseph

On 05/09/2012 10:19 PM, Rayson Ho wrote:
Thanks Mike for the answer. And BTW, we have a HOWTO related to this:

http://gridscheduler.sourceforge.net/howto/multi_intrfcs.html

Grid Engine is usually quite picky about name resolution. In the past, we
have received a few reports related to multi-NIC servers and to 127.0.0.1 &
"localhost" resolution issues, and of course we started the work on
IPv6 a while ago - so there are a few things in the commlib (aka
Communication Library) that need to be enhanced.

The last time we changed something major in the commlib was in 2005 - when Ron
& I added poll(2) support for Linux & Solaris to support more than
1024 nodes (before that you could not have more than ~1000 nodes on a
Linux qmaster, and the workaround was to change the system include
file to extend a hard-coded system limit when you compile SGE - which
most people did not want to do even if they knew the hack). The poll(2)
support was reviewed & enhanced by Christian Reissmann at Sun (now on
the Oracle Grid Engine team - interestingly, I sent Christian an email a
few days ago, and Andy & Christian are still at Oracle). In 2009,
Ionel emailed the dev list and wanted to add IPv6 support, and we (i.e.
Ionel, Christian, and I) exchanged a few emails related to the IPv6
support. Basically we know the structure of the commlib, and we will
get back to it - but for now, just use the method documented by Mike.
When we are done with the higher priority things, we will fix
non-critical issues that have known and clean workarounds.

To us, if something works for other mission critical systems like LSF
but doesn't in Grid Engine, then it is a bug. Those are on the list of
things that we will add in Open Grid Scheduler/Grid Engine eventually.

Rayson




On Wed, May 9, 2012 at 6:06 PM, Mike Hanby <[email protected]> wrote:
I have no idea if this is the solution, but we had an issue with Rocks and the
head node where the daemon wouldn't start properly due to the private interface
being on eth0. It would spit out a message similar to what you posted.

The solution was to create the host_aliases file under default/common:

echo "$(/bin/hostname -s).local $(/bin/hostname --fqdn) $(/bin/hostname -s)" > "$SGE_ROOT/default/common/host_aliases"

Perhaps something similar needs to be done for the login node since it's 
multihomed.
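
For the login node, the same idea could be sketched as below. This is only an illustration: make_alias_line is a hypothetical helper, the names are the ones from the original post, and it assumes the host_aliases convention that the first name on a line is the one the remaining names are mapped to. Whether this particular line fixes the login node has not been verified.

```shell
#!/bin/sh
# Sketch: build one host_aliases line from explicit names so it can be
# reviewed before it is written anywhere.
# make_alias_line is a hypothetical helper, not part of Grid Engine.
make_alias_line() {
    # $1 = name to list first (the private name, as in the echo command above)
    # remaining arguments = the other names for the same host
    if [ "$#" -lt 2 ]; then
        echo "usage: make_alias_line NAME ALIAS [ALIAS...]" >&2
        return 1
    fi
    printf '%s' "$1"
    shift
    for name in "$@"; do
        printf ' %s' "$name"
    done
    printf '\n'
}

# Example for the login node; prints the line instead of writing the file.
make_alias_line login-1-1.local login-node.xxx.uci.edu login-1-1
```

If the printed line looks right, it can be appended to $SGE_ROOT/default/common/host_aliases by hand. The sketch prints rather than writes so that an existing entry for the head node is not overwritten.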

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Joseph Farran
Sent: Wednesday, May 09, 2012 4:10 PM
To: [email protected] Users
Subject: [gridengine users] Installing OGE on Rocks Login Node

Hello.

I have a cluster running Rocks 5.4.3 that I originally set up with Torque/Maui.
I am testing Open Grid Scheduler using the ge2011.11.tar distribution.

I set up OGE on the master head node and was also able to set up 6 compute nodes using
"start_gui_installer" on the head node. All 6 compute nodes were set up
without any issues.

Everything works except that I cannot set up our login node. The login node has
both private & public network interfaces. I want to set up our login node
"login-node.xxx.uci.edu" as an Executable and Submit node.

When I try to set up our Rocks login node using the private name of login-1-1,
it complains with:

     The error message was:
        error: commlib error: access denied (client IP resolved to host name "login-1-1.local". This is not identical to clients host name "login-node.xxx.uci.edu")
     ERROR: unable to contact qmaster using port 6444 on host "headnode.local"

So then I tried installing OGE using the public name of "login-node.xxx.uci.edu" and it also complains. As
soon as I enter "login-node.xxx.uci.edu" the state column turns red with "Resolvable" and the
"Install" GUI button is greyed out so I cannot continue.

Looks like OGE is confused about the actual fully qualified name of our login node. The
FQDN is "login-node.xxx.uci.edu" but neither name seems to work.
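
The mismatch in the error message can be reproduced outside the installer with a small check like the one below. It is a rough sketch only: the function names are hypothetical, it relies on getent being available, and the case-insensitive comparison is an assumption of the sketch rather than a statement about how the commlib compares names.

```shell
#!/bin/sh
# Sketch: compare the name a host claims with the name its IP resolves back to.
# All function names here are hypothetical helpers, not part of Grid Engine.

# Print the first name that getent reports for an address or host name.
canonical_name() {
    getent hosts "$1" | awk 'NR == 1 { print $2 }'
}

# Succeed only if both names are non-empty and identical, ignoring case.
same_host_name() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Report whether the name resolved from an IP matches the claimed name.
check_claimed_name() {
    ip=$1
    claimed=$2
    resolved=$(canonical_name "$ip")
    if same_host_name "$resolved" "$claimed"; then
        echo "match: $ip -> $resolved"
    else
        echo "mismatch: $ip -> '$resolved', host claims '$claimed'"
        return 1
    fi
}
```

Run on the head node, it would be called as check_claimed_name PRIVATE_IP login-node.xxx.uci.edu, where PRIVATE_IP stands for the login node's private address (not given here). A "mismatch" line naming login-1-1.local would correspond to the commlib error above.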

What is the correct way to get around this?

Joseph
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
