On Thu, Jan 12, 2012 at 10:36 PM, Rayson Ho <[email protected]> wrote:

> Good! That means qconf is waiting for the master's response but not
> getting it.
>
> If an IP filter or firewall is configured on that node, it is very
> likely the cause. Make sure that firewalls are turned off or
> configured properly. I used to use sniffers like tcpdump to debug
> issues like this, but I have not used one for a long while.
>

The tcpdump captures are attached. The CentOS 5 capture is from the WORKING
client, while the CentOS 4 capture is from the failing client. Both clients
are KVM virtual machines hosted on the same CentOS 6 host.

"sgemaster" is the machine that is both the qmaster and the NFS server for
my installation of SGE. "H1-C4-64-1" is the CentOS 4/64-bit client (not
working); "H1-C5-64-2" is the CentOS 5/64-bit client (working).

Simon


> Rayson
>
>
>
> On Fri, Jan 13, 2012 at 1:25 AM, Simon Matthews <[email protected]> wrote:
> >
> >
> > On Thu, Jan 12, 2012 at 10:15 PM, Rayson Ho <[email protected]> wrote:
> >>
> >> Does it hang when you issue the qconf command on that node, or does it
> >> return the error message immediately?
> >
> >
> > It hangs. I see the message either after it times out or if I kill it.
> >
> > Simon
> >>
> >>
> >> Rayson
> >>
> >>
> >> On Fri, Jan 13, 2012 at 1:00 AM, Simon Matthews <[email protected]> wrote:
> >> > I am running the same version. I have one installation tree that is
> >> > NFS mounted. All clients use the same binaries.
> >> >
> >> > I had wanted to move to 6.2u5, but I can't find a source to download
> >> > it from.
> >> >
> >> > Simon
> >> >
> >> >
> >> > On Thu, Jan 12, 2012 at 9:50 PM, Rayson Ho <[email protected]> wrote:
> >> >>
> >> >> On Fri, Jan 13, 2012 at 12:02 AM, Simon Matthews <[email protected]> wrote:
> >> >> > I have an installation of SGE 6.2u4, downloaded some years ago,
> >> >> > that I have installed on a couple of qmaster hosts.
> >> >>
> >> >> Are you using the same version of SGE (SGE 6.2u4) on both the qmaster
> >> >> & the node? You can run "qconf -help | head -1" on both the master &
> >> >> the node to show the version.
> >> >>
> >> >> The most common GDI errors are due to mismatched versions of Grid
> >> >> Engine. If you are running the same version of SGE, let us know and
> >> >> I will dig into the code to see what could possibly go wrong.
> >> >>
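The version check suggested above can be scripted. This is a minimal sketch, not part of SGE: the function names are hypothetical, and it assumes passwordless ssh from the node to the qmaster and that qconf is on the PATH of a non-interactive ssh session on both hosts. It compares only the first line of "qconf -help", the same line that "qconf -help | head -1" shows, as a plain string.

```shell
#!/bin/sh
# Hypothetical helper: compare the qconf version banner on this node with the
# one on the qmaster. Assumes passwordless ssh and qconf on the PATH of a
# non-interactive ssh session on both hosts.

# versions_match BANNER_A BANNER_B
# Succeeds (exit status 0) only when the two banners are identical strings.
versions_match() {
    [ "$1" = "$2" ]
}

# check_sge_versions [MASTER]
# MASTER defaults to "sgemaster", the qmaster host named in this thread.
# Returns 1 on a mismatch (including when ssh fails and the banner is empty).
check_sge_versions() {
    master=${1:-sgemaster}
    # First line of "qconf -help" on this node
    local_banner=$(qconf -help 2>&1 | head -n 1)
    # Same line on the qmaster, via ssh
    master_banner=$(ssh "$master" 'qconf -help 2>&1 | head -n 1')
    echo "local:  $local_banner"
    echo "master: $master_banner"
    if versions_match "$local_banner" "$master_banner"; then
        echo "versions match"
    else
        echo "VERSION MISMATCH"
        return 1
    fi
}
```

Usage, from the node: "check_sge_versions sgemaster". Comparing the banners as whole strings avoids assuming anything about the exact banner format.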
> >> >> And don't worry about still using the Sun binaries; I work with sites
> >> >> that have even older versions of Grid Engine. Sun contributed the code
> >> >> to open source, and without Sun we wouldn't have this community.
> >> >>
> >> >> Rayson
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> > I hope that I do not offend the users of this list by asking for
> >> >> > help with a binary installation built by Sun.
> >> >> >
> >> >> > I hope that someone can shed some light on the problem.
> >> >> >
> >> >> > I have built some new virtualized clients using KVM on a CentOS 6
> >> >> > host. The CentOS 5 client seems to work properly, but the CentOS 4
> >> >> > client does not. I need a CentOS 4 execd for testing purposes.
> >> >> >
> >> >> > I cannot install sge_execd because of the qconf problems.
> >> >> >
> >> >> > qconf -sh results in:
> >> >> >
> >> >> > qconf -sh
> >> >> > ERROR: failed receiving gdi request response for mid=1 (got no message).
> >> >> >
> >> >> > I get this message when I run this client against both the new
> >> >> > cluster and a cluster that has been running for several years.
> >> >> > Other CentOS 4 clients can run "qconf -sh" against both clusters
> >> >> > without problems.
> >> >> >
> >> >> > qping works from the problematic client:
> >> >> >  qping -info sgemaster 6444 qmaster 1
> >> >> > 01/12/2012 20:59:38:
> >> >> > SIRM version:             0.1
> >> >> > SIRM message id:          1
> >> >> > start time:               01/12/2012 16:31:57 (1326414717)
> >> >> > run time [s]:             16052
> >> >> > messages in read buffer:  0
> >> >> > messages in write buffer: 0
> >> >> > nr. of connected clients: 2
> >> >> > status:                   2
> >> >> > info:                     MAIN: E (16052.50) | signaler000: E (16052.05) | event_master000: E (0.58) | timer000: E (1.58) | worker000: W (41.59) | worker001: W (101.61) | listener000: W (5.58) | listener001: W (5.58) | scheduler000: W (5.57) | ERROR
> >> >> > malloc:                   arena(0) |ordblks(1) | smblks(0) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) | keepcost(0)
> >> >> > Monitor:                  disabled
> >> >> >
> >> >> > Simon
> >> >> >
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > users mailing list
> >> >> > [email protected]
> >> >> > https://gridengine.org/mailman/listinfo/users
> >> >> >
> >> >
> >> >
> >
> >
>

Attachment: centos5_qconf_working
Description: Binary data

Attachment: centos4_qconf_NOT_working
Description: Binary data
