On Thu, Jan 12, 2012 at 10:36 PM, Rayson Ho <[email protected]> wrote:

> Good! That means qconf is waiting for the master's response but not
> getting it.
>
> If an IP filter or firewall is configured on that node, then it is
> very likely to be the cause. Make sure that firewalls are turned off
> or configured properly... I used to use sniffers like tcpdump to
> debug issues like this, but I have not used sniffers for a long while.
>

Firewalls are all turned off, on both the VM guest that has problems and
its host. Also SELinux is disabled.
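
A quick way to double-check the transport path from the guest without a sniffer is a plain TCP connect with a timeout. This is only a minimal sketch, assuming Python 3 is available on the node; the function name can_connect is made up for illustration and is not part of Grid Engine. A successful connect says nothing about the GDI layer; it only shows that the port can be reached.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only; it checks raw TCP reachability,
    not whether the qmaster answers GDI requests.
    """
    try:
        # create_connection resolves the name and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

For the setup in this thread the call would be can_connect("sgemaster", 6444), using the host and port from the qping command quoted below. Since qping already gets an answer from that port, the connect would be expected to succeed there; a False result would point back at filtering or name resolution.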

Simon


>
> Rayson
>
>
>
> On Fri, Jan 13, 2012 at 1:25 AM, Simon Matthews
> <[email protected]> wrote:
> >
> >
> > On Thu, Jan 12, 2012 at 10:15 PM, Rayson Ho <[email protected]>
> > wrote:
> >>
> >> Does it hang when you issue the qconf command on that node, or does it
> >> return the error message immediately?
> >
> >
> > It hangs. I see the message either after it times out or if I kill it.
> >
> > Simon
> >>
> >>
> >> Rayson
> >>
> >>
> >> On Fri, Jan 13, 2012 at 1:00 AM, Simon Matthews
> >> <[email protected]> wrote:
> >> > I am running the same version. I have one installation tree that is NFS
> >> > mounted. All clients use the same binaries.
> >> >
> >> > I had wanted to move to 6.2U5, but I can't find a source to download it.
> >> >
> >> > Simon
> >> >
> >> >
> >> > On Thu, Jan 12, 2012 at 9:50 PM, Rayson Ho <[email protected]>
> >> > wrote:
> >> >>
> >> >> On Fri, Jan 13, 2012 at 12:02 AM, Simon Matthews
> >> >> <[email protected]> wrote:
> >> >> > I have an installation of SGE 6.2U4 that I downloaded some years ago and
> >> >> > have installed on a couple of qmaster hosts.
> >> >>
> >> >> Are you using the same version of SGE (SGE 6.2u4) on both the qmaster
> >> >> & the node? You can run "qconf -help | head -1" on both the master &
> >> >> the node to show the version.
> >> >>
> >> >> Most common GDI errors are due to mismatching versions of Grid Engine.
> >> >> If you are running the same version of SGE, then let us know and I
> >> >> will dig into the code to see what can possibly go wrong.
> >> >>
> >> >> And don't worry about still using the Sun binaries; I work with sites
> >> >> that have even older versions of Grid Engine. Sun contributed the code
> >> >> to open source, and without Sun we wouldn't have this community.
> >> >>
> >> >> Rayson
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> > I hope that I do not offend the users of this list by asking for help
> >> >> > with a binary installation that uses binaries built by Sun.
> >> >> >
> >> >> > I hope that someone can shed some light on the problem.
> >> >> >
> >> >> > I have built some new virtualized clients using KVM on a CentOS 6 host.
> >> >> > The CentOS 5 client seems to work properly, but the CentOS 4 client does
> >> >> > not. I need a CentOS 4 execd for testing purposes.
> >> >> >
> >> >> > I cannot install sge_execd, because of the qconf problems.
> >> >> >
> >> >> > qconf -sh results in:
> >> >> >
> >> >> > qconf -sh
> >> >> > ERROR: failed receiving gdi request response for mid=1 (got no message).
> >> >> >
> >> >> > I get this message if I try this client against both the new cluster and
> >> >> > a cluster that has been running for several years. Other CentOS 4 clients
> >> >> > can run "qconf -ch" against both clusters without problem.
> >> >> >
> >> >> > qping works from the problematic client:
> >> >> >  qping -info sgemaster 6444 qmaster 1
> >> >> > 01/12/2012 20:59:38:
> >> >> > SIRM version:             0.1
> >> >> > SIRM message id:          1
> >> >> > start time:               01/12/2012 16:31:57 (1326414717)
> >> >> > run time [s]:             16052
> >> >> > messages in read buffer:  0
> >> >> > messages in write buffer: 0
> >> >> > nr. of connected clients: 2
> >> >> > status:                   2
> >> >> > info:                     MAIN: E (16052.50) | signaler000: E (16052.05) | event_master000: E (0.58) | timer000: E (1.58) | worker000: W (41.59) | worker001: W (101.61) | listener000: W (5.58) | listener001: W (5.58) | scheduler000: W (5.57) | ERROR
> >> >> > malloc:                   arena(0) |ordblks(1) | smblks(0) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) | keepcost(0)
> >> >> > Monitor:                  disabled
> >> >> >
> >> >> > Simon
> >> >> >
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > users mailing list
> >> >> > [email protected]
> >> >> > https://gridengine.org/mailman/listinfo/users
> >> >> >
> >> >
> >> >
> >
> >
>