On Thu, Jan 12, 2012 at 10:36 PM, Rayson Ho <[email protected]> wrote:
> Good! That means qconf is waiting for the master's response but not
> getting it.
>
> If an IP filter or firewall is configured on that node, then it is
> very likely to be the cause. Make sure that firewalls are turned off
> or configured properly... I used to use sniffers like TCP dump to
> debug issues like this, but I have not used sniffers for a long while.
>

Firewalls are all turned off, on both the VM guest that has problems and
its host. Also SELinux is disabled.

Simon

>
> Rayson
>
>
> On Fri, Jan 13, 2012 at 1:25 AM, Simon Matthews
> <[email protected]> wrote:
> >
> > On Thu, Jan 12, 2012 at 10:15 PM, Rayson Ho <[email protected]>
> > wrote:
> >>
> >> Does it hang when you issue the qconf command on that node, or does it
> >> return the error message immediately??
> >
> > It hangs. I see the message either after it times out or if I kill it.
> >
> > Simon
> >>
> >> Rayson
> >>
> >>
> >> On Fri, Jan 13, 2012 at 1:00 AM, Simon Matthews
> >> <[email protected]> wrote:
> >> > I am running the same version. I have one installation tree that is
> >> > NFS mounted. All clients use the same binaries.
> >> >
> >> > I had wanted to move to 6.2U5, but I can't find a source to download
> >> > it.
> >> >
> >> > Simon
> >> >
> >> >
> >> > On Thu, Jan 12, 2012 at 9:50 PM, Rayson Ho <[email protected]>
> >> > wrote:
> >> >>
> >> >> On Fri, Jan 13, 2012 at 12:02 AM, Simon Matthews
> >> >> <[email protected]> wrote:
> >> >> > I have an installation of SGE 6.2U4 that I downloaded some years
> >> >> > ago that I have installed on a couple of qmaster hosts.
> >> >>
> >> >> Are you using the same version of SGE (SGE 6.2u4) on both the qmaster
> >> >> & the node? You can run "qconf -help | head -1" on both the master &
> >> >> the node to show the version.
> >> >>
> >> >> Most common GDI errors are due to mismatching versions of Grid Engine
> >> >> - and if you are running the same version of SGE, then let us know, I
> >> >> will dig the code to see what can possibly go wrong.
> >> >>
> >> >> And don't worry about still using the Sun binaries, I work with sites
> >> >> that have even older versions of Grid Engine. Sun contributed the
> >> >> code to open source, and without Sun we wouldn't have this community.
> >> >>
> >> >> Rayson
> >> >>
> >> >>
> >> >> > I hope that I do not offend the users of this list by asking for
> >> >> > help using a binary installation, using binaries built by Sun.
> >> >> >
> >> >> > I hope that someone can shed some light on the problem.
> >> >> >
> >> >> > I have built some new virtualized clients using KVM on a Centos 6
> >> >> > host. The Centos 5 client seems to work properly, but the Centos 4
> >> >> > client does not. I need a Centos 4 execd for testing purposes.
> >> >> >
> >> >> > I cannot install sge_execd, because of the qconf problems.
> >> >> >
> >> >> > qconf -sh results in:
> >> >> >
> >> >> > qconf -sh
> >> >> > ERROR: failed receiving gdi request response for mid=1 (got no
> >> >> > message).
> >> >> >
> >> >> > I get this message if I try this client against the new cluster
> >> >> > and a cluster that has been running for several years. Other
> >> >> > Centos 4 clients can run "qconf -ch" against both clusters without
> >> >> > problem.
> >> >> >
> >> >> > qping works from the problematic client:
> >> >> > qping -info sgemaster 6444 qmaster 1
> >> >> > 01/12/2012 20:59:38:
> >> >> > SIRM version: 0.1
> >> >> > SIRM message id: 1
> >> >> > start time: 01/12/2012 16:31:57 (1326414717)
> >> >> > run time [s]: 16052
> >> >> > messages in read buffer: 0
> >> >> > messages in write buffer: 0
> >> >> > nr. of connected clients: 2
> >> >> > status: 2
> >> >> > info: MAIN: E (16052.50) | signaler000: E (16052.05) | event_master000: E (0.58) | timer000: E (1.58) | worker000: W (41.59) | worker001: W (101.61) | listener000: W (5.58) | listener001: W (5.58) | scheduler000: W (5.57) | ERROR
> >> >> > malloc: arena(0) |ordblks(1) | smblks(0) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) | keepcost(0)
> >> >> > Monitor: disabled
> >> >> >
> >> >> > Simon
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > users mailing list
> >> >> > [email protected]
> >> >> > https://gridengine.org/mailman/listinfo/users
> >> >> >
> >> >
> >
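
The question in this thread of whether qconf hangs or returns its error immediately can be checked mechanically. Below is a minimal Python sketch, not part of the original exchange, that runs a command under a timeout and reports which of the two happened. The function name classify_command and the timeout values are hypothetical choices made for illustration; only the standard-library subprocess module is used.

```python
import subprocess
import sys
import time


def classify_command(cmd, timeout_s=10.0):
    """Run cmd and report whether it finished on its own or hit the timeout.

    Returns (outcome, elapsed_seconds, output), where outcome is
    "returned" if the command exited by itself and "hung" if it was
    still running after timeout_s seconds and had to be killed.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so error text is captured
            timeout=timeout_s,
            text=True,
        )
        outcome, output = "returned", proc.stdout
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising this exception.
        outcome = "hung"
        output = exc.output or ""
        if isinstance(output, bytes):
            # Partial output may arrive as bytes even when text=True.
            output = output.decode(errors="replace")
    elapsed = time.monotonic() - start
    return outcome, elapsed, output
```

On the affected node this could be called as classify_command(["qconf", "-sh"], timeout_s=30.0), assuming qconf is on the PATH. An outcome of "hung" would match the behaviour described above, where the error message only appears after a timeout or a manual kill; the 30-second limit is an arbitrary assumption and may need to be longer than qconf's own timeout.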
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
