Simon, I'm logging off now; please let the list know whether it's still causing problems, and share your findings.
(I'm in North America, in the EST timezone, and I normally don't stay up this late, but it usually takes me some time to get back to my normal daily schedule after the holidays :-D )

Rayson

On Fri, Jan 13, 2012 at 1:36 AM, Rayson Ho <ray...@scalablelogic.com> wrote:
> Good! That means qconf is waiting for the master's response but not
> getting it.
>
> If an IP filter or firewall is configured on that node, it is very
> likely the cause. Make sure that firewalls are turned off or configured
> properly... I used to use packet sniffers like tcpdump to debug issues
> like this, but I have not used one in a long while.
>
> Rayson
>
> On Fri, Jan 13, 2012 at 1:25 AM, Simon Matthews
> <simon.d.matth...@gmail.com> wrote:
>>
>> On Thu, Jan 12, 2012 at 10:15 PM, Rayson Ho <ray...@scalablelogic.com>
>> wrote:
>>>
>>> Does it hang when you issue the qconf command on that node, or does it
>>> return the error message immediately?
>>
>> It hangs. I see the message either after it times out or if I kill it.
>>
>> Simon
>>>
>>> Rayson
>>>
>>> On Fri, Jan 13, 2012 at 1:00 AM, Simon Matthews
>>> <simon.d.matth...@gmail.com> wrote:
>>> > I am running the same version. I have one installation tree that is
>>> > NFS mounted. All clients use the same binaries.
>>> >
>>> > I had wanted to move to 6.2u5, but I can't find a source to download it.
>>> >
>>> > Simon
>>> >
>>> > On Thu, Jan 12, 2012 at 9:50 PM, Rayson Ho <ray...@scalablelogic.com>
>>> > wrote:
>>> >>
>>> >> On Fri, Jan 13, 2012 at 12:02 AM, Simon Matthews
>>> >> <simon.d.matth...@gmail.com> wrote:
>>> >> > I have an installation of SGE 6.2u4 that I downloaded some years
>>> >> > ago and have installed on a couple of qmaster hosts.
>>> >>
>>> >> Are you using the same version of SGE (SGE 6.2u4) on both the qmaster
>>> >> and the node? You can run "qconf -help | head -1" on both the master
>>> >> and the node to show the version.
>>> >>
>>> >> Most GDI errors are caused by mismatched Grid Engine versions. If you
>>> >> are running the same version of SGE, let us know and I will dig into
>>> >> the code to see what could be going wrong.
>>> >>
>>> >> And don't worry about still using the Sun binaries; I work with sites
>>> >> that have even older versions of Grid Engine. Sun contributed the code
>>> >> to open source, and without Sun we wouldn't have this community.
>>> >>
>>> >> Rayson
>>> >>
>>> >> > I hope that I do not offend the users of this list by asking for
>>> >> > help with a binary installation that uses binaries built by Sun.
>>> >> >
>>> >> > I hope that someone can shed some light on the problem.
>>> >> >
>>> >> > I have built some new virtualized clients using KVM on a CentOS 6
>>> >> > host. The CentOS 5 client seems to work properly, but the CentOS 4
>>> >> > client does not. I need a CentOS 4 execd for testing purposes.
>>> >> >
>>> >> > I cannot install sge_execd because of the qconf problems.
>>> >> >
>>> >> > qconf -sh results in:
>>> >> >
>>> >> > qconf -sh
>>> >> > ERROR: failed receiving gdi request response for mid=1 (got no message).
>>> >> >
>>> >> > I get this message when I run this client against both the new
>>> >> > cluster and a cluster that has been running for several years.
>>> >> > Other CentOS 4 clients can run "qconf -ch" against both clusters
>>> >> > without problems.
>>> >> >
>>> >> > qping works from the problematic client:
>>> >> > qping -info sgemaster 6444 qmaster 1
>>> >> > 01/12/2012 20:59:38:
>>> >> > SIRM version: 0.1
>>> >> > SIRM message id: 1
>>> >> > start time: 01/12/2012 16:31:57 (1326414717)
>>> >> > run time [s]: 16052
>>> >> > messages in read buffer: 0
>>> >> > messages in write buffer: 0
>>> >> > nr. of connected clients: 2
>>> >> > status: 2
>>> >> > info: MAIN: E (16052.50) | signaler000: E (16052.05) | event_master000: E (0.58) | timer000: E (1.58) | worker000: W (41.59) | worker001: W (101.61) | listener000: W (5.58) | listener001: W (5.58) | scheduler000: W (5.57) | ERROR
>>> >> > malloc: arena(0) |ordblks(1) | smblks(0) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) | keepcost(0)
>>> >> > Monitor: disabled
>>> >> >
>>> >> > Simon
>>> >> >
>>> >> > _______________________________________________
>>> >> > users mailing list
>>> >> > users@gridengine.org
>>> >> > https://gridengine.org/mailman/listinfo/users