Thanks a lot, William!

It seems that removing the hosts and then re-inserting them recreates the
files, and with that the hosts come back to the queue:

===================================================
compute-2-4             lx26-amd64      8     -   23.5G       -    4.0G       -
   all.q                BIP   0/0/8         au
compute-3-10            lx26-amd64      8  0.00   23.5G    1.2G    4.0G  196.0K
   all.q                BIP   0/0/8
compute-3-11            lx26-amd64      8  0.03   23.5G  661.5M    4.0G  196.0K
   all.q                BIP   0/0/8
compute-3-12            lx26-amd64      8  0.02   23.5G  748.8M    4.0G  196.0K
   all.q                BIP   0/0/8
compute-3-2             lx26-amd64      8  0.11   23.5G  657.9M    4.0G  196.0K
   all.q                BIP   0/0/8
compute-3-3             lx26-amd64      8  0.06   23.5G    1.3G    4.0G  196.0K
   all.q                BIP   0/0/8
compute-3-4             lx26-amd64      8  0.02   23.5G  599.6M    4.0G   24.6M
   all.q                BIP   0/0/8
compute-3-5             lx26-amd64      8  0.01   23.5G    1.4G    4.0G     0.0
   all.q                BIP   0/0/8
compute-3-6             lx26-amd64     16  0.03   23.5G  728.4M    4.0G  260.0K
   all.q                BIP   0/0/8
compute-3-7             lx26-amd64      8  0.00   23.5G  590.6M    4.0G   39.5M
   all.q                BIP   0/0/8
compute-3-8             lx26-amd64      8  0.04   23.5G  666.6M    4.0G   24.1M
   all.q                BIP   0/0/8
compute-3-9             lx26-amd64      8  0.01   23.5G  635.1M    4.0G  196.0K
   all.q                BIP   0/0/8
compute-30-1            lx26-amd64     80  0.01   62.9G    1.9G    4.0G   38.2M
   all.q                BIP   0/0/40

all.q@compute-3-10.local       BIP   0/0/8          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-11.local       BIP   0/0/8          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-12.local       BIP   0/0/8          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-2.local        BIP   0/0/8          0.04     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-3.local        BIP   0/0/8          0.04     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-4.local        BIP   0/0/8          0.06     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-5.local        BIP   0/0/8          0.01     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-6.local        BIP   0/0/8          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-7.local        BIP   0/0/8          0.02     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-8.local        BIP   0/0/8          0.03     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-3-9.local        BIP   0/0/8          0.02     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-30-1.local       BIP   0/0/40         0.02     lx26-amd64

On Mon, Jul 8, 2013 at 9:14 AM, William Hay <w....@ucl.ac.uk> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Oops, forgot to send to list...
>
> - -------- Original Message --------
> Subject: Re: [gridengine users] Rocks+SGE - execd up, no shepherds or
> queues
> Date: Mon, 08 Jul 2013 08:12:31 +0100
> From: William Hay <w....@ucl.ac.uk>
> To: Samir Cury <sa...@hep.caltech.edu>
>
> On 05/07/13 20:07, Samir Cury wrote:
> > Hi William,
> >
> > Thanks for the directions. I tried changing the queue
> > configuration and the host group configuration, with and without a
> > restart on the master and exec nodes, but not much changed.
> >
> > Yes, we're using the spool. Looking closer at it:
> >
> > /opt/gridengine/default/spool/qmaster/qinstances/all.q
> > [root@t3-local all.q]# ll
> > total 68
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-2-2.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-2-4.local
> > -rw-r--r-- 1 sge sge  225 Oct 15  2012 compute-30-1.local
> > -rw-r--r-- 1 sge sge  224 Jun 16  2012 compute-3-10.local
> > -rw-r--r-- 1 sge sge  224 Jun 16  2012 compute-3-11.local
> > -rw-r--r-- 1 sge sge  224 Jun 16  2012 compute-3-12.local
> > -rw-r--r-- 1 sge sge  227 Sep 27  2012 compute-31-2.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-2.local
> > -rw-r--r-- 1 sge sge  223 Nov 20  2012 compute-3-3.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-4.local
> > -rw-r--r-- 1 sge sge  223 Jul  5 10:23 compute-3-5.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-6.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-7.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-8.local
> > -rw-r--r-- 1 sge sge  223 Jun 16  2012 compute-3-9.local
> > -rw-r--r-- 1 sge sge 2000 Sep 24  2012 ss
> > -rw-r--r-- 1 sge sge  229 Jun 16  2012 t3-higgs.ext.domain
> >
> > It looks good, and the most surprising thing is that the diff between
> > compute-3-5 (not working) and compute-3-7 (working) is the
> > "version 7" and "version 5" attributes. I'm not sure what that is (a file
> > serial number, maybe), but it doesn't look very meaningful, as other
> > hosts have different numbers (up to 12).
> >
> > I tried the obvious: moving the all.q directory to a
> > backup name and restarting the master to see if it would recreate it
> > correctly. Nope. That only made all my hosts go missing. However, if I
> > alter the queue "in memory", it recreates an empty "all.q"
> > directory.
> >
> > Something I realized while trying other procedures is:
> >
> > [root@t3-local all.q]# qmod -e all.q
> > Queue instance "all.q@compute-3-2.local" is already in the specified state: enabled
> > Queue instance "all.q@compute-2-4.local" is already in the specified state: enabled
> > Queue instance "al...@t3-higgs.ext.domain" is already in the specified state: enabled
> > Queue instance "all.q@compute-3-7.local" is already in the specified state: enabled
> > Queue instance "all.q@compute-3-8.local" is already in the specified state: enabled
> >
> > Meaning that although the hostgroup @allhosts looks like what we want,
> > qmod is only considering those nodes for some reason.
> >
> > Maybe the question now is: what makes those nodes be
> > considered by qstat and qmod, and how can I include (or force) the
> > missing nodes into this list?
> >
> > To rule out a hostgroup problem, I copied the list from qconf
> > -mhgroup @allhosts directly into all.q's hostlist, but no luck
> > either.
> >
> > Any idea on how to actually regenerate the all.q files in spool?
> > That seems to be the way. Summarizing:
> >
>
> Possibly you could delete the missing hosts from the queue/hostgroup,
> save it, then re-add them?
>
> William
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with undefined - http://www.enigmail.net/
>
> iQIcBAEBAgAGBQJR2ma4AAoJEKCzH4joEjNWAYwQAISEDI3Wkmih2I3Uf1qSkxXI
> 3r8MlMML3vVB0gx/UH6l9nEze4FNKPfFnOME9JwMt7LbZbqHemDyx62sEkLhAMSr
> juIiSQdUPGZaRphl8bZgXaIW+ihEjWURUXnGotoANdDE9MF+StpEkg5SJIyXoIOC
> MxVp9soPLc29UolI3LcpaCnBPqTE1wHcl/+qiQdEyyCIlOJ7v8oLi9fCSPvPIp6n
> UKgT7fXVQAtqQl3GDSL61YXMnZtWPGI3RUfKpmdzKz6V6CxEeWIkge7VtVz8u/Nn
> ICq/txOGzIcJf0jFTcZjf5YzID2O26BIVJCFlkrsyOOYBzyh6Z4Dz75eykLos1xB
> O6k34HFuecH7rqDGINBp+BSP4+MhPPWNav2GyBhKpIlgOWieq9YRsZh0q2Qzngo3
> 2Gur2wLCWb6NnTNmGImPLSstxc4zphXhdWaPKNTaxYMQrtsSiKhR69DCP+b6TBdA
> lULsDIxAo5a2kP/Ea5bP7jT91tuPeEAk69YLvj8KfwYlLq6NLYiQBqL+e/yL+iri
> 5FoJelSBgCPlcrgfllK8eL1MIvU/oaD7G2X/P5DXFxLu759l+yrSVvidik+BUpGK
> 8rlGHCxSylqzgjnxbICEawuik9i0ZIwqDnWsP6m7/gS+qFrApFXCa5nkH176uh7h
> 1f7ThRun3Mw0kKoFXZSs
> =fGAV
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
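
For reference, the delete-and-re-add step discussed in this thread could be scripted roughly as below. This is only a sketch under assumptions: the thread does not say whether the hosts were removed by editing the hostgroup interactively or in some other way, so the non-interactive `qconf -dattr` / `qconf -aattr` form is a swapped-in alternative, and the `readd_host` function and the `RUN` variable are hypothetical names. The hostgroup `@allhosts` and the host `compute-3-5.local` are taken from the messages above. The helper only prints the commands unless `RUN=1` is set, so it can be checked before anything touches the qmaster.

```shell
#!/bin/sh
# Hypothetical helper: remove a host from a hostgroup, then add it back.
# Dry run by default (prints the qconf commands); set RUN=1 to execute them.
readd_host() {
    host=$1
    hgrp=${2:-@allhosts}    # default hostgroup name, as used in the thread
    for action in -dattr -aattr; do
        if [ "${RUN:-0}" = "1" ]; then
            # Assumes qconf is on PATH and the caller has SGE manager rights.
            qconf "$action" hostgroup hostlist "$host" "$hgrp" || return 1
        else
            echo "qconf $action hostgroup hostlist $host $hgrp"
        fi
    done
}

# Example: print the two commands for one of the hosts that had gone missing.
readd_host compute-3-5.local
```

Running it as-is only prints the delete and add commands; after a real run, `qstat -f` should show whether the queue instance came back.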