I killed all sge_* processes on the exec nodes and tried to restart execd, but got this message:

root@compute010:/home/ubuntu# /usr/lib/gridengine/sge_execd
error: can't find connection
error: can't get configuration from qmaster -- backgrounding

On Mon, May 30, 2016 at 10:36 AM, Radhouane Aniba <[email protected]> wrote:

> Hi Bill
>
> Yes, I am sure.
>
> This is what I get when I log in to one of the nodes and run:
>
> ubuntu@compute010:~$ ps -ef | grep sge_
> sgeadmin  1254     1  0 May28 ?      00:00:39 /usr/lib/gridengine/sge_qmaster
> sgeadmin  1446     1  0 May28 ?      00:00:22 /usr/lib/gridengine/sge_execd
> ubuntu    2552  2527  0 17:36 pts/0  00:00:00 grep --color=auto sge_
>
>
> On Mon, May 30, 2016 at 10:33 AM, Bill Bryce <[email protected]> wrote:
>
>> Hi Rad,
>>
>> Are you sure that the execution daemons are running on your compute
>> nodes? Can you log in to one of the nodes, say ‘compute001’, and do a ps
>> looking for the execd? When an execd is functioning normally it provides
>> the load and memory, etc… none of your nodes are showing that.
>>
>> Regards,
>>
>> Bill.
>>
>> On May 30, 2016, at 1:20 PM, Radhouane Aniba <[email protected]> wrote:
>>
>> Hello all,
>>
>> I am trying to submit a simple "hello world" job to test a gridengine
>> setup (I have used it before with no problems).
>>
>> The problem is that my job waits in the queue forever.
>>
>> The qhost command shows a weird state for the compute nodes:
>>
>> HOSTNAME    ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global      -           -     -     -       -       -       -
>> compute001  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute002  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute003  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute004  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute005  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute006  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute007  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute008  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute009  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute010  lx26-amd64  4     -     31.4G   -       0.0     -
>> compute011  lx26-amd64  4     -     31.4G   -       0.0
>>
>> Normally, even when the compute nodes are not in use, I see some
>> information in the LOAD and MEMUSE columns.
>>
>> I am not an SGE person, but I am familiar with all the commands. Any help
>> would be much appreciated.
>>
>> The qstat -f command shows all my nodes in the "au" state. I've been
>> reading a lot about it and I understand it's an alarm state (overloaded?).
>>
>> The only heavy activity I had on the head node was a script downloading
>> 19T of data. Could the head node be the problem and not the compute nodes?
>> sge_execd is running on all the compute/exec nodes :/
>>
>> --
>> *Rad*
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>>
>> William Bryce | VP Products
>> Univa Corporation, Toronto
>> E: [email protected] | D: 647-9742841 | Toll-Free (800) 370-5320
>> W: Univa.com | FB: facebook.com/univa.corporation | T:
>> twitter.com/Grid_Engine
>>
>>
>
>
> --
> *Radhouane Aniba*
> *Bioinformatics Scientist*
> *BC Cancer Agency, Vancouver, Canada*
>

--
*Radhouane Aniba*
*Bioinformatics Scientist*
*BC Cancer Agency, Vancouver, Canada*
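Bill's point that a working execd reports load and memory can be checked mechanically against qhost output. Below is a minimal Python sketch, assuming the column layout shown in the thread (LOAD as the fourth whitespace-separated field). The function name `hosts_without_load` and the compute002 sample row with a load value are hypothetical, added only for illustration.

```python
def hosts_without_load(qhost_text):
    """Return hostnames whose LOAD column is '-' in qhost-style output.

    Assumes LOAD is the fourth whitespace-separated field, as in the
    listing quoted in the thread.
    """
    silent = []
    for line in qhost_text.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator, and truncated rows.
        if len(fields) < 4:
            continue
        # Skip the header row and the 'global' pseudo-host.
        if fields[0] in ("HOSTNAME", "global"):
            continue
        if fields[3] == "-":
            silent.append(fields[0])
    return silent


# The compute001 row is taken from the thread; the compute002 row is a
# hypothetical example of a host that is reporting load.
sample = """\
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
compute001 lx26-amd64 4 - 31.4G - 0.0 -
compute002 lx26-amd64 4 0.01 31.4G 1.2G 0.0 0.0
"""

print(hosts_without_load(sample))  # prints ['compute001']
```

On a real cluster the text would come from running qhost on the head node rather than from a literal string. Any host the function returns has an execd that is not reporting to the qmaster, whether or not the execd process itself is running.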
