That was it, thanks! The node had failed, so I didn't think anything would still be running there, but two jobs were stuck in basic.q on that node. I've killed them and can now remove host compute-2-4.
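For anyone hitting the same "still referenced in cluster queue" error, the check and cleanup can be sketched roughly as below. This is a minimal sketch under assumptions, not a tested procedure: the queue and host names are the ones from this thread, and `run`, `list_jobs`, `delete_host`, and the `RUN` variable are hypothetical helpers that only print the commands unless RUN=1 is set. Job IDs for qdel have to be read off the qstat output by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default; execute it only if RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Hypothetical helper: list jobs from all users still attached to a queue instance.
# $1 = cluster queue (e.g. basic.q), $2 = exec host (e.g. compute-2-4.local)
list_jobs() {
    run qstat -u '*' -q "$1@$2"
}

# Hypothetical helper: delete the exec host once no jobs reference it.
# $1 = exec host
delete_host() {
    run qconf -de "$1"
}

# Step 1: see what is still sitting in the queue instance.
list_jobs basic.q compute-2-4.local
# Step 2 (manual): kill each stuck job with qdel <job_id>. On a dead node,
# qdel -f may be needed; that is an assumption, not something from this thread.
# Step 3: remove the host.
delete_host compute-2-4.local
```

Run as-is it only prints the two commands, so nothing on the cluster changes until RUN=1 is set.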
-M

On Wed, Sep 6, 2017 at 11:41 AM, Feng Zhang <[email protected]> wrote:
> Is there any running jobs on queue instance of [email protected]?
>
> On Wed, Sep 6, 2017 at 11:33 AM, Michael Stauffer <[email protected]>
> wrote:
> > On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <[email protected]> wrote:
> >>
> >> It seems SGE master did not get refreshed with new hostgroup. Maybe you
> >> can try:
> >>
> >> 1. restart SGE master
> >
> > Is it safe to do this with jobs queued and running? I think it's not
> > reliable, i.e. jobs can get killed and de-queued?
> >
> >>
> >> or
> >>
> >> 2. change basic.q, "hostlist" to any node, like "compute-1-0.local",
> >>
> >> wait till it gets refreshed; then change it back to "@basichosts".
> >
> > I've done this, but it's not refreshing (been about 10 minutes now). I'm
> > still getting the error when I try to delete exec host compute-2-4, and
> > qhost is still showing basic.q on the nodes in @basichosts.
> >
> > Interestingly, host compute-2-4 was removed from another queue
> > (qlogin.basic.q) that also uses @basichosts, so it's something about
> > basic.q that's stuck.
> >
> > Is there some way to refresh things other than restarting qmaster?
> >
> > -M
> >
> >>
> >> On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <[email protected]>
> >> wrote:
> >> > SoGE 8.1.8
> >> >
> >> > Hi,
> >> >
> >> > I'm having trouble deleting an execution host. I've removed it from the
> >> > host group, but when I try to delete with qconf, it says it's still part
> >> > of 'basic.q'. Here's the relevant output? Anyone have any suggestions?
> >> >
> >> > [root@chead ~]# qconf -de compute-2-4.local
> >> > Host object "compute-2-4.local" is still referenced in cluster queue
> >> > "basic.q".
> >> >
> >> > [root@chead ~]# qconf -sq basic.q
> >> > qname                 basic.q
> >> > hostlist              @basichosts
> >> > seq_no                0
> >> > load_thresholds       np_load_avg=1.74
> >> > suspend_thresholds    NONE
> >> > nsuspend              1
> >> > suspend_interval      00:05:00
> >> > priority              0
> >> > min_cpu_interval      00:05:00
> >> > processors            UNDEFINED
> >> > qtype                 BATCH
> >> > ckpt_list             NONE
> >> > pe_list               make mpich mpi orte unihost serial
> >> > rerun                 FALSE
> >> > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
> >> >                       [compute-1-1.local=7],[compute-1-3.local=7], \
> >> >                       [compute-1-5.local=8],[compute-1-6.local=8], \
> >> >                       [compute-1-7.local=8],[compute-1-8.local=8], \
> >> >                       [compute-1-9.local=8],[compute-1-10.local=8], \
> >> >                       [compute-1-11.local=8],[compute-1-12.local=8], \
> >> >                       [compute-1-13.local=8],[compute-1-14.local=8], \
> >> >                       [compute-1-15.local=8]
> >> > tmpdir                /tmp
> >> > shell                 /bin/bash
> >> > prolog                NONE
> >> > epilog                NONE
> >> > shell_start_mode      posix_compliant
> >> > starter_method        NONE
> >> > suspend_method        NONE
> >> > resume_method         NONE
> >> > terminate_method      NONE
> >> > notify                00:00:60
> >> > owner_list            NONE
> >> > user_lists            NONE
> >> > xuser_lists           NONE
> >> > subordinate_list      NONE
> >> > complex_values        NONE
> >> > projects              NONE
> >> > xprojects             NONE
> >> > calendar              NONE
> >> > initial_state         default
> >> > s_rt                  INFINITY
> >> > h_rt                  INFINITY
> >> > s_cpu                 INFINITY
> >> > h_cpu                 INFINITY
> >> > s_fsize               INFINITY
> >> > h_fsize               INFINITY
> >> > s_data                INFINITY
> >> > h_data                INFINITY
> >> > s_stack               INFINITY
> >> > h_stack               INFINITY
> >> > s_core                INFINITY
> >> > h_core                INFINITY
> >> > s_rss                 INFINITY
> >> > h_rss                 INFINITY
> >> > s_vmem                19G
> >> > h_vmem                19G
> >> >
> >> > [root@chead ~]# qconf -shgrp @basichosts
> >> > group_name @basichosts
> >> > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
> >> >          compute-1-5.local compute-1-6.local compute-1-7.local \
> >> >          compute-1-8.local compute-1-9.local compute-1-10.local \
> >> >          compute-1-11.local compute-1-12.local compute-1-13.local \
> >> >          compute-1-14.local compute-1-15.local compute-2-0.local \
> >> >          compute-2-2.local compute-2-5.local compute-2-7.local \
> >> >          compute-2-8.local compute-2-9.local compute-2-11.local \
> >> >          compute-2-12.local compute-2-13.local compute-2-15.local \
> >> >          compute-2-6.local
> >> >
> >> > Thanks
> >> >
> >> > -M
> >> >
> >> > _______________________________________________
> >> > users mailing list
> >> > [email protected]
> >> > https://gridengine.org/mailman/listinfo/users
> >> >
> >>
> >> --
> >> Best,
> >>
> >> Feng
> >
>
> --
> Best,
>
> Feng
>
