Try qconf -de host_list
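
Before retrying the delete, it may help to check whether the host is still
named explicitly anywhere in the queue or host group configuration. A minimal
sketch, assuming a POSIX shell; host_refs is a hypothetical helper name, not a
Grid Engine command, and it only filters text read from stdin:

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print every line of a
# queue or host group configuration, read from stdin, that names the host.
host_refs() {
    # -F matches the host name literally, so its dots are not regex wildcards.
    grep -F -- "$1"
}

# Usage on a live cluster (queue, group and host names are from this thread):
#   qconf -sq basic.q | host_refs compute-2-4.local
#   qconf -shgrp @basichosts | host_refs compute-2-4.local
# No output means that configuration does not name the host explicitly.
```

If neither command prints anything, the reference is probably not in the
visible configuration, which would fit the idea that the qmaster has not
refreshed basic.q.
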
Cheers,

On Thu, Sep 7, 2017 at 3:22 AM, Michael Stauffer <[email protected]> wrote:

> On Wed, Sep 6, 2017 at 12:42 PM, Reuti <[email protected]> wrote:
>
>> > On 06.09.2017 at 17:33, Michael Stauffer <[email protected]> wrote:
>> >
>> > On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <[email protected]> wrote:
>> > It seems SGE master did not get refreshed with new hostgroup. Maybe
>> > you can try:
>> >
>> > 1. restart SGE master
>> >
>> > Is it safe to do this with jobs queued and running? I think it's not
>> > reliable, i.e. jobs can get killed and de-queued?
>>
>> Just to mention, that it's safe to restart the qmaster or reboot even the
>> machine the qmaster is running on. Nothing will happen to the running jobs
>> on the exechosts.
>>
>
> OK good to know. I've done that before and seen them finish, although some
> googling suggested people have seen jobs get killed. Does a qmaster
> restart, however, empty the queue? I imagine a reboot would too, unless the
> queue is stored in a file?
>
> -M
>
>>
>> -- Reuti
>>
>> > or
>> >
>> > 2. change basic.q, "hostlist" to any node, like "compute-1-0.local",
>> > wait till it gets refreshed; then change it back to "@basichosts".
>> >
>> > I've done this, but it's not refreshing (been about 10 minutes now).
>> > I'm still getting the error when I try to delete exec host compute-2-4,
>> > and qhost is still showing basic.q on the nodes in @basichosts.
>> >
>> > Interestingly, host compute-2-4 was removed from another queue
>> > (qlogin.basic.q) that also uses @basichosts, so it's something about
>> > basic.q that's stuck.
>> >
>> > Is there some way to refresh things other than restarting qmaster?
>> >
>> > -M
>> >
>> > On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <[email protected]> wrote:
>> > > SoGE 8.1.8
>> > >
>> > > Hi,
>> > >
>> > > I'm having trouble deleting an execution host. I've removed it from
>> > > the host group, but when I try to delete with qconf, it says it's
>> > > still part of 'basic.q'. Here's the relevant output. Anyone have any
>> > > suggestions?
>> > >
>> > > [root@chead ~]# qconf -de compute-2-4.local
>> > > Host object "compute-2-4.local" is still referenced in cluster queue
>> > > "basic.q".
>> > >
>> > > [root@chead ~]# qconf -sq basic.q
>> > > qname                 basic.q
>> > > hostlist              @basichosts
>> > > seq_no                0
>> > > load_thresholds       np_load_avg=1.74
>> > > suspend_thresholds    NONE
>> > > nsuspend              1
>> > > suspend_interval      00:05:00
>> > > priority              0
>> > > min_cpu_interval      00:05:00
>> > > processors            UNDEFINED
>> > > qtype                 BATCH
>> > > ckpt_list             NONE
>> > > pe_list               make mpich mpi orte unihost serial
>> > > rerun                 FALSE
>> > > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
>> > >                       [compute-1-1.local=7],[compute-1-3.local=7], \
>> > >                       [compute-1-5.local=8],[compute-1-6.local=8], \
>> > >                       [compute-1-7.local=8],[compute-1-8.local=8], \
>> > >                       [compute-1-9.local=8],[compute-1-10.local=8], \
>> > >                       [compute-1-11.local=8],[compute-1-12.local=8], \
>> > >                       [compute-1-13.local=8],[compute-1-14.local=8], \
>> > >                       [compute-1-15.local=8]
>> > > tmpdir                /tmp
>> > > shell                 /bin/bash
>> > > prolog                NONE
>> > > epilog                NONE
>> > > shell_start_mode      posix_compliant
>> > > starter_method        NONE
>> > > suspend_method        NONE
>> > > resume_method         NONE
>> > > terminate_method      NONE
>> > > notify                00:00:60
>> > > owner_list            NONE
>> > > user_lists            NONE
>> > > xuser_lists           NONE
>> > > subordinate_list      NONE
>> > > complex_values        NONE
>> > > projects              NONE
>> > > xprojects             NONE
>> > > calendar              NONE
>> > > initial_state         default
>> > > s_rt                  INFINITY
>> > > h_rt                  INFINITY
>> > > s_cpu                 INFINITY
>> > > h_cpu                 INFINITY
>> > > s_fsize               INFINITY
>> > > h_fsize               INFINITY
>> > > s_data                INFINITY
>> > > h_data                INFINITY
>> > > s_stack               INFINITY
>> > > h_stack               INFINITY
>> > > s_core                INFINITY
>> > > h_core                INFINITY
>> > > s_rss                 INFINITY
>> > > h_rss                 INFINITY
>> > > s_vmem                19G
>> > > h_vmem                19G
>> > >
>> > > [root@chead ~]# qconf -shgrp @basichosts
>> > > group_name @basichosts
>> > > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
>> > >          compute-1-5.local compute-1-6.local compute-1-7.local \
>> > >          compute-1-8.local compute-1-9.local compute-1-10.local \
>> > >          compute-1-11.local compute-1-12.local compute-1-13.local \
>> > >          compute-1-14.local compute-1-15.local compute-2-0.local \
>> > >          compute-2-2.local compute-2-5.local compute-2-7.local \
>> > >          compute-2-8.local compute-2-9.local compute-2-11.local \
>> > >          compute-2-12.local compute-2-13.local compute-2-15.local \
>> > >          compute-2-6.local
>> > >
>> > > Thanks
>> > >
>> > > -M
>> >
>> > --
>> > Best,
>> >
>> > Feng

_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
