On Wed, Sep 6, 2017 at 12:42 PM, Reuti <[email protected]> wrote:
>
> > On 06.09.2017 at 17:33, Michael Stauffer <[email protected]> wrote:
> >
> > On Wed, Sep 6, 2017 at 11:16 AM, Feng Zhang <[email protected]> wrote:
> > It seems SGE master did not get refreshed with new hostgroup. Maybe you
> > can try:
> >
> > 1. restart SGE master
> >
> > Is it safe to do this with jobs queued and running? I think it's not
> > reliable, i.e. jobs can get killed and de-queued?
>
> Just to mention that it's safe to restart the qmaster, or even to reboot
> the machine the qmaster is running on. Nothing will happen to the running
> jobs on the exechosts.

OK, good to know. I've restarted it before and seen running jobs finish,
although some googling suggested people have seen jobs get killed.

Does a qmaster restart empty the queue, though? I imagine a reboot would
too, unless the queue is stored in a file.

-M

>
> -- Reuti
>
>
> > or
> >
> > 2. change basic.q, "hostlist" to any node, like "compute-1-0.local",
> > wait till it gets refreshed; then change it back to "@basichosts".
> >
> > I've done this, but it's not refreshing (been about 10 minutes now). I'm
> > still getting the error when I try to delete exec host compute-2-4, and
> > qhost is still showing basic.q on the nodes in @basichosts.
> >
> > Interestingly, host compute-2-4 was removed from another queue
> > (qlogin.basic.q) that also uses @basichosts, so it's something about
> > basic.q that's stuck.
> >
> > Is there some way to refresh things other than restarting qmaster?
> >
> > -M
> >
> >
> > On Wed, Sep 6, 2017 at 10:29 AM, Michael Stauffer <[email protected]>
> > wrote:
> > > SoGE 8.1.8
> > >
> > > Hi,
> > >
> > > I'm having trouble deleting an execution host. I've removed it from the
> > > host group, but when I try to delete with qconf, it says it's still
> > > part of 'basic.q'. Here's the relevant output. Anyone have any
> > > suggestions?
> > >
> > > [root@chead ~]# qconf -de compute-2-4.local
> > > Host object "compute-2-4.local" is still referenced in cluster queue
> > > "basic.q".
> > >
> > > [root@chead ~]# qconf -sq basic.q
> > > qname                 basic.q
> > > hostlist              @basichosts
> > > seq_no                0
> > > load_thresholds       np_load_avg=1.74
> > > suspend_thresholds    NONE
> > > nsuspend              1
> > > suspend_interval      00:05:00
> > > priority              0
> > > min_cpu_interval      00:05:00
> > > processors            UNDEFINED
> > > qtype                 BATCH
> > > ckpt_list             NONE
> > > pe_list               make mpich mpi orte unihost serial
> > > rerun                 FALSE
> > > slots                 8,[compute-1-2.local=3],[compute-1-0.local=7], \
> > >                       [compute-1-1.local=7],[compute-1-3.local=7], \
> > >                       [compute-1-5.local=8],[compute-1-6.local=8], \
> > >                       [compute-1-7.local=8],[compute-1-8.local=8], \
> > >                       [compute-1-9.local=8],[compute-1-10.local=8], \
> > >                       [compute-1-11.local=8],[compute-1-12.local=8], \
> > >                       [compute-1-13.local=8],[compute-1-14.local=8], \
> > >                       [compute-1-15.local=8]
> > > tmpdir                /tmp
> > > shell                 /bin/bash
> > > prolog                NONE
> > > epilog                NONE
> > > shell_start_mode      posix_compliant
> > > starter_method        NONE
> > > suspend_method        NONE
> > > resume_method         NONE
> > > terminate_method      NONE
> > > notify                00:00:60
> > > owner_list            NONE
> > > user_lists            NONE
> > > xuser_lists           NONE
> > > subordinate_list      NONE
> > > complex_values        NONE
> > > projects              NONE
> > > xprojects             NONE
> > > calendar              NONE
> > > initial_state         default
> > > s_rt                  INFINITY
> > > h_rt                  INFINITY
> > > s_cpu                 INFINITY
> > > h_cpu                 INFINITY
> > > s_fsize               INFINITY
> > > h_fsize               INFINITY
> > > s_data                INFINITY
> > > h_data                INFINITY
> > > s_stack               INFINITY
> > > h_stack               INFINITY
> > > s_core                INFINITY
> > > h_core                INFINITY
> > > s_rss                 INFINITY
> > > h_rss                 INFINITY
> > > s_vmem                19G
> > > h_vmem                19G
> > >
> > > [root@chead ~]# qconf -shgrp @basichosts
> > > group_name @basichosts
> > > hostlist compute-1-0.local compute-1-2.local compute-1-3.local \
> > >          compute-1-5.local compute-1-6.local compute-1-7.local \
> > >          compute-1-8.local compute-1-9.local compute-1-10.local \
> > >          compute-1-11.local compute-1-12.local compute-1-13.local \
> > >          compute-1-14.local compute-1-15.local compute-2-0.local \
> > >          compute-2-2.local compute-2-5.local compute-2-7.local \
> > >          compute-2-8.local compute-2-9.local compute-2-11.local \
> > >          compute-2-12.local compute-2-13.local compute-2-15.local \
> > >          compute-2-6.local
> > >
> > > Thanks
> > >
> > > -M
> > >
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
> > >
> >
> > --
> > Best,
> >
> > Feng
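
One thing that might be worth ruling out (an assumption on my part, not a diagnosis confirmed anywhere in this thread): a host can be named in a queue's per-host overrides, e.g. "[host=value]" entries in "slots", as well as in "hostlist". Below is a rough Python sketch that scans saved "qconf -sq" text for attributes naming a given host. The function names are made up for illustration, and the sample config is hypothetical; the real basic.q output quoted above does not list compute-2-4.local in "slots".

```python
import re


def join_continuations(text):
    """Merge lines ending in a backslash with the line that follows."""
    return re.sub(r"\\\s*\n\s*", " ", text)


def parse_attributes(text):
    """Return {attribute: value} from `qconf -sq` style output."""
    attrs = {}
    for line in join_continuations(text).splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest
        if len(parts) == 2:
            attrs[parts[0]] = parts[1].strip()
    return attrs


def attributes_naming_host(config_text, host):
    """List the attributes whose value mentions `host` as a whole name."""
    # Guard both sides so compute-1-1.local does not match compute-1-10.local.
    pattern = re.compile(r"(?<![\w.-])" + re.escape(host) + r"(?![\w.-])")
    return [name for name, value in parse_attributes(config_text).items()
            if pattern.search(value)]


# Hypothetical sample, NOT the real basic.q configuration from this thread.
sample = (
    "qname     basic.q\n"
    "hostlist  @basichosts\n"
    "slots     8,[compute-1-2.local=3], \\\n"
    "          [compute-2-4.local=8]\n"
    "tmpdir    /tmp\n"
)

if __name__ == "__main__":
    print(attributes_naming_host(sample, "compute-2-4.local"))
```

To try it against a real queue, the text passed in would be whatever "qconf -sq basic.q" prints, saved to a file or captured from the command. An empty result only says the host name does not appear literally in that output; it says nothing about what qmaster holds internally.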
