We're planning an outage on our cluster for the 12th of this month. I've added reservations for each of the subclusters to ensure that nothing is running at that time. The command I use is something like

    qrsub -l mem=4G,job=true -a 04120800 -d 24:0:0 -pe '*-j' 256

where mem is a consumable resource used to control memory usage, job is an exclusive resource associated with each host, and the PE varies depending on which subcluster I'm reserving.
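Because the same qrsub command is repeated once per subcluster, a small dry-run wrapper can print the whole set so it can be checked before anything is submitted. This is a minimal sketch, assuming a POSIX shell. The function name print_outage_reservations, the PE names subA-j and subB-j, and their slot counts are hypothetical placeholders. The -a value follows the qrsub date format [[CC]YY]MMDDhhmm[.SS], so 04120800 means the 12th of month 04 at 08:00.

```shell
#!/bin/sh
# Dry-run sketch: print one qrsub command per subcluster so the list can be
# reviewed before anything is submitted.
# The PE names and slot counts passed at the bottom are HYPOTHETICAL.

OUTAGE_START=04120800   # qrsub -a format MMDDhhmm: day 12 of month 04, 08:00
OUTAGE_LEN=24:0:0       # qrsub -d duration as hh:mm:ss, copied from the post

print_outage_reservations() {
    # Each argument has the form "pe_name:slots" for one subcluster.
    for spec in "$@"; do
        pe=${spec%%:*}      # text before the first colon
        slots=${spec##*:}   # text after the last colon
        printf "qrsub -l mem=4G,job=true -a %s -d %s -pe '%s' %s\n" \
            "$OUTAGE_START" "$OUTAGE_LEN" "$pe" "$slots"
    done
}

# Hypothetical subclusters; replace with the real PE names and slot counts.
print_outage_reservations 'subA-j:256' 'subB-j:128'
```

Once the printed commands look right, piping the output through sh submits the reservations.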
The reservations themselves appear to be fine, but the schedule file shows that queued jobs now make their reservations after the outage, even though they have plenty of time to run before it. (I'm making the reservations this early because a few of our users submit 7-day jobs.)

If I restart the scheduler, the jobs start reserving slots before the outage, but the queues acquire a qtype of N according to qstat -f, and jobs don't actually start in them. I can get qstat -f to show a qtype of B again by using qconf to set the qtype attribute of each queue to batch (which, according to qconf -sq, it already is). I can get qstat -f to show BP by modifying pe_list on each queue, but it won't let me do this while a reservation is in place (even though I'm just repeating what is already there). If I delete the reservation, modify the pe_list and recreate the reservation, I'm back to my original problem.

The upshot is that the cluster is now dominated by small, low-priority jobs while the high-priority parallel jobs are making reservations after the outage. Also, after a scheduler restart it takes a while for existing jobs to start making reservations: for a few hours afterwards, only jobs submitted after the restart make reservations.

We're running SGE 6.2u3 at the moment. Is an upgrade likely to fix this?

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
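To measure how many jobs are reserving after the outage, the schedule file can be filtered for RESERVING entries that start at or after the outage ends. This is a minimal sketch, assuming scheduler monitoring is enabled (MONITOR=1 in the params attribute of the scheduler configuration, edited with qconf -msconf) so that $SGE_ROOT/$SGE_CELL/common/schedule is written, and assuming the colon-separated line layout shown in the comments. The function name reservations_after is hypothetical. Because the file keeps growing, the sketch only reports the most recent scheduling run.

```shell
#!/bin/sh
# Sketch: list jobs whose reservations in the most recent scheduling run
# start at or after a cutoff time (e.g. the end of the outage).
# ASSUMPTION: schedule file lines look like
#   job_id:task_id:state:start_time:duration:level:object:resource:amount
# with start_time in seconds since the epoch, and runs separated by "::::::::".

reservations_after() {
    # $1 = cutoff in epoch seconds, $2 = path to the schedule file
    awk -F: -v cutoff="$1" '
        function commit(   i) {
            # Keep the run just finished as "the latest run", then reset.
            m = n
            for (i = 1; i <= n; i++) last[i] = cur[i]
            n = 0; lines = 0; split("", seen)
        }
        BEGIN { n = 0; m = 0; lines = 0 }
        NF == 0 { next }                              # ignore blank lines
        /^:+$/  { if (lines > 0) commit(); next }     # end of a run
        { lines++ }
        $3 == "RESERVING" && $4 + 0 >= cutoff + 0 && !($1 in seen) {
            seen[$1] = 1                              # report each job once
            cur[++n] = $1 " " $4                      # "job_id start_time"
        }
        END {
            if (lines > 0) commit()                   # file may not end with a separator
            for (i = 1; i <= m; i++) print last[i]
        }
    ' "$2"
}
```

A call would look like reservations_after "$OUTAGE_END_EPOCH" "$SGE_ROOT/$SGE_CELL/common/schedule", where OUTAGE_END_EPOCH is a hypothetical variable holding the end of the outage in seconds since the epoch (GNU date can produce this with +%s).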
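qstat(1) describes the qtype column as B(atch), I(nteractive), C(heckpointing), P(arallel), combinations of these, or N(one), so queues that have dropped to N after a scheduler restart can be picked out of qstat -f output automatically. This is a minimal sketch, assuming the usual qstat -f layout in which queue-instance lines begin with queue@host followed by the qtype, and job lines begin with a numeric job ID. The function name queues_with_qtype is hypothetical.

```shell
#!/bin/sh
# Sketch: print the queue instances whose qtype column in qstat -f output
# matches a given value, e.g. N for queues that no longer accept jobs.
# ASSUMPTION: queue-instance lines look like
#   all.q@node01   BIP   0/2/8   0.50   lx24-amd64
# while job lines start with a numeric job ID (no "@" in the first field).

queues_with_qtype() {
    # $1 = qtype to look for; qstat -f output is read from standard input
    awk -v want="$1" '$1 ~ /@/ && $2 == want { print $1 }'
}

# Example: qstat -f | queues_with_qtype N
```

Running this before and after a scheduler restart gives a quick list of the queue instances that need their qtype or pe_list reapplied.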