Hi,

On 14.12.2011 at 13:39, Schmidt U. wrote:
> Sometimes, especially during holidays, a cluster can become less used. So
> far we have been manually adjusting the slot quota limit. I am looking for a
> way to set up a "dangerous queue" scenario in an empty cluster. A user
> could set a "dangerous flag" in the script of his jobs, which then have the
> chance to fill the whole cluster if it is more or less empty. When the
> number of submitted "important" jobs increases again, the jobs in the
> "dangerous queue" should be killed according to the load of the cluster.
> Is anyone using a similar scenario?

Yes, you can use the checkpointing feature for it, i.e. instead of requesting a special flag, the user requests the appropriate checkpointing environment (CE). The CE needs to be defined to "migrate on suspend" (setting "when x" in the CE), so the job will be killed and rescheduled as a result of the suspension. The "dangerous" queue with the attached CE is then subordinated (possibly slotwise) to the normal one.

man sge_chkpt
man checkpoint

http://arc.liv.ac.uk/SGE/howto/checkpointing.html
http://arc.liv.ac.uk/SGE/howto/APSTC-TB-2004-005.pdf (nice state diagrams)

- The job can be suspended by any means: whether it's done on the command line with `qmod -sj ...`, by the subordination, or by exceeding the suspend_thresholds doesn't matter.

- A more straightforward approach could be to define a suspend_method in this special queue that kills the process group of the job in question, but then you lose the option to have the job rescheduled as the CE does.

- Resources must already be available for the normal job to start. SGE doesn't look ahead and anticipate that they will be freed once the job with the flag/CE is killed.

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
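For illustration, a minimal sketch of the "migrate on suspend" checkpointing environment described in the reply, as `qconf -sckpt` would display it. The name `dangerous_ckpt` is hypothetical. Choosing `APPLICATION-LEVEL` with all commands set to `none` is also an assumption, meant for the case where no real checkpoint is wanted, only kill-and-reschedule on suspension. Depending on your version, you may still need a `migr_command` that terminates the job.

```
ckpt_name          dangerous_ckpt
interface          APPLICATION-LEVEL
ckpt_command       none
migr_command       none
restart_command    none
clean_command      none
ckpt_dir           /tmp
signal             none
when               x
```

Under the same assumptions, the hypothetical queue `dangerous.q` would list the CE with `ckpt_list dangerous_ckpt`. The normal queue would get a slotwise entry such as `subordinate_list slots=8(dangerous.q:1:sr)` in releases that support slotwise subordination, where 8 stands for the host's slot count and `sr` suspends the shortest-running job first. Users would then submit with `qsub -ckpt dangerous_ckpt job.sh`.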