Hi,

On 14.12.2011 at 13:39, Schmidt U. wrote:

> Sometimes, especially during holidays, a cluster is less heavily used. Until 
> now we have been adjusting a slot quota limit manually. I am looking for a 
> way to set up a "dangerous queue" scenario for an empty cluster: a user 
> could set a "dangerous flag" in the job script for jobs that are allowed to 
> fill the whole cluster when it is more or less empty. When the number of 
> submitted "important" jobs increases again, the jobs in the "dangerous 
> queue" should be killed according to the load of the cluster. 
> Is anyone using a similar scenario?

Yes, you can use the checkpointing feature for this: instead of setting a 
special flag, the user requests the appropriate checkpointing environment 
(CE). The CE needs to be defined to "migrate on suspend" (by setting "when x" 
in the CE), so the job is killed and rescheduled as a result of the 
suspension. The queue with the attached checkpointing environment is then 
subordinated (maybe slotwise) to the normal one.

man sge_ckpt
man checkpoint
http://arc.liv.ac.uk/SGE/howto/checkpointing.html
http://arc.liv.ac.uk/SGE/howto/APSTC-TB-2004-005.pdf (nice state diagrams)
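
As a rough sketch of the setup described above, the relevant settings could 
look like the following. The names "dangerous", "dangerous.q" and "normal.q" 
are made up for illustration, the commands and ckpt_dir are placeholders, and 
the "#" lines are annotations rather than part of the qconf output, so please 
check the attributes against the man pages for your version:

```text
# Checkpointing environment, in the layout shown by `qconf -sckpt dangerous`
ckpt_name          dangerous
interface          APPLICATION-LEVEL
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      NONE
ckpt_dir           /tmp
signal             NONE
when               x

# Excerpt of the low-priority queue (`qconf -sq dangerous.q`)
ckpt_list          dangerous

# Excerpt of the normal queue (`qconf -sq normal.q`); classic, not slotwise,
# subordination: suspend dangerous.q on a host once 1 slot of normal.q is used
subordinate_list   dangerous.q=1
```

The user would then request the CE at submission time, e.g. with 
`qsub -ckpt dangerous job.sh`, instead of setting a flag in the script.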

- The job can be suspended by any means. Whether it's done from the command 
line with `qmod -sj ...`, by subordination, or by exceeding the 
suspend_thresholds doesn't matter.

- A more straightforward approach could be to define a suspend_method in this 
special queue that kills the process group of the job in question, but then 
you lose the option to reschedule the job as a CE does.

- Resources need to be available for the normal job to start. SGE doesn't look 
ahead to see that they will be freed once the job with the flag/CE is killed.
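
For the suspend_method variant mentioned in the second point, a minimal sketch 
of such a script might look like this. The script path and the function name 
kill_job_group are hypothetical, and it assumes that the PID handed over via 
the $job_pid pseudo variable is also the job's process group ID; it is only an 
illustration, not a tested production script:

```shell
#!/bin/sh
# Hypothetical helper for the special queue's suspend_method, e.g.:
#   suspend_method   /usr/local/sge/kill_group.sh $job_pid
# Assumption: the job's PID is also its process group ID.

kill_job_group() {
    pgid=$1
    # Accept only a plain positive integer.
    case $pgid in
        ''|*[!0-9]*)
            echo "usage: kill_job_group <pgid>" >&2
            return 2
            ;;
    esac
    # Never signal process group 0 (our own group) or 1 (init).
    if [ "$pgid" -le 1 ]; then
        echo "refusing to signal process group $pgid" >&2
        return 2
    fi
    # A negative PID addresses the whole process group.
    kill -s KILL -- "-$pgid"
}

# Act only when called with an argument, as SGE would do it.
if [ "$#" -gt 0 ]; then
    kill_job_group "$1"
fi
```

As said above, a job killed this way is simply gone; it is not rescheduled 
the way it would be with a CE.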
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
