Hi,

> On 07.12.2015 at 12:15, HUMMEL Michel <[email protected]> wrote:
> 
> We have a cluster with 4 nodes.
> If I shut down a node while jobs are running on it, qstat still reports a 
> "running" status for those jobs until I reboot the node (then the jobs are 
> migrated to another node).
> Is there a configuration which allows OGS to report a clearer status, like 
> E, for jobs which are running on a powered-off node?
> Or a way to force migration even if the node is still powered off?

The default for SGE is to assume a network problem and keep the job in running 
state in `qstat`. You can look into the parameters:

$ qconf -sconf
...
max_unheard                  00:05:00

and what should happen once the exechost is no longer known:

$ qconf -sconf
...
reschedule_unknown           00:00:00

(00:00:00 means no rescheduling, which is the default). Also note the two 
qmaster_params "ENABLE_RESCHEDULE_KILL" and "ENABLE_RESCHEDULE_SLAVE".

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
