Hi,

> On 07.12.2015 at 12:15, HUMMEL Michel <[email protected]> wrote:
>
> We have a cluster with 4 nodes.
> If I shut down a node while jobs are running on it, qstat still reports a
> "running" status for those jobs until I reboot the node (then the jobs are
> migrated to another node).
> Is there a configuration which allows OGS to report a clearer status, like
> E, for jobs which are running on a powered-off node?
> Or a way to force migration even if the node is still powered off?
The default for SGE is to assume a network problem and keep the job in the running state in `qstat`. You can look into the parameter controlling how long the qmaster waits for an exechost before marking it unknown:

$ qconf -sconf
...
max_unheard                  00:05:00

and the one controlling what should happen once the exechost is no longer known:

$ qconf -sconf
...
reschedule_unknown           00:00:00

(00:00:00 is the default and means no rescheduling). Also note the two qmaster_params "ENABLE_RESCHEDULE_KILL" and "ENABLE_RESCHEDULE_SLAVE".

-- 
Reuti
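As a rough illustration of how the two timeouts above combine, here is a minimal Python sketch. It is not part of any Grid Engine tooling, and the helper names (hms_to_seconds, parse_sconf, reschedule_estimate) are hypothetical. It parses `qconf -sconf`-style "name value" lines and estimates the delay before jobs on a silent host would be rescheduled, under the assumption that this is roughly max_unheard (until the host is marked unknown) plus reschedule_unknown, with 00:00:00 meaning rescheduling is disabled.

```python
"""Rough estimate of when SGE would reschedule jobs from a silent exec host.

Sketch only, assuming: `qconf -sconf` prints one "name value" pair per line,
and rescheduling starts about reschedule_unknown after the host went unknown,
i.e. about max_unheard after its last report.
"""

from typing import Dict, Optional


def hms_to_seconds(value: str) -> int:
    """Convert an 'hh:mm:ss' time string, as used in the configuration, to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_sconf(text: str) -> Dict[str, str]:
    """Parse 'name value' lines; single-token lines such as '...' are skipped."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def reschedule_estimate(conf: Dict[str, str]) -> Optional[int]:
    """Approximate seconds from the host's last report until rescheduling, or None if disabled."""
    unheard = hms_to_seconds(conf["max_unheard"])
    wait = hms_to_seconds(conf["reschedule_unknown"])
    if wait == 0:  # 00:00:00 means jobs are never rescheduled
        return None
    return unheard + wait


if __name__ == "__main__":
    sample = "max_unheard                  00:05:00\nreschedule_unknown           00:10:00\n"
    print(reschedule_estimate(parse_sconf(sample)))  # prints 900
```

With the values shown in the reply (max_unheard 00:05:00, reschedule_unknown left at 00:00:00) the sketch returns None, matching the observed behaviour of jobs staying "running" until the node comes back.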
