On 8 November 2012 at 14:41, William Hay wrote:

> Checkpoint every 6 minutes (found this while testing the checkpoint
> environment; we'll probably increase the minimum for production).

This is the setting "min_cpu_interval" in the queue definition. Does "when" in the checkpointing environment include "r"?

-- Reuti

>
> On 8 November 2012 13:18, Reuti <[email protected]> wrote:
> On 8 November 2012 at 13:59, William Hay wrote:
>
> > There was a checkpointing environment. Also the same thing seems to happen
> > in the first few minutes the job is running but not afterwards.
>
> Aha, this is different according to the documentation. What was defined in
> the checkpointing environment for the "when" condition?
>
> -- Reuti
>
> >
> > On 8 November 2012 12:29, Reuti <[email protected]> wrote:
> > On 2 November 2012 at 15:56, William Hay wrote:
> >
> > > I submitted an array job with -r y. One of the tasks was transferring to
> > > a node (state t) when that node went down, but despite
> > > max_unheard+reschedule_unknown being exceeded neither that task nor
> > > another task on the same node was rescheduled. A manual qmod -rq seems
> > > to work, but having it just work would be better.
> >
> > But if the node crashes while all jobs are in state "r", it is working for
> > you - there was no checkpointing environment in the way?
> >
> > The array task was still shown in state "t" all the time?
> >
> > > Is this a known problem?
> >
> > It's hard to provoke.
> >
> > - Reuti
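
Editorial aside, not part of the original exchange: the question about "when" can be checked mechanically. Below is a minimal Python sketch, assuming the checkpointing environment has been dumped as text (for example with `qconf -sckpt <name>`). The sample text, the environment name `testckpt`, and the helper names are hypothetical, and the letter meanings are paraphrased from the checkpoint(5) man page rather than quoted.

```python
# Meanings of the "when" letters, paraphrased from checkpoint(5).
# Treat these descriptions as a reading aid, not as authoritative text.
WHEN_FLAGS = {
    "s": "checkpoint when the execution daemon on the job's host shuts down",
    "m": "checkpoint periodically, at the queue's min_cpu_interval",
    "x": "checkpoint when the job is suspended",
    "r": "reschedule when the host is unknown longer than reschedule_unknown",
}

# Hypothetical dump of a checkpointing environment; values are invented.
SAMPLE_CKPT = """\
ckpt_name          testckpt
interface          userdefined
ckpt_dir           /tmp
signal             none
when               xs
"""


def parse_when(ckpt_text):
    """Return the value of the 'when' attribute, or '' if it is absent."""
    for line in ckpt_text.splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest
        if len(parts) == 2 and parts[0] == "when":
            return parts[1].strip()
    return ""


def explain_when(when):
    """Map each recognised letter of a 'when' string to its description."""
    return [WHEN_FLAGS[c] for c in when if c in WHEN_FLAGS]


def reschedules_on_unknown_host(when):
    """True if 'when' contains 'r', the letter Reuti asks about."""
    return "r" in when
```

With the hypothetical sample above, `parse_when(SAMPLE_CKPT)` yields a string without "r", so `reschedules_on_unknown_host` reports that such an environment would not request rescheduling when the host becomes unknown.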
