On 08.11.2012 at 13:59, William Hay wrote:

> There was a checkpointing environment.  Also the same thing seems to happen 
> in the first few minutes the job is running but not afterwards. 

Aha, according to the documentation the behavior is different in that case. 
What was defined in the checkpointing environment for the "when" condition?

-- Reuti


> 
> On 8 November 2012 12:29, Reuti <[email protected]> wrote:
> On 02.11.2012 at 15:56, William Hay wrote:
> 
> > I submitted an array job with -r y.  One of the tasks was transferring to a 
> > node (state t) when that node went down, but despite 
> > max_unheard+reschedule_unknown being exceeded, neither that task nor another 
> > task on the same node was rescheduled.  A manual qmod -rq seems to work, but 
> > having it happen automatically would be better.
> 
> But if the node crashes while all jobs are in state "r", it is working for 
> you, and there was no checkpointing environment in the way?
> 
> The array task was still shown in state "t" all the time?
> 
> 
> > Is this a known problem?
> 
> It's hard to provoke.
> 
> - Reuti
> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
