We allow jobs to overrun their wall time via "OverTimeLimit".  We've
noticed that jobs that complete successfully but go over the wall time are
reported as having "JobState=TIMEOUT" in the job completion log.  If mail
is configured for a job, that job is reported as "failed".  scontrol
indicates the job's exit code as "0:1", but AFAICT no signal was sent to
the job (my test scenario is simply a sleep command that's 60 seconds
longer than the wall time I request).
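
A reproducer along these lines should show the behavior.  This is only a
sketch: the script name, job name, and the specific durations are
illustrative assumptions, and it presumes OverTimeLimit in slurm.conf is
large enough to cover the 60 second overrun.

```shell
# Write a minimal batch script that requests 1 minute of wall time but
# sleeps for 120 seconds, i.e. 60 seconds past the requested limit.
# (File name and job name are hypothetical.)
cat > overtime_test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=overtime-test
# Request one minute of wall time.
#SBATCH --time=1
# Run 60 seconds longer than requested, then exit 0.
sleep 120
EOF

# Submit and inspect by hand (not run here):
#   sbatch overtime_test.sh
#   scontrol show job <jobid>    # check the JobState and ExitCode fields
```

With OverTimeLimit covering the overrun, the sleep finishes normally, yet
the job still ends up recorded as TIMEOUT rather than COMPLETED.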

I realize that the file backend is deprecated, but I strongly suspect that
the SQL backend will also have these jobs listed as failures.

Is this expected or designed behavior?  Our users find it somewhat
confusing, so I'm considering changing it.  However, if I dig in and
change this, would there be other ramifications?

Thanks

Michael
