We allow jobs to overrun their wall time via "OverTimeLimit". We've noticed that jobs that complete successfully but exceed the wall time are reported with "JobState=TIMEOUT" in the job completion log. If mail is configured for a job, it is reported as "failed". scontrol shows the job's exit code as "0:1", but AFAICT no signal was sent to the job (my test scenario is simply a sleep command that runs 60 seconds longer than the wall time I request).
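To make the scenario concrete, here is a minimal sketch of where a job's runtime falls relative to its limits. It assumes OverTimeLimit is a finite number of minutes added to the requested time limit to form a hard limit. The function name and the numbers (a 1-minute limit, a 5-minute OverTimeLimit) are hypothetical, chosen only to mirror my test. The real scheduler checks limits periodically, so the exact boundaries may differ.

```python
def classify_runtime(time_limit_min, over_time_limit_min, runtime_sec):
    """Place a job's runtime relative to its soft and hard time limits.

    Assumption: hard limit = requested time limit + OverTimeLimit,
    with both values given in minutes.
    """
    soft_sec = time_limit_min * 60
    hard_sec = (time_limit_min + over_time_limit_min) * 60
    if runtime_sec <= soft_sec:
        return "within limit"
    if runtime_sec <= hard_sec:
        return "overrun, allowed to finish"
    return "canceled at hard limit"


# Hypothetical values: a 1-minute limit and a sleep that runs 60 seconds past it.
print(classify_runtime(1, 5, 120))
```

My test job lands in the middle case: it finishes on its own inside the overrun window, yet it is still logged as TIMEOUT.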
I realize that the file backend is deprecated, but I strongly suspect that the SQL backend will also list these jobs as failures. Is this expected or designed behavior? Our users find it somewhat confusing, so I'm considering changing it. However, if I dig in and change this, would there be other ramifications?

Thanks,
Michael
