Re: [slurm-users] job_time_limit: inactivity time limit reached ...

2022-09-19 Thread Chris Samuel
On 19/9/22 05:46, Paul Raines wrote: In slurm.conf I had InactiveLimit=60, which I guess is what is happening, but my reading of the docs on this setting was that it only affects the starting of a job with srun/salloc and not a job that has been running for days. Is it InactiveLimit that leads to the

Re: [slurm-users] admin users without a database

2022-09-19 Thread Chris Samuel
On 19/9/22 06:14, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) wrote: Is it possible to make a user an admin without slurmdbd? The docs I've found indicate that I need to set the user's admin level with sacctmgr, but that command always says I don't believe so, I believe that's al

Re: [slurm-users] job_time_limit: inactivity time limit reached ...

2022-09-19 Thread Brian Andrus
Paul, You are likely spot on with the InactiveLimit change. It may also be that the TMOUT environment variable (under bash) is set. Brian Andrus On 9/19/2022 5:46 AM, Paul Raines wrote: I have had two nights where right at 3:35am a bunch of jobs were killed early with TIMEOUT way before their no

Re: [slurm-users] job_time_limit: inactivity time limit reached ...

2022-09-19 Thread Reed Dier
I’m not sure if this might be helpful, but my logrotate.d for slurm looks a bit different: instead of a systemctl reload, I am sending a SIGUSR2 signal, which is supposedly for the specific purpose of log rotation in slurm. > postrotate > pkill -x --signal SIGUS

[slurm-users] admin users without a database

2022-09-19 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Is it possible to make a user an admin without slurmdbd? The docs I've found indicate that I need to set the user's admin level with sacctmgr, but that command always says: You are not running a supported accounting_storage plugin Only 'accounting_storage/slurmdbd' is supported. I don't especial

[slurm-users] job_time_limit: inactivity time limit reached ...

2022-09-19 Thread Paul Raines
I have had two nights where right at 3:35am a bunch of jobs were killed early with TIMEOUT way before their normal TimeLimit. The slurmctld log has lots of lines like this at 3:35am: [2022-09-12T03:35:02.303] job_time_limit: inactivity time limit reached for JobId=1636922 with jobs running o