Re: [slurm-users] Mysterious job terminations on Slurm 17.11.10

2019-02-01 Thread Chris Samuel
On Friday, 1 February 2019 6:04:45 AM AEDT Andy Riebs wrote:
> Any thoughts on what might be happening, or what I might try next?

Anything in dmesg on the nodes or syslog at that time? I'm wondering if you're seeing the OOM killer step in and take processes out. What does your slurm.conf look l
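A minimal sketch of the dmesg/syslog check suggested above, assuming the node's kernel log has already been captured as text (for example by saving `dmesg` output). The function name `find_oom_kills` is hypothetical, and the pattern only covers the common "Out of memory: Kill process" / "Killed process" wording; other kernel versions may phrase the message differently.

```python
import re

# Hypothetical pattern for the kernel's OOM-killer summary line, e.g.
#   "Out of memory: Killed process 4242 (a.out) total-vm:..."  (newer kernels)
#   "Out of memory: Kill process 99 (python) score 900 ..."    (older kernels)
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer victim found in the log text."""
    victims = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


if __name__ == "__main__":
    sample = (
        "[1.0] Out of memory: Killed process 4242 (a.out) total-vm:1kB\n"
        "[2.0] eth0: link up\n"
    )
    print(find_oom_kills(sample))
```

If a process belonging to the terminated job appears in the result near the time it died, the OOM killer is a likely explanation worth following up in slurm.conf and the node's memory limits.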

Re: [slurm-users] TotalCPU: sacct reporting inexplicable high values

2019-02-01 Thread Christopher Benjamin Coffey
Nico, yep that’s a very annoying bug as we do the same here with job efficiency. It was patched in 18.08.05. However the db still needs to be cleaned up. We are working on a script to fix this. When we are done, we'll offer it up to the list. Best, Chris — Christopher Coffey High-Performance C

[slurm-users] TotalCPU: sacct reporting inexplicable high values

2019-02-01 Thread nico.faerber
Hi, while doing some statistics on efficient CPU usage, I realized that sacct is reporting inexplicable (at least for me) high values for TotalCPU, UserCPU and SystemCPU. Here is a simple example (each job step is an infinite while loop): sacct -j 64338003 --format=jobid,elapsed,ncpus,cputime,
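One way to sanity-check these numbers is to compare TotalCPU with CPUTime, which sacct reports as Elapsed multiplied by the CPU count: a busy loop on its allocated cores should give a ratio close to, but not above, 1. The sketch below assumes durations in the `[D-]HH:MM:SS` or `MM:SS.mmm` forms sacct prints; the helper names `parse_slurm_duration` and `cpu_efficiency` are hypothetical.

```python
def parse_slurm_duration(text: str) -> float:
    """Convert a sacct duration such as '1-02:03:04', '02:03:04' or '05:10.123' to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    fields = [float(f) for f in text.split(":")]
    # Pad on the left so that MM:SS(.mmm) is treated as 00:MM:SS(.mmm).
    while len(fields) < 3:
        fields.insert(0, 0.0)
    hours, minutes, seconds = fields
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, cpu_time: str) -> float:
    """Ratio of consumed CPU time (TotalCPU) to allocated CPU time (CPUTime)."""
    allocated = parse_slurm_duration(cpu_time)
    if allocated == 0:
        return 0.0
    return parse_slurm_duration(total_cpu) / allocated


if __name__ == "__main__":
    # A 10-minute single-CPU step that kept its core busy almost the whole time.
    print(cpu_efficiency("09:58.500", "00:10:00"))
```

A ratio well above 1 for a step cannot reflect real work on the allocated CPUs, so it points at the over-reporting discussed in this thread rather than at the job itself.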

Re: [slurm-users] Mysterious job terminations on Slurm 17.11.10

2019-02-01 Thread Riebs, Andy
Given the extreme amount of output that will be generated for potentially a couple hundred job runs, I was hoping that someone would say “Seen it, here’s how to fix it.” Guess I’ll have to go with the “high output” route. Thanks Doug! Andy