Hi Loris,
On 9/29/22 09:26, Loris Bennett wrote:
Has anyone already come up with a good way to identify non-MPI jobs which
request multiple cores but don't restrict themselves to a single node,
leaving cores idle on all but the first node?
I can see that this is potentially not easy, since an MPI job might
still have phases where only one core is actually being used.
Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an
"unexpected" CPU load. If you see the same JobID running on multiple nodes
with too low a CPU load, that might point to a job such as you describe.
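To make the heuristic concrete, here is a minimal Python sketch. It is not part of pestat. It assumes you have already collected one (node, JobID, CPU load, allocated CPUs) record per node, for example by parsing pestat or squeue output yourself. The NodeSample record, the flag_suspect_jobs() function and the 0.5 load-per-CPU threshold are all hypothetical choices for illustration. A job is flagged when it spans several nodes and all but at most one of them are below the threshold, which is the "idle on all but the first node" pattern.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class NodeSample(NamedTuple):
    """One (node, job) observation. Field names are hypothetical, not pestat's format."""
    node: str
    jobid: str
    cpu_load: float   # e.g. the node's CPU load as reported by your monitoring
    alloc_cpus: int   # CPUs allocated to the job on that node


def flag_suspect_jobs(samples: Iterable[NodeSample], threshold: float = 0.5) -> list:
    """Return JobIDs spread over several nodes where all but at most one node
    has a load below `threshold` times the CPUs allocated there."""
    by_job = defaultdict(list)
    for s in samples:
        by_job[s.jobid].append(s)

    suspects = []
    for jobid, rows in by_job.items():
        if len({r.node for r in rows}) < 2:
            continue  # single-node jobs cannot show this pattern
        # Nodes whose load per allocated CPU is below the threshold
        low = [r for r in rows
               if r.alloc_cpus > 0 and r.cpu_load / r.alloc_cpus < threshold]
        # "Idle on all but (at most) one node" -> likely a non-MPI job
        if len(low) >= len(rows) - 1:
            suspects.append(jobid)
    return sorted(suspects)


if __name__ == "__main__":
    samples = [
        NodeSample("n001", "123", 16.0, 16),  # busy first node
        NodeSample("n002", "123", 0.1, 16),   # nearly idle
        NodeSample("n003", "123", 0.2, 16),   # nearly idle
        NodeSample("n004", "456", 16.0, 16),  # genuine multi-node job
        NodeSample("n005", "456", 15.0, 16),
        NodeSample("n006", "789", 1.0, 16),   # single-node job, ignored
    ]
    print(flag_suspect_jobs(samples))
```

Note that node load is a per-node figure, so on nodes shared between several jobs it cannot be attributed to a single job, and an MPI job in a serial phase will also be flagged. Treat the result as a list of candidates to inspect, not a verdict.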
/Ole
[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat