Hi Loris,

On 9/29/22 09:26, Loris Bennett wrote:
> Has anyone already come up with a good way to identify non-MPI jobs which
> request multiple cores but don't restrict themselves to a single node,
> leaving cores idle on all but the first node?
>
> I can see that this is potentially not easy, since an MPI job might
> still have phases where only one core is actually being used.

Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an "unexpected" CPU load. If you see the same JobID running on multiple nodes with too low a CPU load, that might point to a job such as you describe.
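
To make the check concrete, here is a minimal Python sketch of the idea. It does not parse pestat output: the record layout (node, job ID, allocated cores, CPU load), the function name find_suspect_jobs and the 10% threshold are all assumptions for illustration, and you would have to fill the records from whatever source you use.

```python
from collections import defaultdict


def find_suspect_jobs(node_records, idle_fraction=0.1):
    """Flag jobs that span several nodes but keep at most one of them busy.

    node_records: iterable of (node, job_id, cores_allocated, cpu_load)
    tuples, one per node and job. This layout is hypothetical and is NOT
    the pestat output format.
    idle_fraction: a node counts as idle for a job when
    cpu_load / cores_allocated is below this value (assumed threshold).
    """
    per_job = defaultdict(list)
    for node, job_id, cores, load in node_records:
        if cores <= 0:
            continue  # skip malformed records instead of dividing by zero
        per_job[job_id].append((node, load / cores))

    suspects = {}
    for job_id, usage in per_job.items():
        if len(usage) < 2:
            continue  # single-node jobs are not the case in question
        idle_nodes = sorted(node for node, frac in usage if frac < idle_fraction)
        # Pattern described above: every node except (at most) one is idle.
        if len(idle_nodes) >= len(usage) - 1:
            suspects[job_id] = idle_nodes
    return suspects


# Made-up sample data: job 101 uses only its first node, job 202 uses both.
records = [
    ("n1", 101, 16, 15.8),
    ("n2", 101, 16, 0.1),
    ("n3", 101, 16, 0.0),
    ("n4", 202, 16, 15.9),
    ("n5", 202, 16, 16.0),
    ("n6", 303, 8, 0.2),
]
result = find_suspect_jobs(records)
```

A single snapshot like this cannot tell a non-MPI job from an MPI job in a serial phase, so it probably makes sense to flag a job only if it shows up in several samples taken some time apart.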

/Ole

[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat
