I should probably add some example output:
Someone we need to talk to:
     Node | Memory (GB)          | CPUs
 Hostname    Alloc    Max    Cur    Alloc    Used  Eff%
  m8-10-5     19.5      0      0        1    0.00     0
 *m8-10-2     19.5    2.3    2.2        1    0.99    99
  m8-10-3     19.5      0      0        1    0.00     0
  m8-10-4     19.5      0      0        1    0.00     0
* denotes the node where the batch script executes (node 0)
CPU usage is cumulative since the start of the job
Much better:
     Node | Memory (GB)          | CPUs
 Hostname    Alloc    Max    Cur    Alloc    Used  Eff%
  m9-48-2    112.0   21.1   19.3       16   15.97    99
  m9-48-3     98.0   18.5   16.8       14   13.98    99
  m9-16-3    112.0   20.9   19.2       16   15.97    99
  m9-44-1    112.0   21.0   19.2       16   15.97    99
  m9-43-3    119.0   22.3   20.4       17   16.97    99
  m9-44-2    112.0   21.2   19.3       16   15.98    99
  m9-14-4    112.0   21.0   19.2       16   15.97    99
  m9-46-4    119.0   22.5   20.5       17   16.97    99
 *m9-10-2     91.0   32.0   15.8       13   12.81    98
  m9-43-1    119.0   22.3   20.4       17   16.97    99
  m9-16-1    126.0   23.9   21.6       18   17.97    99
  m9-47-4    119.0   22.4   20.5       17   16.97    99
  m9-43-4    119.0   22.4   20.5       17   16.97    99
  m9-48-1     84.0   15.7   14.4       12   11.98    99
  m9-42-4    119.0   22.2   20.3       17   16.97    99
  m9-43-2    119.0   22.2   20.4       17   16.97    99
* denotes the node where the batch script executes (node 0)
CPU usage is cumulative since the start of the job
Ryan
On 09/19/2016 11:13 AM, Ryan Cox wrote:
We use this script that we cobbled together:
https://github.com/BYUHPC/slurm-random/blob/master/rjobstat. It
assumes that you're using cgroups. It uses ssh to connect to each
node, so it's not very scalable, but it works well enough for us.
Ryan
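(For readers who just want the gist of that approach without opening the
linked script: below is a minimal Python sketch of the same idea, not the
actual rjobstat code. It assumes cgroup v1 with the usual
slurm/uid_<uid>/job_<jobid> hierarchy and passwordless ssh to the nodes;
the helper names are made up for illustration, and the exact cgroup
layout varies by site.)

    #!/usr/bin/env python3
    # Sketch of the ssh-plus-cgroups idea: for each node of a running job,
    # read the job's cgroup v1 memory and cpuacct counters over ssh.
    import subprocess
    import sys

    def job_nodes(jobid):
        # squeue prints the compressed node list; scontrol expands it.
        nodelist = subprocess.check_output(
            ["squeue", "-h", "-j", str(jobid), "-o", "%N"], text=True).strip()
        return subprocess.check_output(
            ["scontrol", "show", "hostnames", nodelist], text=True).split()

    def cgroup_stats(host, uid, jobid):
        # Assumed layout: /sys/fs/cgroup/<controller>/slurm/uid_<uid>/job_<jobid>
        base = f"/sys/fs/cgroup/{{}}/slurm/uid_{uid}/job_{jobid}"
        cmd = ("cat "
               f"{base.format('memory')}/memory.usage_in_bytes "
               f"{base.format('memory')}/memory.max_usage_in_bytes "
               f"{base.format('cpuacct')}/cpuacct.usage")
        out = subprocess.check_output(["ssh", host, cmd], text=True).split()
        cur, peak, cpu_ns = (int(x) for x in out)
        return cur, peak, cpu_ns

    if __name__ == "__main__":
        jobid, uid = sys.argv[1], sys.argv[2]   # pass the job owner's uid explicitly
        for host in job_nodes(jobid):
            cur, peak, cpu_ns = cgroup_stats(host, uid, jobid)
            print(f"{host:>12}  cur={cur / 2**30:5.1f}G  max={peak / 2**30:5.1f}G"
                  f"  cpu={cpu_ns / 1e9:9.1f}s")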
On 09/18/2016 06:42 PM, Igor Yakushin wrote (subject: "how to monitor
CPU/RAM usage on each node of a slurm job? python API?"):
Hi All,
I'd like to be able to see, for a given jobid, how much of each resource
a job is currently using on each node it is running on. Is there a way
to do that?
So far it looks like I would have to script it: get the list of the
involved nodes using, for example, squeue or qstat, then ssh to each node
and find all of the user's processes (which are not guaranteed to belong
to the job I am interested in: is there a way to find the UNIX pids
corresponding to a Slurm jobid?).
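(A rough sketch of that scripting approach, for illustration: squeue plus
"scontrol show hostnames" expands the job's node list, and
"scontrol listpids <jobid>", run on each node, reports the PIDs that
slurmstepd manages for that job, which avoids guessing which of the
user's processes belong to it. The ps field list is just one choice;
everything else is a stock Slurm command.)

    #!/usr/bin/env python3
    # Sketch: for each node of a job, list the job's PIDs and their CPU/RSS.
    import subprocess
    import sys

    def nodes_of(jobid):
        nodelist = subprocess.check_output(
            ["squeue", "-h", "-j", jobid, "-o", "%N"], text=True).strip()
        return subprocess.check_output(
            ["scontrol", "show", "hostnames", nodelist], text=True).split()

    def pids_on(host, jobid):
        # "scontrol listpids" reports PIDs for the node it runs on, hence the ssh.
        lines = subprocess.check_output(
            ["ssh", host, f"scontrol listpids {jobid}"], text=True).splitlines()
        return [ln.split()[0] for ln in lines[1:] if ln.split()]   # skip header row

    def usage_on(host, pids):
        # Per-PID %CPU and resident set size via plain ps.
        return subprocess.check_output(
            ["ssh", host, "ps -o pid,pcpu,rss,comm -p " + ",".join(pids)], text=True)

    if __name__ == "__main__":
        jobid = sys.argv[1]
        for host in nodes_of(jobid):
            pids = pids_on(host, jobid)
            print(f"== {host} ==")
            print(usage_on(host, pids) if pids else "(no pids reported on this node)")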
Another question: is there a Python API for Slurm? I found pyslurm, but
so far it would not build against my version of Slurm.
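(One interim workaround, if pyslurm will not build, is to treat the
command-line tools themselves as the API and wrap them with subprocess.
For example, sstat can report per-step CPU and memory for a running job
with no ssh at all; this assumes job accounting via jobacct_gather is
enabled on the cluster, and note that it reports per step rather than
per node. A small sketch:)

    import subprocess

    def step_usage(jobid):
        # -a: all steps, -n: no header, -P: parsable output delimited by "|"
        out = subprocess.check_output(
            ["sstat", "-a", "-n", "-P", "-j", str(jobid),
             "--format=JobID,AveCPU,MaxRSS,MaxRSSNode,NTasks"], text=True)
        keys = ["step", "avecpu", "maxrss", "maxrssnode", "ntasks"]
        return [dict(zip(keys, line.split("|")))
                for line in out.splitlines() if line]

    if __name__ == "__main__":
        import sys
        for step in step_usage(sys.argv[1]):
            print(step)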
Thank you,
Igor
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University