[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Nicholas McCollum
I attempted to make something similar, with the intent of finding users who submit highly inefficient jobs. I wanted it to be real-time, to need no extra plugins, and to let users see how their jobs scale. It uses only sacct and sstat. For anyone with sudo access, it will show
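A minimal sketch of the kind of efficiency figure such a tool might compute from sacct output, assuming the job's TotalCPU, Elapsed and AllocCPUS values have already been fetched as strings. The function names are hypothetical and this is not the script described above; it only illustrates the arithmetic (CPU time used divided by CPU time allocated).

```python
def slurm_time_to_seconds(value: str) -> float:
    """Convert a Slurm duration such as '1-02:03:04', '02:03:04'
    or '03:04.500' into seconds."""
    days = 0
    if "-" in value:
        # Optional leading day count, separated by a dash.
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad missing leading fields (e.g. MM:SS has no hours).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Fraction of the allocated CPU time that was actually used."""
    allocated = slurm_time_to_seconds(elapsed) * alloc_cpus
    if allocated == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / allocated
```

For example, a job that ran for one hour on 4 allocated CPUs but accumulated only one hour of CPU time would come out at 0.25, which is the sort of job one might want to flag.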

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Rémi Palancher
Hi Carlos, On 19/09/2016 at 18:08, Carlos Fenoy wrote: Hi All, I'm working on a plugin that stores performance information of every task of every job in influxdb. This can be visualized easily with Grafana and provides information of cpu used and memory used as well as read and writes from f

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Carlos Fenoy
The advantage of this plugin over the HDF5 one is that you get the results in almost real time. The plugin has a small buffer to avoid flooding the InfluxDB server with requests. > On 19 Sep 2016, at 18:57, Igor Yakushin wrote: > > Hi Carlos, > Can one get results in real tim

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Ryan Cox
I should probably add some example output:
Someone we need to talk to:
Node | Memory (GB) | CPUs
Hostname Alloc Max Cur Alloc Used Eff%
m8-10-519.5 0 0 1 0.00 0
*m8-10-219.52.32.2 1 0.9999
m8-

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Ryan Cox
We use this script that we cobbled together: https://github.com/BYUHPC/slurm-random/blob/master/rjobstat. It assumes that you're using cgroups. It uses ssh to connect to each node so it's not very scalable but it works well enough for us. Ryan On 09/18/2016 06:42 PM, Igor Yakushin wrote: ho

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Igor Yakushin
Hi Carlos, Can one get results in real time as the job is running or only once it is finished? Thank you, Igor On Mon, Sep 19, 2016 at 11:07 AM, Carlos Fenoy wrote: > Hi All, > > I'm working on a plugin that stores performance information of every task > of every job in influxdb. This can be vi

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Carlos Fenoy
Hi All, I'm working on a plugin that stores performance information for every task of every job in InfluxDB. This can be visualized easily with Grafana and provides information on CPU and memory usage as well as reads and writes to filesystems. This plugin is using the profile capability of sl

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Torres, Giovanni
On 9/19/16, 11:33 AM, "Igor Yakushin" wrote: > > Hi Giovanni, > We have just upgraded to 16.05.4. > When I try building pyslurm, it says that version 2.6 of Slurm is required. This thread should move to the PySlurm google group instead: https://groups.google.com/forum/#!forum/pyslurm. You can

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Igor Yakushin
Hi Giovanni, We have just upgraded to 16.05.4. When I try building pyslurm, it says that version 2.6 of Slurm is required. Thank you, Igor On Mon, Sep 19, 2016 at 8:09 AM, Torres, Giovanni wrote: > On 9/18/16, 8:41 PM, "Igor Yakushin" wrote: > > > > Hi All, > > > > > > I'd like to be able to se

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Peter A Ruprecht
Reply-To: slurm-dev <slurm-dev@schedmd.com> Date: Sunday, September 18, 2016 at 9:57 PM To: slurm-dev <slurm-dev@schedmd.com> Subject: [slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API? Hi Peter, Ganglia plugin would be interesti

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-19 Thread Torres, Giovanni
On 9/18/16, 8:41 PM, "Igor Yakushin" wrote: > > Hi All, > > > I'd like to be able to see for a given jobid how much resources are used by a > job on each node it is running on at this moment. Is there a way to do it? > > So far it looks like I have to script it: get the list of the involved

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Daniel Letai
Sorry, didn't notice you wanted real time profiling. As mentioned, you can use sstat, or if you wish to probe each node for the local resources used, you could: xdsh $(squeue -h -j $SLURM_JOBID) "ps -q $(scontrol listpids $SLURM_JOBID | awk 'NR>1{print $1}' ORS=',')" On 09/19/2016 08:28
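The per-node probing idea can be sketched in Python as two small helpers: one that extracts PIDs from the text printed by `scontrol listpids`, and one that builds the matching `ps` command. This is a rough sketch under stated assumptions, not a tested tool. It assumes the output starts with a header row followed by rows whose first column is the PID, and that the text was collected on the compute node itself, since as far as I understand `scontrol listpids` only reports processes on the node where it runs. The helper names and the chosen `ps` columns are hypothetical.

```python
def pids_from_listpids(text: str) -> list:
    """Extract PIDs from `scontrol listpids <jobid>` output.

    Assumes the first line is a header and the first column of
    every following line is a numeric PID.
    """
    pids = []
    for line in text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if cols and cols[0].isdigit():
            pids.append(int(cols[0]))
    return pids


def ps_command(pids: list) -> list:
    """Build a ps invocation reporting CPU and memory for the given PIDs."""
    pid_list = ",".join(str(p) for p in pids)
    # pcpu = CPU usage in percent, rss = resident set size in KiB.
    return ["ps", "-o", "pid,pcpu,rss,comm", "-q", pid_list]
```

The resulting argument list could then be run on each node of the job by whatever remote-execution tool the site already uses (xdsh, ssh, pdsh), with the PID lookup performed on the remote side rather than on the submitting host.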

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Daniel Letai
You should use HDF5: http://slurm.schedmd.com/hdf5_profile_user_guide.html On 09/19/2016 03:41 AM, Igor Yakushin wrote: Hi All, I'd like to be able to see for a given jobid how much resources are used b

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Igor Yakushin
Hi Peter, Ganglia plugin would be interesting. How do ganglia clients on different nodes communicate? Typically they do not talk to each other but only to the central node. However, to decide that they are part of the same job, they somehow need to talk to each other? Thank you, Igor On Sun, Sep

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Igor Yakushin
Hi Lachlan, At first glance sstat does not give the kind of information I want. It does not break usage down by core, and it gives only aggregated statistics rather than current values. I looked at all the binaries in the slurm bin directory and it does not look like any of them do exactly what I need

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
Gah, yes. sstat, not sinfo. -- The most dangerous phrase in the language is, "We've always done it this way." - Grace Hopper On 19 September 2016 at 13:00, Peter A Ruprecht wrote: > Igor, > > Would sstat give you what you need? (http://slurm.schedmd.com/sstat.html) > It doesn't update in

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Peter A Ruprecht
Igor, Would sstat give you what you need? (http://slurm.schedmd.com/sstat.html) It doesn't update instantaneously but at least a few times a minute. If you want to get fancy, I believe that xdmod can integrate with TACC-stats to provide graphs about what is happening inside a job but I'm not
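As a rough illustration of the sstat route, the sketch below runs sstat for all steps of a job in pipe-separated form and turns each step into a dictionary. It assumes sstat is on the PATH, that job accounting gathering is enabled, and that the caller is allowed to query the job. The field selection and the function names are illustrative choices, not a recommendation from this thread.

```python
import subprocess

# Fields requested from sstat; the output columns follow this order.
FIELDS = ["JobID", "NTasks", "AveCPU", "AveRSS", "MaxRSS", "MaxRSSNode"]


def parse_sstat(text: str, fields=FIELDS) -> list:
    """Parse `sstat --parsable2 --noheader` output into one dict per step."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


def query_sstat(job_id) -> list:
    """Run sstat for every step of a running job and parse the result."""
    cmd = [
        "sstat", "--allsteps", "--noheader", "--parsable2",
        "--format=" + ",".join(FIELDS),
        "--jobs", str(job_id),
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_sstat(out)
```

Calling `query_sstat` in a loop with a sleep of some tens of seconds would give a coarse time series per step. The values are aggregates over the tasks of a step, so this does not provide a per-core breakdown.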

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
Also, if you have slurm installed on a deb-based distro, you can try this: https://github.com/edf-hpc/slurm-web I tried to get it running on RPM (CentOS) but it is too tightly coupled to deb for my ability to port it. cheers L.

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Lachlan Musicman
I think you need a couple of things going on: 1. you have to have some sort of accounting organised and set up 2. your sbatch scripts need to use: srun not just 3. sinfo should then work on the job number. When I asked, that was the response iirc. cheers L.