Thanks Kevin and Simon,

The full setup that each of you runs is indeed overkill for my needs, but I was able to learn from it how to collect and parse some of the information I need.

What I am still unable to get is:

- utilization by queue (or by list of node names), to track actual use of expensive resources such as GPUs, high-memory nodes, etc.
- statistics about wait-in-queue time for jobs, due to unavailable resources

hopefully both in an sreport-like format, by user and for the overall system.

I suspect this information is available in sacct, but needs some massaging/consolidation to become useful for what I am looking for. Perhaps either (or both) of your scripts already do that somewhere that I did not find? That would be terrific, and I'd appreciate it if you could point me to it.

Thanks again!

On Tue, Aug 20, 2024 at 9:09 AM Kevin Broch via slurm-users <slurm-users@lists.schedmd.com> wrote:

> Heavyweight solution (although if you have grafana and prometheus going
> already a little less so):
> https://github.com/rivosinc/prometheus-slurm-exporter
>
> On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Possibly a bit more elaborate than you want but I wrote a web based
>> monitoring system for our cluster. It mostly uses standard slurm commands
>> for job monitoring, but I've also added storage monitoring which requires a
>> separate cron job to run every night. It was written for our cluster, but
>> probably wouldn't take much work to adapt to another cluster with similar
>> structure.
>>
>> You can see the code and some screenshots at:
>>
>> https://github.com/s-andrews/capstone_monitor
>>
>> ..and there's a video walk through at:
>>
>> https://vimeo.com/982985174
>>
>> We've also got more friendly scripts for monitoring current and past jobs
>> on the command line. These are in a private repository as some of the
>> other information there is more sensitive but I'm happy to share those
>> scripts. You can see the scripts being used in
>> https://vimeo.com/982986202
>>
>> Simon.
>>
>> -----Original Message-----
>> From: Paul Edmon via slurm-users <slurm-users@lists.schedmd.com>
>> Sent: 09 August 2024 16:12
>> To: slurm-users@lists.schedmd.com
>> Subject: [slurm-users] Print Slurm Stats on Login
>>
>> We are working to make our users more aware of their usage. One of the
>> ideas we came up with was to having some basic usage stats printed at login
>> (usage over past day, fairshare, job efficiency, etc). Does anyone have any
>> scripts or methods that they use to do this? Before baking my own I was
>> curious what other sites do and if they would be willing to share their
>> scripts and methodology.
>>
>> -Paul Edmon-
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
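P.S. On the wait-in-queue question above, here is a minimal sketch of the kind of sacct massaging I have in mind, in case it helps frame what I am asking for. It assumes pipe-separated input such as `sacct -a -X -n -P -S <start date> -o User,Partition,Submit,Start` would produce, and it treats wait time simply as Start minus Submit, which also counts time spent held or waiting on dependencies (the Eligible field might be a better starting point). The function names `parse_waits` and `summarize` are my own and purely illustrative. It does not address utilization by node list.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Assumed input: one job per line, four pipe-separated fields in this order:
#   User|Partition|Submit|Start
# e.g. from: sacct -a -X -n -P -S <start date> -o User,Partition,Submit,Start


def parse_waits(sacct_text):
    """Yield (user, partition, wait_seconds) for every job that has started."""
    for line in sacct_text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        user, partition, submit, start = parts
        try:
            waited = datetime.fromisoformat(start) - datetime.fromisoformat(submit)
        except ValueError:
            continue  # Start is not a timestamp (e.g. "Unknown" for pending jobs)
        yield user, partition, waited.total_seconds()


def _stats(waits):
    """Summarize a non-empty list of wait times given in seconds."""
    return {
        "jobs": len(waits),
        "mean_s": mean(waits),
        "median_s": median(waits),
        "max_s": max(waits),
    }


def summarize(sacct_text):
    """Group wait times overall, by user, and by partition."""
    by_user = defaultdict(list)
    by_partition = defaultdict(list)
    overall = []
    for user, partition, wait in parse_waits(sacct_text):
        by_user[user].append(wait)
        by_partition[partition].append(wait)
        overall.append(wait)
    return {
        "overall": _stats(overall) if overall else None,
        "by_user": {u: _stats(w) for u, w in by_user.items()},
        "by_partition": {p: _stats(w) for p, w in by_partition.items()},
    }
```

Feeding the captured sacct output to `summarize` gives job counts plus mean, median and maximum wait in seconds per user, per partition and overall; jobs that have not started yet are left out rather than counted as zero wait.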