Thanks Kevin and Simon,

Your full solutions are indeed overkill for my needs, but from them I was
able to learn how to collect and parse some of the information I need.

What I am still unable to get is:

- utilization by queue (or by a list of node names), to track actual use of
expensive resources such as GPUs, high-memory nodes, etc.
- statistics on how long jobs wait in the queue because resources are
unavailable

Ideally both would be in an sreport-like format, broken down by user and for
the overall system.

I suspect this information is available in sacct, but needs some
massaging/consolidation to become useful for what I am looking for. Perhaps
either (or both) of your scripts already do that in some place that I did
not find? That would be terrific, and I'd appreciate it if you can point me
to its place.
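
To make the question concrete, here is a minimal sketch of the kind of
massaging I have in mind. It is an assumption on my part, not something taken
from either of your scripts, and the function names (run_sacct, parse_sacct,
summarize) are hypothetical. It takes pipe-separated sacct output, measures
wait as Start minus Submit, and totals core-seconds per user or per partition.

```python
import subprocess
from collections import defaultdict
from datetime import datetime

# sacct fields requested, in order; one allocation per output line (-X).
FIELDS = ["JobID", "User", "Partition", "Submit", "Start", "ElapsedRaw", "AllocCPUS"]


def run_sacct(since):
    """Return raw sacct text for all users' jobs since `since` (e.g. '2024-08-01')."""
    cmd = ["sacct", "-a", "-X", "-n", "-P", "-S", since, "-o", ",".join(FIELDS)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_sacct(text):
    """Turn pipe-separated sacct lines into dicts, skipping jobs that never started."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != len(FIELDS):
            continue
        row = dict(zip(FIELDS, parts))
        try:
            submit = datetime.fromisoformat(row["Submit"])
            start = datetime.fromisoformat(row["Start"])
            core_s = int(row["ElapsedRaw"]) * int(row["AllocCPUS"])
        except ValueError:
            continue  # e.g. Start is not a timestamp because the job never ran
        row["wait_s"] = (start - submit).total_seconds()
        row["core_s"] = core_s
        rows.append(row)
    return rows


def summarize(rows, key):
    """Group by `key` ('User' or 'Partition'): job count, mean/max wait, core-seconds."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = {}
    for name, jobs in groups.items():
        waits = [j["wait_s"] for j in jobs]
        out[name] = {
            "jobs": len(jobs),
            "mean_wait_s": sum(waits) / len(waits),
            "max_wait_s": max(waits),
            "core_s": sum(j["core_s"] for j in jobs),
        }
    return out
```

Calling summarize(parse_sacct(run_sacct("2024-08-01")), "Partition") would give
one entry per partition, and passing "User" instead gives the per-user view.
Two caveats: Start minus Submit also counts time spent waiting on dependencies
or a deferred begin time, so it overstates waits caused purely by unavailable
resources (measuring from sacct's Eligible field may be closer); and
core-seconds is raw usage, not utilization, until it is divided by the
capacity of the partition over the same period. GPU usage would additionally
need the AllocTRES field, which this sketch does not parse.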

Thanks again!

On Tue, Aug 20, 2024 at 9:09 AM Kevin Broch via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Heavyweight solution (although if you have grafana and prometheus going
> already a little less so):
> https://github.com/rivosinc/prometheus-slurm-exporter
>
> On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Possibly a bit more elaborate than you want but I wrote a web based
>> monitoring system for our cluster.  It mostly uses standard slurm commands
>> for job monitoring, but I've also added storage monitoring which requires a
>> separate cron job to run every night.  It was written for our cluster, but
>> probably wouldn't take much work to adapt to another cluster with similar
>> structure.
>>
>> You can see the code and some screenshots at:
>>
>>  https://github.com/s-andrews/capstone_monitor
>>
>> ...and there's a video walkthrough at:
>>
>> https://vimeo.com/982985174
>>
>> We've also got more friendly scripts for monitoring current and past jobs
>> on the command line.  These are in a private repository as some of the
>> other information there is more sensitive but I'm happy to share those
>> scripts.  You can see the scripts being used in
>> https://vimeo.com/982986202
>>
>> Simon.
>>
>> -----Original Message-----
>> From: Paul Edmon via slurm-users <slurm-users@lists.schedmd.com>
>> Sent: 09 August 2024 16:12
>> To: slurm-users@lists.schedmd.com
>> Subject: [slurm-users] Print Slurm Stats on Login
>>
>> We are working to make our users more aware of their usage. One of the
>> ideas we came up with was to have some basic usage stats printed at login
>> (usage over past day, fairshare, job efficiency, etc). Does anyone have any
>> scripts or methods that they use to do this? Before baking my own I was
>> curious what other sites do and if they would be willing to share their
>> scripts and methodology.
>>
>> -Paul Edmon-
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe
>> send an email to slurm-users-le...@lists.schedmd.com
>>
>