Hi Loris,

Thanks so much for your relevant comments!

On 07/21/2017 12:00 PM, Loris Bennett wrote:

Hi Ole,

Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:

As a small contribution to the Slurm community, I've moved my collection of
Slurm tools to GitHub at https://github.com/OleHolmNielsen/Slurm_tools.  These
are tools which I feel make daily cluster monitoring and management a
little easier.

The following Slurm tools are available:

* pestat Prints the Slurm cluster node status, one line per node, with job info.

* slurmreportmonth Generate monthly accounting statistics from Slurm using the
sreport command.

* showuserjobs Print the current node status and batch jobs status broken down
into userids.

* slurmibtopology Infiniband topology tool for Slurm.

* Slurm triggers scripts.

* Scripts for managing nodes.

* Scripts for managing jobs.

The tools "pestat" and "slurmibtopology" have previously been announced to this
list, but future updates will be on GitHub only.

I would also like to mention our Slurm deployment HowTo guide at
https://wiki.fysik.dtu.dk/niflheim/SLURM

/Ole

Thanks for sharing your tools.  Here are some brief comments:

- psjob/psnode
   - The USERLIST variable makes the commands a bit brittle, since ps
     will fail if you pass an unknown username.

Good point!
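One way to make this less brittle would be to drop unknown usernames before they are handed to ps. A minimal sketch of that idea in Python, not in the language of the scripts themselves; `filter_known_users` is a hypothetical helper and not part of Slurm_tools:

```python
import pwd


def filter_known_users(names, lookup=pwd.getpwnam):
    """Return only the names that resolve to an account on this host.

    `lookup` defaults to pwd.getpwnam, which raises KeyError for an
    unknown user; it is a parameter only so the check can be exercised
    without depending on the local password database.
    """
    known = []
    for name in names:
        try:
            lookup(name)
        except KeyError:
            continue  # skip names that ps would reject
        known.append(name)
    return known
```

The filtered list can then be joined with commas and passed to ps, so a stale entry in USERLIST no longer makes the whole command fail.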

- showuserjobs
   - Doesn't handle usernames longer than 8 characters (we have longer names)

Good point!
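A possible fix is to size the username column from the data instead of hard-coding 8 characters. The following Python sketch only illustrates the idea; `format_rows` and its two-column layout are assumptions, not the actual showuserjobs output code:

```python
def format_rows(rows, min_width=8):
    """Format (username, jobs) pairs with a username column wide
    enough for the longest name, but never narrower than min_width."""
    width = max([min_width] + [len(user) for user, _ in rows])
    return [f"{user:<{width}} {jobs:>5}" for user, jobs in rows]


for line in format_rows([("user01", 27), ("averylongusername", 3)]):
    print(line)
```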

   - The grouping doesn't seem quite correct.  As shown in the example
     below, not all the users of the group appear under the group total
     for the appropriate group:

I tried to make the "sort" command do the final sorting, but I couldn't make it put the GROUP_TOTAL first. Maybe I have to move the sorting into the awk code...

     Username    Jobs  CPUs   Jobs  CPUs  Group     Further info
     ========    ==== =====   ==== =====  ========  =============================
     GRAND_TOTAL  168  1089     55   451  ALL       running+idle=1540 CPUs 29 users
     GROUP_TOTAL   56   349     10   119  group01   running+idle=468 CPUs 8 users
     user01        27   324      4    52  group02   One, User
     GROUP_TOTAL   27   324      4    52  group02   running+idle=376 CPUs 1 users
     user02        29   174      1     6  group01   Two, User
     GROUP_TOTAL    5   148     18   208  group03   running+idle=356 CPUs 4 users
     user03         3   120     16   176  group03   Three, User
     user04        11    96      3    48  group01   Four, User
     ...

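The grouping problem comes down to the sort key: each user row has to sort on its group's total first, and only then on its own CPU count, with the GROUP_TOTAL row forced ahead of the group's users. A minimal Python sketch of such a composite key, using the running-CPU figures from the example above; `sort_by_group` is a hypothetical name, the GRAND_TOTAL row is assumed to be printed separately, and this is not how showuserjobs currently sorts:

```python
def sort_by_group(rows):
    """Sort (name, running_cpus, group) rows so each GROUP_TOTAL row
    is followed directly by the users of that group.

    Assumes exactly one GROUP_TOTAL row per group.
    """
    totals = {group: cpus for name, cpus, group in rows
              if name == "GROUP_TOTAL"}

    def key(row):
        name, cpus, group = row
        # 1. groups by descending total, 2. group name as tie-breaker,
        # 3. GROUP_TOTAL row before user rows, 4. users by descending CPUs
        return (-totals.get(group, 0), group, name != "GROUP_TOTAL", -cpus)

    return sorted(rows, key=key)


rows = [
    ("GROUP_TOTAL", 349, "group01"),
    ("user01", 324, "group02"),
    ("GROUP_TOTAL", 324, "group02"),
    ("user02", 174, "group01"),
    ("GROUP_TOTAL", 148, "group03"),
    ("user03", 120, "group03"),
    ("user04", 96, "group01"),
]
for name, cpus, group in sort_by_group(rows):
    print(f"{name:<12} {cpus:>5}  {group}")
```

With this key, user02 and user04 land directly under the group01 total instead of being interleaved with the other groups. The same idea carries over to awk or sort by emitting the group total as an extra leading sort field on every row.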
In general, maybe it would be good to have a common config file, where things such as
paths to binaries, USERLIST and username lengths are defined.

Yes, but what's the best way to do this? I'd like the scripts to be self-contained, so that people can pick what they need without additional setup for users and sysadmins.
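One compromise is to keep the defaults inside each script and treat a shared config file as optional: if the file exists it overrides the defaults, and if it doesn't, the script still runs on its own. A rough Python sketch of that pattern; the setting names, the KEY=VALUE file format and `load_settings` are all assumptions made for illustration:

```python
import os

# Built-in defaults, so the script works without any config file.
# These keys are illustrative, not existing Slurm_tools settings.
DEFAULTS = {"USERLIST": "", "USERNAME_WIDTH": "8"}


def load_settings(path):
    """Return DEFAULTS, overridden by KEY=VALUE lines from `path`
    if that file exists. Blank lines and # comments are ignored."""
    settings = dict(DEFAULTS)
    if os.path.isfile(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings
```

A shell script can follow the same pattern by setting its variables first and then sourcing the config file only when it is readable.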

/Ole
