Andreas made a good suggestion: look at the user's TRESRunMin from sshare in order to answer Jeff's question about the AssocGrpCPUMinutesLimit reason for a job. However, getting at this information is quite involved in practice, and I doubt that any ordinary user will bother to look it up.

Due to this complexity, I have added some new functionality to my "showjob" script available from https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs.

The "showjob" tool now tries to extract the information by combining the sshare, squeue, and sacctmgr commands. The job reasons AssocGrpCPUMinutesLimit as well as AssocGrpCpuLimit are treated.

An example output for a job is:

$ showjob  1347368
Job 1347368 of user xxx in account yyy has a jobstate=PENDING with reason=AssocGrpCpuLimit

Information about GrpCpuLimit:
User GrpTRES limit is:     cpu=1600
Current user TRES is:      cpu=1360
This job requires TRES:    cpu=960
...
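
(Reading the numbers above: the user's running jobs already take up cpu=1360, and adding the cpu=960 requested by this job would give 2320, which exceeds the GrpTRES limit of cpu=1600, hence the AssocGrpCpuLimit reason.)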

I think some end users might find this information useful.

Could I ask interested sites to test the "showjob" tool and check whether the logic also works in their environment? Please send me feedback so that I can improve the tool.

Best regards,
Ole


On 09-08-2019 08:00, Henkel, Andreas wrote:
Users may call sshare -l and have a look at the TRESRunMin. There the
number of TRES-minutes allocated by jobs currently running against
the account is listed. With a little math (cpu*timelimit) for the job
in question, the users should be able to figure this out. At least they
wouldn't need the debug level increased or a log file.
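
As a worked example using the numbers from Jeff's log below: the job requests 80 CPUs, and a request of 1440000 TRES-minutes corresponds to a timelimit of 1440000/80 = 18000 minutes (12.5 days). Since only 1436396 TRES-minutes are still available under the 30000000 limit, a timelimit of at most 1436396/80, i.e. roughly 17954 minutes, would have let the job start, which matches the "lower the timelimit a little" fix that Jeff describes.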

Best,

Andreas

On 8/7/19 8:47 PM, Sarlo, Jeffrey S wrote:
We had a job queued waiting for resources and when we changed the debug level, we were able to get the following in the slurmctld.log file.

[2019-08-02T10:03:47.347] debug2: JobId=804633 being held, the job is at or exceeds assoc 50(jeff/(null)/(null)) group max tres(cpu) minutes of 30000000 of which 1436396 are still available but request is for 1440000 (plus 0 already in use) tres minutes (request tres count 80)

We were then able to see that we just needed to lower the timelimit for the job a little.

Is there a way a user can get this same type of information for a job, without having to change the Slurm debug level and then look in a log file?
