It looks like Brian's suggestion of using sacct will be the fastest answer in 
the short term, so I'll just have to write my own script to aggregate the 
output.  I was hoping for a canned solution such as XDMoD but haven't found one 
that quite fits our needs.  If there's a list of recommended supporting 
applications for Slurm, I would appreciate a pointer to it.
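
A minimal sketch of what such an aggregation script might look like, assuming 
the input is pipe-delimited sacct output with the columns 
JobID|ElapsedRaw|MaxRSS|NodeList|State.  The function names (parse_rss, 
summarize, by_node) are made up for illustration, and the sketch assumes that 
MaxRSS appears on the step rows (e.g. "1234_7.batch") rather than on the row 
for the array element itself, which is worth checking against your own output.

```python
from collections import defaultdict

# Multipliers for the size suffixes that may follow a MaxRSS value (e.g. "2048K").
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_rss(value):
    """Convert a MaxRSS string such as '2048K' to bytes; an empty field is 0."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))


def summarize(sacct_text):
    """Build one record per array element from pipe-delimited sacct rows.

    Assumed column order: JobID|ElapsedRaw|MaxRSS|NodeList|State
    """
    tasks = defaultdict(
        lambda: {"elapsed_s": 0, "max_rss": 0, "node": "", "state": ""}
    )
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        job_id, elapsed, max_rss, node, state = line.split("|")
        # "1234_7.batch" -> base "1234_7", step "batch"
        base, _, step = job_id.partition(".")
        rec = tasks[base]
        if not step:
            # Row for the array element itself: take time, node and state.
            rec["elapsed_s"] = int(elapsed or 0)
            rec["node"] = node
            rec["state"] = state
        # Peak memory is the largest value seen on any row of this element.
        rec["max_rss"] = max(rec["max_rss"], parse_rss(max_rss))
    return dict(tasks)


def by_node(tasks):
    """Mean elapsed seconds per node, as a rough way to spot slow nodes."""
    samples = defaultdict(list)
    for rec in tasks.values():
        samples[rec["node"]].append(rec["elapsed_s"])
    return {node: sum(v) / len(v) for node, v in samples.items()}
```

The input could be produced with something like 
"sacct -j <jobid> --noheader --parsable2 
--format=JobID,ElapsedRaw,MaxRSS,NodeList,State" and passed to summarize() as 
a single string.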

One example of how the canned reporting doesn't meet our needs is that my users 
self-limit their arrays, for example with "--array=1-12000%100".  Technically, 
the array job isn't waiting on anything but itself, since it only runs 100 
tasks at a time, but all the pending array tasks still show up as waiting.  If 
the partition resources are too low and the job is running fewer than 100 
tasks, then it actually is waiting on another job.  The challenge will be 
determining when a job is self-limiting vs. waiting on a different job.
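
One possible way to make that distinction is sketched below.  It assumes 
that squeue reports the pending reason "JobArrayTaskLimit" for tasks held back 
by the "%" cap (please verify this on your Slurm version), and it falls back 
to comparing the number of running tasks with the cap.  The function names 
array_limit and classify_pending are hypothetical.

```python
def array_limit(array_spec):
    """Return the '%N' concurrency cap from an --array spec, or None if absent."""
    _, sep, cap = array_spec.partition("%")
    return int(cap) if sep else None


def classify_pending(reason, running, limit):
    """Label a pending array task as 'self-limited' or 'waiting-on-others'.

    reason  -- pending reason string as reported by the scheduler
    running -- number of tasks of the same array currently running
    limit   -- the array's own cap, or None if the user set no cap
    """
    # Assumption: this reason marks tasks held back by the array's own cap.
    if reason == "JobArrayTaskLimit":
        return "self-limited"
    # Fallback: the array already runs as many tasks as it allows itself.
    if limit is not None and running >= limit:
        return "self-limited"
    # Fewer tasks running than the cap allows: something else is in the way.
    return "waiting-on-others"
```

For example, classify_pending("Resources", 40, array_limit("1-12000%100")) 
gives "waiting-on-others", because only 40 of the permitted 100 tasks are 
running.  The reason string for pending tasks can be listed with the "%r" 
field of squeue's --format option.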


Thanks,

William Dear


________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris 
Bennett <loris.benn...@fu-berlin.de>
Sent: Monday, May 16, 2022 9:04 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Performance tracking of array tasks


Hi William,

William Dear <william.d...@i3-corps.com> writes:

> Could anyone please recommend methods of tracking the performance of
> individual tasks in a task array job?  I have installed XDMoD but it is
> focused solely on the Job level with no information about tasks.
>
> My users almost exclusively use task arrays to run embarrassingly parallel
> jobs.  After the job is complete I would like to see run time and peak RAM
> usage per task so that we can correctly size the reservations for future
> jobs.  It would also be very helpful to break this down by node so that I
> can identify poorly performing nodes.
>
> William Dear

I'm not sure what you mean by a 'task array job'.  A job can have
multiple tasks within it - I don't think you will be able to get data on
such individual tasks very easily.  However, a job array is just a sort
of convenient wrapper around a bunch of jobs.  Each element of a job
array still has its own job ID, so you can extract job data the same way
you do for a non-array job.

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
