Hi,

William Dear <william.d...@i3-corps.com> writes:

> It looks like Brian's suggestion of using sacct will be the fast answer in
> the short term, so I'll just have to write my own script to aggregate the
> output.  I was hoping for a canned solution such as XDMoD but haven't found
> one that quite fits our needs.  If there's a list of recommended supporting
> applications for Slurm, I would appreciate that.
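
A minimal sketch of such an aggregation script, in Python, assuming the
input comes from something like
"sacct -n -P -j <jobid> --format=JobID,Elapsed,MaxRSS,NodeList,State".
The function names and the dictionary layout are purely illustrative.  It
also assumes that MaxRSS is reported on the step lines (e.g. "1234_1.batch")
with a K/M/G/T suffix, that a bare number means bytes, and that each array
element runs on a single node.

```python
import subprocess
from collections import defaultdict

FIELDS = "JobID,Elapsed,MaxRSS,NodeList,State"
# Multipliers to KiB for the size suffixes sacct prints (assumed K/M/G/T).
UNITS = {"K": 1, "M": 1024, "G": 1024**2, "T": 1024**3}


def fetch_lines(job_id):
    """Run sacct for one job (or job array) and return its output lines."""
    cmd = ["sacct", "-n", "-P", "-j", str(job_id), f"--format={FIELDS}"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def parse_elapsed(text):
    """Convert an elapsed time such as '1-02:03:04' or '02:03:04' to seconds."""
    days, _, clock = text.rpartition("-")
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return int(days or 0) * 86400 + seconds


def parse_rss_kib(text):
    """Convert a MaxRSS value such as '204800K' or '1.5G' to KiB."""
    if not text:
        return 0.0
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text) / 1024  # assumption: no suffix means bytes


def summarise(lines):
    """Collapse sacct lines into one record per array element."""
    tasks = defaultdict(
        lambda: {"elapsed_s": 0, "max_rss_kib": 0.0, "node": "", "state": ""}
    )
    for line in lines:
        if not line.strip():
            continue
        job_id, elapsed, max_rss, node, state = line.strip().split("|")
        task_id, _, step = job_id.partition(".")
        record = tasks[task_id]
        if not step:
            # The line without a step suffix describes the element itself.
            record.update(elapsed_s=parse_elapsed(elapsed), node=node, state=state)
        # Peak memory is the largest MaxRSS seen over the element's steps.
        record["max_rss_kib"] = max(record["max_rss_kib"], parse_rss_kib(max_rss))
    return dict(tasks)


def mean_elapsed_by_node(tasks):
    """Average run time per node, as a rough way to spot slow nodes."""
    per_node = defaultdict(list)
    for record in tasks.values():
        per_node[record["node"]].append(record["elapsed_s"])
    return {node: sum(times) / len(times) for node, times in per_node.items()}
```

Something like summarise(fetch_lines(1234)) would then give run time and
peak memory per array element, and mean_elapsed_by_node() a per-node
average of the run times.
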
>
> One example of how the canned reporting doesn't meet our needs is that my
> users self-limit their arrays, e.g. with "--array=1-12000%100".  Technically,
> such a job isn't waiting on anything but itself, since it only runs 100 tasks
> at a time, but all the pending array tasks still show up as waiting.  If the
> partition resources are too low and fewer than 100 tasks are running, then it
> actually is waiting on another job.  The challenge will be determining when a
> job is self-limiting vs. waiting on a different job.
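
One possible way to tell the two cases apart is the pending reason that
squeue reports: array tasks held back by the "%100" limit should show the
reason "JobArrayTaskLimit", whereas tasks waiting on other jobs typically
show "Resources" or "Priority".  A minimal Python sketch, assuming the
input comes from something like 'squeue -h -t PD -o "%i|%T|%r"' and with a
purely illustrative function name and labels:

```python
from collections import Counter

# Reason Slurm gives when an array's own "%N" task limit is holding tasks back.
SELF_LIMIT_REASON = "JobArrayTaskLimit"


def classify_pending(lines):
    """Count pending squeue records by why they are waiting.

    Each line is expected to look like 'jobid|state|reason'.  Pending array
    tasks may be collapsed into one record such as '1234_[101-12000%100]',
    so the counts are records, not individual tasks.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        job_id, state, reason = line.strip().split("|", 2)
        if state != "PENDING":
            continue
        if reason == SELF_LIMIT_REASON:
            counts["self-limited"] += 1
        else:
            # Resources, Priority, Dependency, etc.: limited by something else.
            counts["other: " + reason] += 1
    return counts
```

Other reasons (dependencies, QOS or association limits) exist as well, so
the "other" bucket is only a starting point for further breakdown.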

What is the use case for having users self-limit?  We just rely on the
cap on the maximum number of jobs in an array and on fairshare to do
the rest.

Cheers,

Loris

> Thanks,
>
> William Dear
>
> ------------------------------------------------------------------------
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris 
> Bennett <loris.benn...@fu-berlin.de>
> Sent: Monday, May 16, 2022 9:04 AM
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: Re: [slurm-users] Performance tracking of array tasks 
>
> Hi William,
>
> William Dear <william.d...@i3-corps.com> writes:
>
>> Could anyone please recommend methods of tracking the performance of
>> individual tasks in a task array job?  I have installed XDMoD but it is
>> focused solely on the Job level with no information about tasks.
>>
>> My users almost exclusively use task arrays to run embarrassingly parallel
>> jobs.  After the job is complete I would like to see run time and peak RAM
>> usage per task so that we can correctly size the reservations for future
>> jobs.  It would also be very helpful to break this down by node so that I
>> can identify poorly performing nodes.
>>
>> William Dear
>
> I'm not sure what you mean by a 'task array job'.  A job can have
> multiple tasks within it - I don't think you will be able to get data on
> such individual tasks very easily.  However, a job array is just a sort
> of convenient wrapper around a bunch of jobs.  Each element of a job
> array still has its own job ID, so you can extract job data the same way
> you do for a non-array job.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
>
