Re: [slurm-users] Performance tracking of array tasks

William Dear Tue, 17 May 2022 06:42:42 -0700

> What is the use-case for having users need to self-limit?

Our users self-limit jobs with extremely high disk I/O requirements.  Some batch 
jobs read or write over 15 TB a day, and I haven't identified an effective 
method of capping IOPS per user.  We still have issues with the occasional user 
deciding to use Slurm to extract hundreds of 60 GB tar.gz files in parallel with 
no task limit.  One of my current goals is to find a method of quickly 
identifying jobs with high I/O wait so that a single user can't DDoS the 
storage.  Unfortunately, all jobs using the same storage device end up with 
high I/O wait, so identifying the culprit also requires comparing total I/O per 
job.
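
As a rough illustration of the "compare total I/O per job" step, here is a
minimal Python sketch.  It assumes accounting data has already been dumped
with something like
"sacct --parsable2 --noheader --format=JobID,MaxDiskRead,MaxDiskWrite" and
that disk accounting is being gathered on the cluster.  MaxDiskRead and
MaxDiskWrite are per-step maxima over tasks, so the sum is only an
approximation of total I/O for single-task steps.  The function names and the
sample data are made up for the example.

```python
from collections import defaultdict

# Slurm-style size suffixes, treated here as powers of 1024 (an assumption).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(field):
    """Convert a size such as '1.5G' or '2048' to bytes; empty fields count as 0."""
    field = field.strip()
    if not field:
        return 0.0
    if field[-1].upper() in UNITS:
        return float(field[:-1]) * UNITS[field[-1].upper()]
    return float(field)


def total_io_by_array_job(sacct_text):
    """Sum read + write bytes per job (or per whole array), largest first."""
    totals = defaultdict(float)
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        job_id, disk_read, disk_write = line.split("|")
        # '1234_7.batch' -> '1234': drop the step suffix, then the array index.
        root = job_id.split(".")[0].split("_")[0]
        totals[root] += parse_size(disk_read) + parse_size(disk_write)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sacct output: JobID|MaxDiskRead|MaxDiskWrite
SAMPLE = """\
1234_1||
1234_1.batch|2G|1G
1234_2||
1234_2.batch|1G|1G
5678||
5678.batch|512M|0
"""

if __name__ == "__main__":
    for job, nbytes in total_io_by_array_job(SAMPLE):
        print(f"{job}: {nbytes / 1024**3:.2f} GiB")
```

Grouping by the array's root job ID rather than by element is a deliberate
choice here: a throttled array spreads its I/O over many small elements, and
the culprit only stands out once they are added together.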



William Dear


________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris 
Bennett <loris.benn...@fu-berlin.de>
Sent: Tuesday, May 17, 2022 12:46 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Performance tracking of array tasks


Hi,

William Dear <william.d...@i3-corps.com> writes:

> It looks like Brian's suggestion of using sacct will be the fast answer in 
> the short term, so I'll just have to write my own script to aggregate the 
> output.  I was hoping for a canned solution such as XDMoD but haven't found 
> one that quite fits our needs.  If there's a list of recommended supporting 
> applications for Slurm, I would appreciate that.
>
> One example of how the canned reporting doesn't meet our needs is that my 
> users self-limit their arrays, such as "--array=1-12000%100".  Technically, 
> the job isn't waiting on anything but itself, since it only runs 100 tasks 
> at a time, but all the pending array tasks still show up as waiting.  If the 
> partition resources are too low and the job is running fewer than 100 tasks, 
> then it actually is waiting on another job.  The challenge will be 
> determining when a job is self-limiting vs waiting on a different job.
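
One possible way to tell the two cases apart is the pending reason that squeue
reports: array tasks held back by the "%100" throttle are normally shown with
the reason JobArrayTaskLimit, while tasks held back by the cluster show
reasons such as Resources or Priority.  The sketch below is only an
illustration and assumes input produced by something like
'squeue -h -t PENDING,RUNNING -o "%i|%T|%r"'; the function name, labels and
sample lines are invented for the example.

```python
from collections import defaultdict


def classify_arrays(squeue_text):
    """Label each array with pending tasks as self-limiting or waiting on others."""
    reasons = defaultdict(set)
    for line in squeue_text.splitlines():
        if not line.strip():
            continue
        job_id, state, reason = line.split("|")
        if state == "PENDING":
            # '1234_[101-12000%100]' and '1234_17' both map to root '1234'.
            reasons[job_id.split("_")[0]].add(reason)
    return {
        root: "self-limiting"
        if seen == {"JobArrayTaskLimit"}
        else "waiting on other jobs"
        for root, seen in reasons.items()
    }


# Hypothetical squeue output: JobID|State|Reason
SAMPLE = """\
1234_[101-12000%100]|PENDING|JobArrayTaskLimit
1234_1|RUNNING|None
5678_[41-500%100]|PENDING|Resources
5678_1|RUNNING|None
"""

if __name__ == "__main__":
    for root, label in classify_arrays(SAMPLE).items():
        print(root, label)
```

This only looks at a snapshot of the queue, so reporting on historical wait
time would need the same classification sampled periodically.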

What is the use-case for having users need to self-limit?  We just rely
on the cap for the maximum number of jobs in an array and on fairshare
to do the rest.

Cheers,

Loris

> Thanks,
>
> William Dear
>
> ________________________________
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Loris 
> Bennett <loris.benn...@fu-berlin.de>
> Sent: Monday, May 16, 2022 9:04 AM
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: Re: [slurm-users] Performance tracking of array tasks
>
> Hi William,
>
> William Dear <william.d...@i3-corps.com> writes:
>
>> Could anyone please recommend methods of tracking the performance of 
>> individual tasks in a task array job?  I have installed XDMoD, but it is 
>> focused solely on the job level, with no information about tasks.
>>
>> My users almost exclusively use task arrays to run embarrassingly parallel 
>> jobs.  After the job is complete, I would like to see run time and peak RAM 
>> usage per task so that we can correctly size the reservations for future 
>> jobs.  It would also be very helpful to break this down by node so that I 
>> can identify poorly performing nodes.
>>
>> William Dear
>
> I'm not sure what you mean by a 'task array job'.  A job can have
> multiple tasks within it - I don't think you will be able to get data on
> such individual tasks very easily.  However, a job array is just a sort
> of convenient wrapper around a bunch of jobs.  Each element of a job
> array still has its own job ID, so you can extract job data the same way
> you do for a non-array job.
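
Since each array element has its own accounting records, run time and peak RAM
per element can be pulled out of sacct output with a short script.  The
following Python sketch is one way to do it, not a finished tool.  It assumes
the data was dumped with something like
"sacct --parsable2 --noheader --format=JobID,ElapsedRaw,MaxRSS,NodeList", that
each element ran on a single node, and that MaxRSS is reported on the step
rows (for example ".batch").  All function names and the sample data are
hypothetical.

```python
from collections import defaultdict

# Size suffixes treated as powers of 1024 (an assumption about sacct output).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(field):
    """Convert a size such as '2G' or '4096' to bytes; empty fields count as 0."""
    field = field.strip()
    if not field:
        return 0.0
    if field[-1].upper() in UNITS:
        return float(field[:-1]) * UNITS[field[-1].upper()]
    return float(field)


def per_task_usage(sacct_text):
    """Collect elapsed seconds, peak RSS and node for each array element."""
    tasks = defaultdict(lambda: {"elapsed_s": 0, "peak_rss": 0.0, "node": ""})
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        job_id, elapsed, max_rss, node = line.split("|")
        task = tasks[job_id.split(".")[0]]  # fold step rows into their element
        task["elapsed_s"] = max(task["elapsed_s"], int(elapsed or 0))
        task["peak_rss"] = max(task["peak_rss"], to_bytes(max_rss))
        if "." not in job_id:  # the allocation row carries the node list
            task["node"] = node
    return dict(tasks)


def mean_elapsed_by_node(tasks):
    """Average run time per node, to help spot slow nodes."""
    per_node = defaultdict(list)
    for usage in tasks.values():
        per_node[usage["node"]].append(usage["elapsed_s"])
    return {node: sum(times) / len(times) for node, times in per_node.items()}


# Hypothetical sacct output: JobID|ElapsedRaw|MaxRSS|NodeList
SAMPLE = """\
1234_1|600||node01
1234_1.batch|600|2G|node01
1234_2|1800||node02
1234_2.batch|1800|3G|node02
1234_3|620||node01
1234_3.batch|620|2048M|node01
"""

if __name__ == "__main__":
    usage = per_task_usage(SAMPLE)
    for task_id, info in usage.items():
        print(task_id, info["elapsed_s"], info["peak_rss"], info["node"])
    print(mean_elapsed_by_node(usage))
```

Comparing mean run time per node only makes sense when the array elements do
roughly the same amount of work, which is the usual case for embarrassingly
parallel arrays but is still an assumption.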
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
>
