Hi Jason,

We aim for a maximum wallclock time of 5 days. We chose this so that we can
perform maintenance in a timely manner without disturbing or killing users'
jobs. Yet we see that some users and/or codes need a longer runtime.
That is why we set the MaxTime for the partitions to 30 days. Our users must
write a proposal if they need a larger amount of core hours, and there they
have to justify why they need a runtime longer than 5 days. The maximum time is
then limited by the association created for the triple (account, user,
partition). When we need to do maintenance in a timely manner, we kill such
long-running jobs. Our users know that.

Our default time is set to 15 minutes.
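
As a rough sketch of how these settings combine with the fallback described
further down the thread (explicit request, then DefaultTime, then MaxTime),
here is a simplified Python model. It is not Slurm's own logic: the function
and parameter names are hypothetical, all times are in minutes, and raising an
error stands in for whatever the scheduler actually does with a job over the
limit, which depends on the site configuration.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60


def effective_time_limit(
    requested: Optional[int],
    default_time: Optional[int],
    partition_max: int,
    assoc_max: Optional[int] = None,
) -> int:
    """Return the time limit (in minutes) a job would run under in this model.

    Raises ValueError if that limit exceeds the partition or association maximum.
    """
    # Fallback chain: explicit request, then DefaultTime, then MaxTime.
    if requested is not None:
        limit = requested
    elif default_time is not None:
        limit = default_time
    else:
        limit = partition_max

    # Assumption of this sketch: the tighter of the two ceilings applies.
    ceiling = partition_max if assoc_max is None else min(partition_max, assoc_max)
    if limit > ceiling:
        raise ValueError(f"time limit of {limit} min exceeds the allowed {ceiling} min")
    return limit


# Values from this mail: 15 min default, 30 days partition maximum,
# 5 days for a regular association.
print(effective_time_limit(None, 15, 30 * MINUTES_PER_DAY, 5 * MINUTES_PER_DAY))  # 15
# No DefaultTime and no association limit: the job inherits the full MaxTime.
print(effective_time_limit(None, None, 30 * MINUTES_PER_DAY))  # 43200
```

The second call shows why a missing DefaultTime is risky: a job submitted
without a time limit ends up with the whole partition maximum.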

Best
Marcus

On 06.10.2020 at 16:53, Jason Simms wrote:
FWIW, I define the DefaultTime as 5 minutes, which effectively means for any 
"real" job that users must actually define a time. It helps users get into that 
habit, because in the absence of a DefaultTime, most will not even bother to think 
critically and carefully about what time limit is actually reasonable, which is important 
for, e.g., effective job backfill and scheduling estimations.

I currently don't have a MaxTime defined, because how do I know how long a job 
will take? Most jobs on my cluster require no more than 3-4 days, but in some 
cases at other campuses, I know that jobs can run for weeks. I suppose even 
setting a time limit such as 4 weeks would be overkill, but at least it's not 
infinite. I'm curious what others use as that value, and how you arrived at it.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 5:55 AM John H <j...@sdf.org> wrote:

    Yes, I hadn't considered that! Thanks for the tip, Michael. I shall do that.

    John

    On Fri, Oct 02, 2020 at 01:49:44PM +0000, Renfro, Michael wrote:
     > Depending on the users who will be on this cluster, I'd probably adjust
     > the partition to have a defined, non-infinite MaxTime, and maybe a lower
     > DefaultTime. Otherwise, it would be very easy for someone to start a job
     > that reserves all cores until the nodes get rebooted, since all they have
     > to do is submit a job with no explicit time limit (which would then use
     > DefaultTime, which itself has a default value of MaxTime).



--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632

--
Dipl.-Inf. Marcus Wagner

IT Center
Group: Linux Systems Group
Department: Systems and Operations
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

IT Center social media channels:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
