On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:

> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but at
> least it's not infinite. I'm curious what others use as that value, and how
> you arrived at it

My journey over the last 16 years in HPC has been one of decreasing time limits. Back in 2003, with VPAC's first Linux cluster, we had no time limits; we then introduced a 90 day limit so we could plan quarterly maintenances (and yes, we had users who had jobs which legitimately ran longer than that, so they had to learn to checkpoint).

At VLSCI we had 30 day limits (life sciences, so many long-running, poorly scaling jobs), then when I was at Swinburne it was a 7 day limit, and now here at NERSC we've got 2 day limits.

It really is down to what your use cases are and how much influence you have over your users. It's often the HPC sysadmin's responsibility to try and find that balance between good utilisation, effective use of the system and reaching the desired science/research/development outcomes.

Best of luck!
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA