Agreed on the point about great responsibility, but even rm -r (without -f) can prompt for confirmation. In this case, should Slurm have a similar option (requiring an explicit force), especially since the change can immediately kill a running job?
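For what it's worth, the kind of self-scripted check mentioned in the thread could look roughly like the sketch below. It is only an illustration, not anything Slurm ships: the function names parse_slurm_time and kills_running_job are made up here, and the 2-03:00:00 elapsed time is a hypothetical value. The parser assumes the time formats documented in the sbatch/scontrol man pages and does not handle UNLIMITED or INFINITE.

```python
def parse_slurm_time(text):
    """Convert a Slurm time string to seconds.

    Handles the documented forms: "minutes", "minutes:seconds",
    "hours:minutes:seconds", "days-hours", "days-hours:minutes"
    and "days-hours:minutes:seconds".
    """
    rest = text.strip()
    days = 0
    if "-" in rest:
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time: {text!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in rest.split(":")]
        if len(fields) == 1:
            # A bare number means minutes.
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            # Two fields mean minutes:seconds.
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time: {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def kills_running_job(new_limit, elapsed):
    """Return True if new_limit is not longer than the time already run."""
    return parse_slurm_time(new_limit) <= parse_slurm_time(elapsed)


if __name__ == "__main__":
    elapsed = "2-03:00:00"  # hypothetical: the job has run 2 days 3 hours
    for limit in ("16-00:00:00", "16:00:00"):
        verdict = "would kill the job" if kills_running_job(limit, elapsed) else "ok"
        print(f"TimeLimit={limit}: {verdict}")
```

A wrapper around scontrol could fetch the elapsed time with something like squeue -h -j <jobid> -o %M, run this comparison, and ask for confirmation before passing a shorter limit on to scontrol update.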
On Thu, 6 Jul 2023, 18:16 Jason Simms, <jsim...@swarthmore.edu> wrote:

> An unfortunate example of the “with great power comes great
> responsibility” maxim. Linux will gleefully let you rm -fr your entire
> system, drop production databases, etc., provided you have the right
> privileges. Ask me how I know…
>
> Still, I get the point. Would it be possible to somehow ask for
> confirmation if you are setting a max time that is less than the current
> walltime? Perhaps. Could you script that yourself? Yes, I’m certain of it.
> Those kinds of built-in safeguards aren’t super common, however.
>
> Jason
>
> On Thu, Jul 6, 2023 at 12:55 PM Amjad Syed <amjad...@gmail.com> wrote:
>
>> Yes, the initial End Time was 7-00:00:00, but it allowed the typo
>> (16:00:00), which caused the jobs to be killed without warning.
>>
>> Amjad
>>
>> On Thu, Jul 6, 2023 at 5:27 PM Bernstein, Noam CIV USN NRL (6393)
>> Washington DC (USA) <noam.bernst...@nrl.navy.mil> wrote:
>>
>>> Is the issue that the error in the time made it shorter than the time
>>> the job had already run, so it killed it immediately?
>>>
>>> On Jul 6, 2023, at 12:04 PM, Jason Simms <jsim...@swarthmore.edu> wrote:
>>>
>>> No, not a bug, I would say. When the time limit is reached, that's it,
>>> job dies. I wouldn't be aware of a way to manage that. Once the time limit
>>> is reached, it wouldn't be a hard limit if you then had to notify the user
>>> and then... what? How long would you give them to extend the time? Wouldn't
>>> be much of a limit if a job can be extended, plus that would throw off the
>>> scheduler/estimator. I'd chalk it up to an unfortunate typo.
>>>
>>> Jason
>>>
>>> On Thu, Jul 6, 2023 at 11:54 AM Amjad Syed <amjad...@gmail.com> wrote:
>>>
>>>> Hello
>>>>
>>>> We were trying to increase the time limit of a running Slurm job:
>>>>
>>>> scontrol update job=<jobid> TimeLimit=16-00:00:00
>>>>
>>>> But we accidentally set it to 16 hours:
>>>>
>>>> scontrol update job=<jobid> TimeLimit=16:00:00
>>>>
>>>> This caused the running job to time out and be killed, without any
>>>> notification.
>>>>
>>>> Is this a bug? Should the user not be warned that the job will be
>>>> killed?
>>>>
>>>> Amjad
>>>>
>>>
>>> --
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research Computing
>>> Swarthmore College
>>> Information Technology Services
>>> (610) 328-8102
>>> Schedule a meeting: https://calendly.com/jlsimms
>>>
>>> *U.S. NAVAL RESEARCH LABORATORY*
>>> Noam Bernstein, Ph.D.
>>> Center for Materials Physics and Technology
>>> U.S. Naval Research Laboratory
>>> T +1 202 404 8628 F +1 202 404 7546
>>> https://www.nrl.navy.mil