:) That’s absolutely true… but secondary. The SLURM job won’t end on its own, but it has been chewing CPU in any case. There is no runaway process on the compute node; it’s clean and clear.
Does anyone else ever see this?

Cheers,
Alden

> On Jun 9, 2017, at 12:17 PM, Michael Jennings <[email protected]> wrote:
>
> On Thursday, 08 June 2017, at 11:38:20 (-0700),
> Stradling, Alden Reid (ars9ac) wrote:
>
>> JobName    State      Start               Elapsed    CPUTime
>> ---------- ---------- ------------------- ---------- ----------
>> PBMC_6c_0+ SUSPENDED  2017-06-06T16:39:51 1-21:12:55 15-01:43:20
>>
>> That 15 days??? not really possible since the job started two days ago.
>
> Yes, it is actually possible. Notice that you're looking at CPU TIME,
> not Elapsed Time (the prior column). Your job used 8 CPUs, so you
> have to multiply the elapsed time (just over 45 hours) by the number
> of CPUs you used. If you do that, you'll get 1,302,072 CPU seconds,
> or just over 15 days. :-)
>
> Michael
>
> --
> Michael E. Jennings <[email protected]>
> HPC Systems Team, Los Alamos National Laboratory
> Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605
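
For anyone who wants to check the arithmetic in the quoted reply, here is a minimal Python sketch that multiplies the Elapsed value by the CPU count. The helper names are made up for illustration, the 8-CPU figure is taken from the reply, and the parser assumes the D-HH:MM:SS form shown in the table rather than every format sacct can print.

```python
def slurm_time_to_seconds(text):
    """Convert a [D-]HH:MM:SS duration string to seconds (hypothetical helper)."""
    # Split off the optional day count; days is "" when there is no "-".
    days, _, clock = text.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def seconds_to_slurm_time(total):
    """Format a number of seconds as D-HH:MM:SS (hypothetical helper)."""
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


elapsed_seconds = slurm_time_to_seconds("1-21:12:55")  # Elapsed column
cpus = 8                                               # assumed from the reply
cpu_seconds = elapsed_seconds * cpus

print(elapsed_seconds)                     # 162775
print(cpu_seconds)                         # 1302200
print(seconds_to_slurm_time(cpu_seconds))  # 15-01:43:20
```

With the values in the table this gives 1,302,200 CPU seconds, which is exactly the 15-01:43:20 in the CPUTime column and slightly more than the 1,302,072 quoted above.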
