> On Feb 20, 2019, at 7:14 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> Ryan,
>
> That being said, the "Alarm clock" message looks a bit suspicious.
>
> Does it always occur at 20+ minutes elapsed?
>
> Is there some mechanism that automatically kills a job if it does not write
> anything to stdout for some time?
>
> A quick way to rule that out is to
>
> srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 sleep 1800
>
> and see if that completes or gets killed with the same error message.

FWIW, the “sleep” completes just fine:

[novosirj@amarel-test2 testpar]$ sacct -j 84173276 -M perceval -o jobid,jobname,start,end,node,state
JobID        JobName    Start               End                 NodeList        State
------------ ---------- ------------------- ------------------- --------------- ----------
84173276     sleep      2019-02-21T14:46:03 2019-02-21T15:16:03 node077         COMPLETED
84173276.ex+ extern     2019-02-21T14:46:03 2019-02-21T15:16:03 node077         COMPLETED
84173276.0   sleep      2019-02-21T14:46:03 2019-02-21T15:16:03 node077         COMPLETED

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users