> On Feb 20, 2019, at 7:14 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> 
> Ryan,
> 
> That being said, the "Alarm clock" message looks a bit suspicious.
> 
> Does it always occur at 20+ minutes elapsed ?
> 
> Is there some mechanism that automatically kills a job if it does not write 
> anything to stdout for some time ?
> 
> A quick way to rule that out is to
> 
> srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1 sleep 1800
> 
> and see if that completes or gets killed with the same error message.

FWIW, the “sleep” completes just fine:

[novosirj@amarel-test2 testpar]$ sacct -j 84173276 -M perceval -o jobid,jobname,start,end,node,state
       JobID    JobName               Start                 End        NodeList      State
------------ ---------- ------------------- ------------------- --------------- ----------
84173276          sleep 2019-02-21T14:46:03 2019-02-21T15:16:03         node077  COMPLETED
84173276.ex+     extern 2019-02-21T14:46:03 2019-02-21T15:16:03         node077  COMPLETED
84173276.0        sleep 2019-02-21T14:46:03 2019-02-21T15:16:03         node077  COMPLETED
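
As a sanity check on the timestamps above, the elapsed time can be worked out from the Start and End columns. This is only a minimal sketch: it assumes GNU date (for the -d option) and treats both timestamps as UTC so the local time zone does not affect the difference. The variable names are illustrative.

```shell
# Start and End values copied from the sacct output above.
start="2019-02-21T14:46:03"
end="2019-02-21T15:16:03"

# Assumes GNU date: convert each timestamp to seconds since the epoch.
# Both are read as UTC, so only the difference between them matters.
start_s=$(date -u -d "$start" +%s)
end_s=$(date -u -d "$end" +%s)

elapsed=$((end_s - start_s))
echo "elapsed: ${elapsed} s"   # prints "elapsed: 1800 s"
```

That matches the "sleep 1800" argument, so the job ran for the full 30 minutes instead of being killed early.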

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users