On 25.09.2015 at 06:45, Lane, William wrote:

> I'm running this on a development cluster and testing the implementation of
> h_rt limits and job status email functionality.
> 
> Job 187 (mpirun) Aborted
> Exit Status      = 0
> Signal           = KILL
> User             = lanew
> Queue            = short.q@cscld1-0-2
> Host             = cscld1-0-2.local
> Start Time       = 09/22/2015 23:18:58
> End Time         = 09/22/2015 23:19:00

2 seconds is rather short. Does it happen with longer runtimes too? For
parallel jobs, the kill may happen on any of the slave nodes first, and hence
the corresponding entry can't be scanned on the master machine of the parallel
job. (I usually suggest using local spool directories to lower the network
traffic, with the side effect that the mail wrapper can no longer scan these
entries.) If the spool directory is shared, this would of course be possible.

Do you submit jobs with "-notify" and/or a set s_rt?

--Reuti


> CPU              = 00:00:00
> Max vmem         = 10.229M
> failed assumedly after job because:
> job 187.1 died through signal KILL (9)
> 
> I had thought an exit status of 0 indicates normal termination?
> 
> From the accounting file man page:
>           "For example: If a job dies through signal  9  (SIGKILL)
>           then the exit status becomes 128 + 9 = 137."
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
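
The man-page convention quoted above (a job killed by signal N is recorded with exit status 128 + N) can be illustrated with a small Python helper. This is a hypothetical sketch, not part of Grid Engine: `decode_exit_status` is an illustrative name, and it assumes the input is the exit status from the accounting file. The mail report above shows `Exit Status = 0` for the same killed job, so the sketch applies only to the accounting value. Note also that a program which itself exits with a value above 128 would be misread as a signal.

```python
import signal


def decode_exit_status(exit_status: int) -> str:
    """Interpret an accounting exit status using the 128 + signal convention.

    Hypothetical helper for illustration only; it assumes any value above 128
    means the job was terminated by signal (exit_status - 128).
    """
    if exit_status > 128:
        signum = exit_status - 128
        try:
            # Map the number to its symbolic name, e.g. 9 -> "SIGKILL".
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {signum}"
        return f"killed by {name} ({signum})"
    return f"exited with status {exit_status}"


print(decode_exit_status(137))
print(decode_exit_status(0))
```

On Linux, 137 decodes to SIGKILL (9), matching the 128 + 9 example from the man page.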


