As far as I can see, the kill sent by SGE happens after a task has already 
returned with an error. In this case SGE uses the kill signal to make sure 
that all child processes are killed. Hence the question is: what was the 
initial command in the job script, and what output/error did it generate?
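
As a side note on reading the numbers: the exit_status of 137 in the quoted 
qacct output matches the usual shell convention of 128 plus the signal number, 
i.e. 128 + 9 for KILL. A minimal sketch of that arithmetic follows; the 
function name decode_exit_status is made up for illustration and is not part 
of gridengine, and a program that itself exits with a code above 128 would be 
misread by it.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Describe an exit status using the shell's 128 + signal convention.

    Hypothetical helper for illustration only; it is not a gridengine tool.
    """
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 137 is the exit_status reported by qacct in the quoted message.
    print(decode_exit_status(137))
```

This only tells you which signal ended the job, not which process sent it, so 
the output of the initial command is still needed.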

-- Reuti

> On 14.05.2019 at 11:36, hiller <hil...@mpia-hd.mpg.de> wrote:
> 
> Dear all,
> I have a problem: jobs submitted to gridengine die at random.
> The gridengine version is 8.1.9.
> The OS is openSUSE 15.0.
> The gridengine messages file says:
> 05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed - killing job
> 05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10 assumedly after job because: job 635659.1 died through signal KILL (9)
> 
> qacct -j 635659 says:
> failed       100 : assumedly after job
> exit_status  137                  (Killed)
> 
> 
> There was no kill triggered by the user. Also, there are no other limits, 
> neither via ulimit nor in the gridengine queue.
> The 'qconf -sq all.q' command gives:
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
> 
> Years ago there were some threads about the same issue, but I did not find a 
> solution in them.
> 
> Does anybody have a hint as to what I can check or debug?
> 
> With kind regards and many thanks for any help, ulrich
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

