I have observed apparently random failures when users had GIDs inside the configured `gid_range` (see below; `gid_range` should lie outside the range of GIDs that are actually assigned to users). But usually this kind of thing is due to the node running out of memory (OOM).
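In case it helps, here is a rough sketch of how both points could be checked on an execution host. The function names `gids_in_range` and `signal_from_status` are made up for this mail, and the range 50000-51000 is just the one from my configuration shown below; substitute your own. It assumes `getent` and `awk` are available, and it relies on the usual shell convention that an exit status above 128 means "killed by signal (status - 128)".

```shell
#!/bin/sh
# Hypothetical helpers, not part of SGE.

# Read colon-separated entries (e.g. from "getent group") on stdin and print
# "name gid" for every entry whose GID lies inside [LOW, HIGH].
# Usage: gids_in_range LOW HIGH [FIELD]   (FIELD defaults to 3, the GID
# column of "getent group"; use 4 for the primary GID in "getent passwd").
gids_in_range() {
    awk -F: -v low="$1" -v high="$2" -v f="${3:-3}" '
        $f ~ /^[0-9]+$/ && $f + 0 >= low + 0 && $f + 0 <= high + 0 {
            print $1, $f
        }'
}

# Print the signal number encoded in a shell-style exit status,
# or 0 if the status is not above 128.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}

# Example calls (commented out so that sourcing this file has no side effects):
#   getent group  | gids_in_range 50000 51000      # groups inside gid_range
#   getent passwd | gids_in_range 50000 51000 4    # users whose primary GID is inside
#   signal_from_status 137                         # exit_status reported by qacct
#   dmesg -T | grep -i -E 'out of memory|killed process'   # traces of the OOM killer
```

Any output from the first two calls would point to an overlap with `gid_range`; output from the `dmesg` call on the host that ran the job (karun10 in the report) would point to OOM instead.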
qconf -sconf | grep gid_range
gid_range                    50000-51000

On Tue, May 14, 2019 at 10:42 AM Reuti <re...@staff.uni-marburg.de> wrote:

> AFAICS the sent kill by SGE happens after a task returned already with an
> error. SGE would in this case use the kill signal to be sure to kill all
> child processes. Hence the question would be: what was the initial command
> in the job script, and what output/error did it generate?
>
> -- Reuti
>
>
> > On 14.05.2019 at 11:36, hiller <hil...@mpia-hd.mpg.de> wrote:
> >
> > Dear all,
> > i have a problem that jobs sent to gridengine randomly die.
> > The gridengine version is 8.1.9
> > The OS is opensuse 15.0
> > The gridengine messages file says:
> > 05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed - killing job
> > 05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10 assumedly after job because: job 635659.1 died through signal KILL (9)
> >
> > qacct -j 635659 says:
> > failed       100 : assumedly after job
> > exit_status  137 (Killed)
> >
> >
> > There was no kill triggered by the user. Also there are no other
> > limitations, neither ulimit nor in the gridengine queue
> > The 'qconf -sq all.q' command gives:
> > s_rt     INFINITY
> > h_rt     INFINITY
> > s_cpu    INFINITY
> > h_cpu    INFINITY
> > s_fsize  INFINITY
> > h_fsize  INFINITY
> > s_data   INFINITY
> > h_data   INFINITY
> > s_stack  INFINITY
> > h_stack  INFINITY
> > s_core   INFINITY
> > h_core   INFINITY
> > s_rss    INFINITY
> > h_rss    INFINITY
> > s_vmem   INFINITY
> > h_vmem   INFINITY
> >
> > Years ago there were some threads about the same issue, but i did not
> > find a solution.
> >
> > Does somebody have a hint what i can do or check/debug?
> >
> > With kind regards and many thanks for any help, ulrich
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users