Some sort of limit is being reached. Do you have an RQS (resource quota set) of any kind (qconf -srqs)? We see this when a job exhausts RAM, whether against a job-requested limit or a system-set one (the OOM killer, as mentioned; 'dmesg -T' on the compute nodes is often useful), as well as when time limits are reached. What is the whole output from 'qacct -j JOBID'?
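For what it's worth, the exit_status 137 in the quoted qacct output follows the usual shell convention of 128 + signal number, so it matches the "signal KILL (9)" in the messages file and does not by itself say who sent the signal. A minimal sketch of that arithmetic in Python; the helper name decode_exit_status is made up for illustration and is not part of gridengine:

```python
import signal


def decode_exit_status(status):
    """Hypothetical helper: map a shell-style exit status to (signum, name).

    Assumes the common convention that a process killed by signal N is
    reported with exit status 128 + N. Returns None for statuses outside
    that range.
    """
    if 128 < status < 256:
        signum = status - 128
        try:
            return signum, signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            return signum, None
    return None


if __name__ == "__main__":
    # 137 is the exit_status reported by qacct in the quoted message.
    print(decode_exit_status(137))
```

On a Linux node this should report signal 9 (SIGKILL) for 137, which is consistent with either the OOM killer or a limit enforced by the execution daemon.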
Cheers,
-Hugh

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of hiller
Sent: Tuesday, May 14, 2019 9:02 AM
To: [email protected]
Subject: Re: [gridengine users] jobs randomly die

Hi,
nope, there are no oom messages in the journal.
Regards, ulrich

On 5/14/19 12:49 PM, Arnau wrote:
> Hi,
>
> _maybe_ the OOM killer killed the job? A look at the messages will give you an
> answer (I've seen this in my cluster).
>
> HTH,
> Arnau
>
> On Tue, 14 May 2019 at 12:37, hiller (<[email protected]
> <mailto:[email protected]>>) wrote:
>
>     Dear all,
>     i have a problem that jobs sent to gridengine randomly die.
>     The gridengine version is 8.1.9
>     The OS is opensuse 15.0
>     The gridengine messages file says:
>     05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed -
>     killing job
>     05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10
>     assumedly after job because: job 635659.1 died through signal KILL (9)
>
>     qacct -j 635659 says:
>     failed       100 : assumedly after job
>     exit_status  137                  (Killed)
>
>
>     There was no kill triggered by the user. Also there are no other
>     limitations, neither ulimit nor in the gridengine queue.
>     The 'qconf -sq all.q' command gives:
>     s_rt                  INFINITY
>     h_rt                  INFINITY
>     s_cpu                 INFINITY
>     h_cpu                 INFINITY
>     s_fsize               INFINITY
>     h_fsize               INFINITY
>     s_data                INFINITY
>     h_data                INFINITY
>     s_stack               INFINITY
>     h_stack               INFINITY
>     s_core                INFINITY
>     h_core                INFINITY
>     s_rss                 INFINITY
>     h_rss                 INFINITY
>     s_vmem                INFINITY
>     h_vmem                INFINITY
>
>     Years ago there were some threads about the same issue, but i did not
>     find a solution.
>
>     Does somebody have a hint what i can do or check/debug?
>
>     With kind regards and many thanks for any help, ulrich
>     _______________________________________________
>     users mailing list
>     [email protected] <mailto:[email protected]>
>     https://gridengine.org/mailman/listinfo/users
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
