Dear all,

I have a problem: jobs submitted to Grid Engine die at random.

The Grid Engine version is 8.1.9.
The OS is openSUSE 15.0.

The Grid Engine messages file says:

05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed - killing job
05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10 assumedly after job because: job 635659.1 died through signal KILL (9)
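
A note for anyone decoding the log line above: signal 9 is SIGKILL, which a process can neither catch nor ignore. Shells conventionally report a process killed by signal n as exit status 128 + n, so SIGKILL shows up as 137. Below is a minimal Python sketch of that mapping, assuming a Linux host; the helper name describe_exit_status is hypothetical and not part of Grid Engine.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status (hypothetical helper, not part of Grid Engine)."""
    # Shells conventionally encode "killed by signal n" as 128 + n.
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"


# 128 + 9, i.e. the status a shell reports after SIGKILL.
print(describe_exit_status(128 + 9))  # on Linux: killed by SIGKILL (signal 9)
```

This only identifies which signal ended the process; it says nothing about who sent it.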

qacct -j 635659 says:

failed       100 : assumedly after job
exit_status  137 (Killed)

There was no kill triggered by the user. There are also no other limits in place, neither via ulimit nor in the Grid Engine queue. The 'qconf -sq all.q' command gives:

s_rt     INFINITY
h_rt     INFINITY
s_cpu    INFINITY
h_cpu    INFINITY
s_fsize  INFINITY
h_fsize  INFINITY
s_data   INFINITY
h_data   INFINITY
s_stack  INFINITY
h_stack  INFINITY
s_core   INFINITY
h_core   INFINITY
s_rss    INFINITY
h_rss    INFINITY
s_vmem   INFINITY
h_vmem   INFINITY

Years ago there were some threads about the same issue, but I did not find a solution in them. Does anybody have a hint about what I can check or debug?

With kind regards and many thanks for any help,
ulrich

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users