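OK, we figured it out: the user has an orphaned tmux session with a
shell in it that is still running the command
watch qdel *
watch re-runs its command every 2 seconds by default, which matches the
interval of the log messages below. Presumably each run also deleted
whatever jobs the user had just submitted, which would explain why his
jobs were killed immediately.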
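For anyone hitting the same symptom, here is a minimal sketch of how one
might look for a stray qdel loop on a login or submit host. The function
names filter_qdel and list_qdel_procs are made up for illustration, and
the ps options assume a Linux procps-style ps; this is not necessarily
how we tracked it down.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Grid Engine.

# filter_qdel: read process listing lines on stdin and keep only those
# whose command line contains "qdel" as a separate word.
filter_qdel() {
    grep -E '(^|[[:space:]/])qdel([[:space:]]|$)'
}

# list_qdel_procs USER: print PID, parent PID, elapsed time and the full
# command line of every process owned by USER that mentions qdel.
# The trailing "=" on each column name suppresses the header line.
list_qdel_procs() {
    ps -u "$1" -o pid=,ppid=,etime=,args= | filter_qdel
}
```

Running `list_qdel_procs yangili` as root on each host the user can log
in to should show a long-lived `watch qdel` process if one exists. The
parent PID then leads back to the shell and the tmux server holding it,
and `tmux list-sessions`, run as that user, lists the sessions that are
still alive.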
On 11/30/2015 04:27 PM, Alex Chekholko wrote:
Hi,
My qmaster messages log is continuously printing, every 2s:
11/30/2015 16:21:28|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:30|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:32|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
...
11/30/2015 16:21:47|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:21:49|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
...
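A flood like this can be summarized by counting how often each distinct
error message occurs. The following is only a sketch: count_errors is a
hypothetical helper, and the field layout (timestamp|thread|host|level|text)
is assumed from the excerpt above rather than taken from documentation.

```shell
#!/bin/sh
# count_errors: read qmaster "messages" lines on stdin and print
# "COUNT MESSAGE" for each distinct error-level (E) message,
# most frequent first.
# Assumed layout per line: timestamp|thread|host|level|text
count_errors() {
    awk -F'|' '$4 == "E" { n[$5]++ } END { for (m in n) print n[m], m }' |
        sort -rn
}
```

It would be fed the qmaster messages file on stdin, for example
`count_errors < messages` from within the qmaster spool directory
(the location depends on the installation).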
I tried restarting the qmaster and the execds...
Any suggestions?
Everything seems to be working fine, except for that one user, and his
jobs get killed immediately, e.g.
11/30/2015 16:20:27|worker|scg3-hn01|W|job 3124980.1 failed on host scg3-0-6.local assumedly after job because: job 3124980.1 died through signal KILL (9)
I'm not able to reproduce it with any other user account.
I tried stuff like 'qdel -u yangili'; the user has no jobs in the system:
[root@scg3-hn01 ~]# qdel -u yangili
There is no job registered for the following users: yangili
11/30/2015 16:24:34|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
11/30/2015 16:24:35|worker|scg3-hn01|E|There is no job registered for the following users: yangili
11/30/2015 16:24:36|worker|scg3-hn01|E|The job * of user(s) yangili does not exist
This is SoGE version 8.1.8(?), installed Feb 2015.
Regards,
--
Alex Chekholko