Hi,

On 11.02.2014 at 22:37, Stephen Spencer wrote:
> I have a sixty-node cluster running SGE 6.2u5 (RHEL 6.5).
>
> The immediate issue is that a user has jobs in the "qw" state, and there are
> idle nodes in the cluster which appear to be able to accept the jobs.
>
> What works and doesn't work?
> • "qsub -q [email protected] job.sh" works - the job runs on "n20"
> • Repeated invocations of "qrsh hostname" will not, however, result in
>   the job running on one of the troublesome hosts.

What is the definition of:

$ qconf -sconf
...
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

Do you get any output when you also pass "-q ..." to `qrsh`? You can also try "-w v" and "-w p".

> Things I've tried, and know, so far:
> • I've restarted the troublesome nodes - no change.
> • "sge_execd" is running on the troublesome nodes.
> • The troublesome nodes are in the execution host list and the submit
>   host list.
> • Most of the rest of the cluster's pretty busy.
> • Interestingly, the troublesome nodes don't show up in the "scheduling
>   info" list produced as part of the "qstat -j <jobid>" command's output.
> Short of restarting the entire cluster, I'm at a loss as to what to look at
> next.

Is "qtype INTERACTIVE" limited to certain nodes/queues?

-- Reuti

> --
> Stephen Spencer
> [email protected]
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
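
The last question above (whether "qtype INTERACTIVE" is limited to certain nodes or queues) could be checked with something like the following. This is only a sketch: the helper name `qtype_of` is made up for illustration, and it assumes that `qconf -sq` prints the `qtype` attribute on a single line, as it normally does.

```shell
#!/bin/sh
# Sketch: show the qtype setting of every cluster queue, to see whether
# INTERACTIVE is enabled for the queue instances on the troublesome hosts.

# Hypothetical helper: extract the value of the "qtype" attribute from
# `qconf -sq <queue>` output read on stdin.
qtype_of() {
    awk '$1 == "qtype" { $1 = ""; sub(/^ +/, ""); print }'
}

# Only query the cluster when the SGE client tools are on PATH.
if command -v qconf >/dev/null 2>&1; then
    for q in $(qconf -sql); do
        printf '%s: %s\n' "$q" "$(qconf -sq "$q" | qtype_of)"
    done
fi
```

A value such as "BATCH INTERACTIVE" allows `qrsh` in that queue. A bracketed entry of the form "[host=...]" after the default value is a per-host override, so a host listed there with only BATCH would accept `qsub` jobs but not `qrsh`.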
