Hi,

On 11.02.2014 at 22:37, Stephen Spencer wrote:

> I have a sixty-node cluster running SGE 6.2u5 (RHEL 6.5).
> 
> The immediate issue is that a user has jobs in the "qw" state, and there are 
> idle nodes in the cluster which appear to be able to accept the jobs.
> 
> What works and doesn't work?
>       • "qsub -q [email protected] job.sh" works - the job runs on "n20" 
>       • Repeated invocations of "qrsh hostname" will not, however, result in 
> the job running on one of the troublesome hosts.

How are the following entries defined in your cluster configuration?

$ qconf -sconf
...
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

Do you get any output when you pass the "-q ..." option to `qrsh` as well? In
addition, you can try "-w v" and "-w p".
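
As a rough sketch of those checks: the helper below only prints the two command lines so they can be reviewed first. The function name is hypothetical (not part of SGE), and QUEUE is a placeholder for your actual queue name; n20 is the host from your example.

```shell
# Hypothetical helper, not part of SGE: print the qrsh dry-run commands
# for one queue instance so they can be reviewed before running them.
qrsh_checks() {
    qi="$1"    # queue instance such as QUEUE@n20; QUEUE is a placeholder
    for mode in v p; do
        # -w v and -w p request verification only; see qsub(1) for details
        printf 'qrsh -q %s -w %s hostname\n' "$qi" "$mode"
    done
}

qrsh_checks "QUEUE@n20"
```

Running the printed commands by hand on a submit host should show whether the scheduler considers the request satisfiable at all.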


> Things I've tried, and know, so far:
>       • I've restarted the troublesome nodes - no change.
>       • "sge_execd" is running on the troublesome nodes.
>       • The troublesome nodes are in the execution host list and the submit 
> host list.
>       • Most of the rest of the cluster's pretty busy.
>       • Interestingly, the troublesome nodes don't show up in the "scheduling 
> info" list produced as part of the "qstat -j <jobid>" command's output.
> Short of restarting the entire cluster, I'm at a loss as to what to look at 
> next. 

Is "qtype INTERACTIVE" limited to certain nodes/queues?
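
One way to check this might be to look at the qtype line of each cluster queue, including any per-host overrides in square brackets. A minimal sketch, assuming the usual `qconf -sq` output layout; the function name and the sample input are made up for illustration, and values wrapped onto continuation lines are not joined here.

```shell
# Hypothetical helper: print the qtype line from `qconf -sq <queue>` output
# read on stdin.
show_qtype() {
    awk '$1 == "qtype" { print }'
}

# Made-up sample input, only to show the filter; a per-host override such
# as [n20=BATCH] would explain why qrsh never lands on that host.
printf 'qname  test.q\nqtype  BATCH INTERACTIVE,[n20=BATCH]\n' | show_qtype

# On a real cluster (not run here), something along these lines:
#   for q in $(qconf -sql); do echo "== $q"; qconf -sq "$q" | show_qtype; done
```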

-- Reuti


> -- 
> Stephen Spencer
> [email protected]
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

