Hi,

On 19.04.2013 at 01:51, Adam Brenner wrote:

> We are running GE 1.8.2. When a node gets oversubscribed and jobs get
> "suspended" -- turned into the T state -- users whose jobs are in the
> T state cannot directly SSH into that node. Am I correct in that GE
> is the cause for this (users not able to SSH into the nodes)?

No. Unless someone installed a PAM module that checks for running jobs on the
nodes and enables or disables access to the node for certain users. In that
case the PAM module should be adjusted to honor the "T" and maybe "S" states
too, besides "r".
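
To illustrate the kind of check meant here, a minimal sketch in Python. This is not the code of any existing PAM module; the function name, the input format (a list of owner/state pairs for the jobs on the node) and the idea of treating the state as a string of single-letter flags are all assumptions made for the example.

```python
# Hypothetical access check: allow login if the user owns at least one job
# on this node whose state contains one of the accepted flags.
# Assumption: job states are given as flag strings such as "r", "T" or "S".
ALLOWED_STATE_FLAGS = {"r", "T", "S"}


def user_may_login(user, jobs_on_node):
    """Return True if `user` owns a job on the node in an accepted state.

    jobs_on_node: iterable of (owner, state) tuples (hypothetical format).
    """
    for owner, state in jobs_on_node:
        if owner == user and any(flag in ALLOWED_STATE_FLAGS for flag in state):
            return True
    return False


if __name__ == "__main__":
    jobs = [("alice", "T"), ("bob", "qw")]
    for name in ("alice", "bob", "carol"):
        print(name, user_may_login(name, jobs))
```

A check that only accepts "r" would lock out exactly those users whose jobs were suspended, which matches the behavior described in the question.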


> If so,
> we would want users to be able to directly login to those nodes.
> 
> The reason I suspect GE is the cause is that SSH'ing to other nodes
> works, and another user account that has no running job on the
> oversubscribed node can directly SSH to the node.

In case it's even possible to log in without running any jobs at all, I would
assume the node in the above case is completely overloaded and the SSH daemon
just can't get enough CPU time to handle the request. If you are not using the
`nice` values for now, you can try to define "priority 19" in all queues, which
will set the nice value of the started processes to the lowest-priority setting
(IIRC 20 on Solaris). As long as all user processes have the same value, it
doesn't matter whether it's 0 or 19. The interactive login would then get a
higher scheduling priority, as its nice value defaults to 0.
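
To see the effect of a nice value outside of GE, here is a small Python sketch. It assumes a POSIX system; the helper name `niceness_of_child` is made up for this example and has nothing to do with GE itself. It starts a child process with its nice value raised by a given increment and returns the value the child reports.

```python
import os
import subprocess
import sys


def niceness_of_child(increment):
    """Start a child with its nice value raised by `increment`; return that value."""
    result = subprocess.run(
        # The child prints its own nice value; os.nice(0) changes nothing.
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        # Runs in the child just before exec, similar in spirit to a
        # job being started with a queue priority (assumption for the demo).
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print("login-like process:", niceness_of_child(0))
    print("job-like process:  ", niceness_of_child(19))
```

A process started with the larger nice value gets less CPU when the node is contended, so a login shell left at the default should remain responsive.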

-- Reuti


> When the node resumes back to a normal level, and jobs that were in
> the T state go back to R, the user is then able to SSH directly to the
> node.
> 
> 
> In our case, it is very helpful for our users to directly SSH into the
> nodes to determine what is wrong with their qsub scripts, etc. This is
> a follow-up to the following thread by Joseph and Harry:
>      https://gridengine.org/pipermail/users/2013-February/005585.html
> 
> Thanks,
> -Adam
> 
> --
> Adam Brenner
> Computer Science, Undergraduate Student
> Donald Bren School of Information and Computer Sciences
> 
> Research Computing Support
> Office of Information Technology
> http://www.oit.uci.edu/rcs/
> 
> University of California, Irvine
> www.ics.uci.edu/~aebrenne/
> [email protected]
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


