Hi,

> On 05.01.2015 at 11:21, William Hay <[email protected]> wrote:
> 
> On Fri, 26 Dec 2014 02:15:42 +0000
> Sangmin Park <[email protected]> wrote:
> 
>> Hi,
>> 
>> I manage several HPC machines at my site.
>> Each machine consists of a master node and compute nodes.
>> Users can access the master node of each machine via the login node, but 
>> cannot access the compute nodes directly from the login node.
>> 
>> SGE is installed on each machine. On each machine, all SGE commands work 
>> correctly, whereas on the login node they do not. When I type 'qstat' 
>> on the login node, it hangs forever without producing any output.

Somehow I never got this email.

Communication blocked by a firewall?
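
To rule the firewall in or out, a plain TCP connect from the login node to the 
qmaster is a quick check. A minimal sketch in Python; the helper name is made 
up here, and port 6444 is only the commonly used sge_qmaster port, so your 
site may use another one (check $SGE_QMASTER_PORT or /etc/services):

```python
import socket

def can_reach_qmaster(host, port=6444, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: 6444 is the qmaster port; adjust it to your installation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Call it with the host name found in $SGE_ROOT/default/common/act_qmaster of 
the cluster in question. A False result only says that the TCP connect failed; 
it does not tell whether a firewall or the routing is the cause.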


>> Of course, SGE is installed on the login node, too.

So from each login node you can reach all master nodes (i.e. to log in there)? 
But you want to issue the SGE commands directly on the login node(s) instead?

Was any setup done so that the login nodes know about the clusters?


>> Is there a way to submit a job from the login node to an HPC machine?
>> Is it possible?

SGE doesn't support multi-clustering out of the box. It can be achieved by 
several means, but it needs some configuration steps.

a) Do you have a central home across all clusters, or do you need some file 
staging to route the necessary input and output files to each particular 
cluster?

b) The SGE installed on the "login" node needs to know which cluster to 
address. This can be done by 1) mounting each $SGE_ROOT (or only 
$SGE_ROOT/default/common*) from each cluster and setting the local value of 
$SGE_ROOT to this mounted directory before each command; or 2) copying 
$SGE_ROOT/default/common* to the login node once and setting $SGE_ROOT in the 
same way.

c) Before you submit a job, you have to set $SGE_ROOT to point to the cluster 
you want to address (the same applies to `qstat`). A `qstat` wrapper could 
reset $SGE_ROOT several times and display the overall status of all clusters.
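
Such a wrapper for b) and c) could look roughly like the following Python 
sketch. The cluster names and the /sge/... paths are hypothetical; they stand 
for wherever each cluster's $SGE_ROOT is mounted or copied on the login node. 
It also assumes that the SGE binaries are already in $PATH:

```python
import os
import subprocess

# Hypothetical layout: one mounted (or copied) $SGE_ROOT per cluster.
CLUSTERS = {
    "clusterA": {"root": "/sge/clusterA", "cell": "default"},
    "clusterB": {"root": "/sge/clusterB", "cell": "default"},
}

def cluster_env(root, cell="default", base_env=None):
    """Return a copy of the environment that targets one cluster."""
    env = dict(os.environ if base_env is None else base_env)
    env["SGE_ROOT"] = root
    env["SGE_CELL"] = cell
    return env

def run_everywhere(argv, clusters=CLUSTERS):
    """Run argv once per cluster; return (name, exit code, stdout) tuples."""
    results = []
    for name, cfg in clusters.items():
        proc = subprocess.run(
            argv,
            env=cluster_env(cfg["root"], cfg["cell"]),
            capture_output=True,
            text=True,
        )
        results.append((name, proc.returncode, proc.stdout))
    return results
```

Calling run_everywhere(["qstat"]) and printing each entry under its cluster 
name gives the combined view. The same cluster_env() can be used for a single 
`qsub` once you have decided which cluster should get the job.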

==

A more sophisticated setup would involve:

- a local SGE instance on the login node, i.e. you submit on the machine itself
- a load sensor, which will change $SGE_ROOT several times and report the 
load or free slots of each cluster in a unique complex
- a starter method, which will forward the locally scheduled jobs to one of 
the remote clusters
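
The load sensor part can be sketched as follows. A load sensor reads lines 
from stdin, exits on "quit", and otherwise answers each request with a block 
of host:complex:value lines between "begin" and "end". The probe callable and 
the complex names are hypothetical; how the free slots of each remote cluster 
are actually determined (e.g. by parsing `qstat` output with $SGE_ROOT set 
per cluster) is left open here:

```python
import sys

def format_report(host, values):
    """Format one load report: begin, host:complex:value lines, end."""
    lines = ["begin"]
    for name, value in values.items():
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"

def serve(probe, host, stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request line with a fresh report until 'quit' or EOF.

    probe is a hypothetical callable returning a dict such as
    {"free_slots_clusterA": 12}; the complexes must exist in the local SGE.
    """
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            break
        stdout.write(format_report(host, probe()))
        stdout.flush()  # the execd waits for the complete block
```

The complexes reported this way have to be defined in the local SGE 
beforehand, and the script has to be registered as load sensor for the login 
node.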

There is an older Howto by Charu Chaubal, but it needs some adjustments to work 
with SGE 6 or later. I set it up once (including file staging to the remote 
cluster). But as this worked only with the particular applications we use, 
I never wrote a newer Howto for it (I used the job context to specify the type 
of computation and which files to forward or copy back).

http://arc.liv.ac.uk/SGE/howto/TransferQueues/transferqueues.html
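
To give an idea of the forwarding step, here is a very reduced sketch of a 
starter method in Python. This is not the script from the Howto and not the 
one I used; it leaves out file staging and the job context completely. It 
assumes that the starter method gets the job script and its arguments as its 
own arguments, that the remote cluster's $SGE_ROOT is available under the 
hypothetical path given by the caller, and that the local machine is a submit 
host of the remote cluster:

```python
import os
import subprocess

def forward_command(job_script, job_args):
    """Build the qsub call that resubmits the job to the remote cluster."""
    # -sync y makes qsub wait until the remote job has finished, so the
    # local job occupies its slot for as long as the remote one runs.
    return ["qsub", "-sync", "y", job_script, *job_args]

def forward(job_script, job_args, remote_root, remote_cell="default",
            runner=subprocess.call):
    """Run qsub against the remote cluster and return its exit code."""
    env = dict(os.environ, SGE_ROOT=remote_root, SGE_CELL=remote_cell)
    return runner(forward_command(job_script, job_args), env=env)

# As a starter method this would be called along the lines of:
#   sys.exit(forward(sys.argv[1], sys.argv[2:], "/sge/clusterA"))
# where "/sge/clusterA" is a placeholder for the mounted remote $SGE_ROOT.
```

Which cluster to choose would have to come from the complex requested by the 
job or from the job context; this is not shown here.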

-- Reuti

*) Replace "default" with each cell's name here.


> The qstat command should either work or let you know the node isn't an 
> admin/submit node.
> 
> A simpler test might be to try the qping command to check basic connectivity 
> to the qmaster.  
> 
> It might be a routing issue.  Around here our login nodes are dual homed.  
> Possibly the login nodes are trying to access the qmaster via the external 
> interface for some reason.
> 
>> 
>> - Sangmin
>> 
> 
> 
> -- 
> William Hay <[email protected]>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

