Hi,

On 14.01.2015 at 10:09, Roberto Nunnari wrote:

> Hi.
> 
> man sge_pe states:
> 
> control_slaves
>  This parameter can be set to TRUE or FALSE (the default). It indicates 
> whether Oracle Grid Engine is the creator of the slave tasks of a parallel  
> application  via  sge_execd(8)  and  sge_shepherd(8) and thus has full 
> control over all processes in a parallel application, which enables 
> capabilities such as resource limitation and correct accounting. However, to 
> gain control over the slave tasks of a parallel  application,  a 
> sophisticated  PE  interface  is  required, which works closely together with 
> Oracle Grid Engine facilities. Such PE interfaces are available through your 
> local Oracle Grid Engine support office.
> 
> Does that mean that you need to buy some software from Oracle in order to 
> take advantage of 'control_slaves TRUE' ?

No.

It mainly refers to the fact that whether any preparation is necessary depends 
on the parallel application: you may have to supply scripts for 
start_proc_args/stop_proc_args, and set up or tune the started application so 
that it doesn't do nasty things like jumping out of the process tree.

Technically, control_slaves must be set to TRUE so that a started job script is 
allowed to perform `qrsh -inherit ...` to reach other nodes without any 
`rsh`/`ssh` at all (in my clusters `ssh` is available for admin staff only).
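
As a rough illustration of what such a job script can do once control_slaves is 
TRUE, here is a minimal sketch. It assumes the usual $PE_HOSTFILE layout (one 
line per host: host name, slots, queue, processor range); the helper name 
pe_hosts is made up for this example, and how many `qrsh -inherit` calls SGE 
permits depends on the slots granted and the PE settings.

```shell
#!/bin/sh
# pe_hosts (hypothetical helper): print the host names listed in an SGE
# PE hostfile. Assumed line format: <host> <slots> <queue> <processor range>
pe_hosts() {
    awk '{ print $1 }' "$1"
}

# Inside a job started under a PE, SGE sets $PE_HOSTFILE. Outside of a
# job the variable is empty and the loop is skipped.
if [ -n "${PE_HOSTFILE:-}" ]; then
    for host in $(pe_hosts "$PE_HOSTFILE"); do
        # Start a task on the granted node under SGE's control,
        # without any rsh/ssh.
        qrsh -inherit "$host" hostname
    done
fi
```

In a real setup the `hostname` call would be replaced by whatever the parallel 
application needs to start on each node.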

While these scripts were mandatory for many parallel applications in the past, 
current versions of MPICH and Open MPI (./configure --with-sge for the latter) 
support SGE out of the box.

For Open MPI you can check whether it's set up in your version by looking for 
the gridengine component:

$ ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)

Care must be taken with Open MPI 1.8 and newer: by default they apply a core 
binding of their own, independent of SGE's, and always start at socket/core 
0/0. So if more than one Open MPI job is running on a node, it's necessary to 
either switch off Open MPI's core binding (and/or use SGE's) or reformat the 
core list granted by SGE so that Open MPI can use it.
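
For the first option, a minimal sketch of switching off Open MPI's own binding 
in a job script could look like the following. It assumes Open MPI 1.8 or 
newer, where the option is spelled `--bind-to none` (older releases used a 
different spelling); the wrapper name run_unbound is made up for this example.

```shell
#!/bin/sh
# run_unbound (hypothetical wrapper): start an Open MPI program with
# Open MPI's own core binding switched off.
# $NSLOTS is set by SGE to the number of granted slots; fall back to 1
# when the function is tried outside of a job.
run_unbound() {
    mpiexec --bind-to none -np "${NSLOTS:-1}" "$@"
}
```

In the job script this would be called as `run_unbound ./my_mpi_app`, where 
`./my_mpi_app` stands for your own binary.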

-- Reuti


> In my production environment, I have four PEs: two are set to 
> 'control_slaves FALSE' and two to 'control_slaves TRUE'. As far as I know, 
> all of them behave as expected, and it has been like that for about 9 years, 
> since I inherited the SGE cluster.
> 
> Can anybody shed some light on it, please?
> 
> my present environment:
> - OGE 6.2u7
> - on the execution nodes: openmpi 1.5.4
> - on the master node: openmpi 1.4
> 
> Thank you and best regards.
> Robi
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

