On 14.01.2015 10:09, Roberto Nunnari wrote:
Hi.

man sge_pe states:

control_slaves
   This parameter can be set to TRUE or FALSE (the default). It indicates
   whether Oracle Grid Engine is the creator of the slave tasks of a
   parallel application via sge_execd(8) and sge_shepherd(8) and thus has
   full control over all processes in a parallel application, which
   enables capabilities such as resource limitation and correct
   accounting. However, to gain control over the slave tasks of a parallel
   application, a sophisticated PE interface is required, which works
   closely together with Oracle Grid Engine facilities. Such PE interfaces
   are available through your local Oracle Grid Engine support office.

Does that mean that you need to buy some software from Oracle in order
to take advantage of 'control_slaves TRUE'?

In my production environment, I have four PEs: two are set to
'control_slaves FALSE' and two to 'control_slaves TRUE'. As far as I
know, all of them behave as expected, and it has been like that for
about 9 years, since I inherited the SGE cluster.
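To see at a glance which PEs have control_slaves enabled, one option is to query them with `qconf -spl` (list PE names) and `qconf -sp <name>` (show one PE). The Python below is a minimal sketch, assuming `qconf` is on the PATH and that its output has one `key value` pair per line; `parse_pe_config` and `control_slaves_by_pe` are hypothetical helper names, not part of Grid Engine. A missing key is treated as FALSE, the documented default.

```python
import subprocess


def parse_pe_config(text: str) -> dict[str, str]:
    """Parse `qconf -sp <pe_name>` output into a {key: value} dict.

    Assumes one 'key   value' pair per line; backslash-continued
    lines are not handled in this sketch.
    """
    config = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            config[parts[0]] = parts[1].strip()
    return config


def control_slaves_by_pe() -> dict[str, bool]:
    """Return {pe_name: control_slaves} for every configured PE."""
    names = subprocess.run(
        ["qconf", "-spl"], capture_output=True, text=True, check=True
    ).stdout.split()
    result = {}
    for name in names:
        out = subprocess.run(
            ["qconf", "-sp", name], capture_output=True, text=True, check=True
        ).stdout
        # control_slaves defaults to FALSE when absent
        value = parse_pe_config(out).get("control_slaves", "FALSE")
        result[name] = value.upper() == "TRUE"
    return result
```

On a host where `qconf` works, calling `control_slaves_by_pe()` returns a dict mapping each PE name to a boolean; the parsing is kept in a separate function so it can be tried on saved `qconf -sp` output without a live cluster.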

Can anybody shed some light on this, please?

My present environment:
- OGE 6.2u7
- on the execution nodes: openmpi 1.5.4
- on the master node: openmpi 1.4

Thank you and best regards.
Robi

I can add that, on the execution nodes, jobs launched with a PE configured with 'control_slaves TRUE' run in a process hierarchy under sge_execd and sge_shepherd:

sge       1977     1  0  2014 ?        04:50:50 /opt/sge/bin/lx24-amd64/sge_execd
sge      23594  1977  0 Jan12 ?        00:00:00     sge_shepherd-1668440 -bg
user1    23596 23594  0 Jan12 ?        00:00:00       -sh /opt/sge/default/spool/node21/job_scripts/1668440
user1    23702 23596 99 Jan12 ?     11-23:30:46         /homea/user1/opt/myprog -v -ntomp 6 -nice 0 -gpu_id 0 -plumed plumed.dat -s topo

So it seems that, in this case, it is working properly.
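The same check can be scripted. This is a minimal sketch, assuming `ps -ef` style columns (UID PID PPID C STIME TTY TIME CMD) and treating any ancestor whose command name starts with `sge_shepherd` as a sign that the process was started under Grid Engine's control; `parse_ps_ef` and `under_shepherd` are hypothetical helper names. The job script itself is always started by a shepherd, so the result is most telling for processes on the slave nodes of a multi-node job.

```python
import os


def parse_ps_ef(text: str) -> dict[int, tuple[int, str]]:
    """Map PID -> (PPID, CMD) from `ps -ef` output (header line optional)."""
    procs = {}
    for line in text.splitlines():
        fields = line.split(None, 7)  # CMD and its arguments stay in field 7
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header and malformed lines
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def under_shepherd(procs: dict[int, tuple[int, str]], pid: int) -> bool:
    """True if some ancestor of `pid` is an sge_shepherd process."""
    seen = set()
    ppid = procs.get(pid, (0, ""))[0]
    while ppid in procs and ppid not in seen:  # stop at unknown PIDs or loops
        seen.add(ppid)
        parent_ppid, cmd = procs[ppid]
        if os.path.basename(cmd.split()[0]).startswith("sge_shepherd"):
            return True
        ppid = parent_ppid
    return False
```

For example, `under_shepherd(parse_ps_ef(subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout), 23702)` would check the `myprog` process from the listing above (this usage line additionally needs `import subprocess`).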
Robi

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
