Hi,

> On 15.08.2016 at 14:30, Ulrich Hiller <hil...@mpia-hd.mpg.de> wrote:
> 
> Hello,
> 
>> The other issue seems to be that in fact your job is using only one
>> machine, which means that it is essentially ignoring the granted slot
>> allocation. While the job is running, can you please execute on the
>> master node of the parallel job:
>> 
>> $ ps -e f
>> 
>> (f without -) and post the relevant lines belonging to sge_execd, or
>> running as children of the init process in case they jumped out of the
>> process tree. Maybe a good start would be to execute something like
>> `mpiexec sleep 300` in the jobscript.
>> 
> 
> I invoked:
> qsub -pe orte 160 -j yes -cwd -S /bin/bash <<< "mpiexec -n 160 sleep 300"
> 
> The only relevant line from `ps -e f` on the master node was:
> 55722 ?        Sl     3:42 /opt/sge/bin/lx-amd64/sge_qmaster

I meant the master node of the parallel job. According to the output below it 
was exec-node01 in your test.


> No other SGE lines, no child processes from it, and no other processes
> under init leading to SGE, while at the same time the sleep processes
> were running on the nodes (checked with the ps command on the nodes).
> 
> The qstat command gave:
>   264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
> all.q@exec-node01                  MASTER

Here you should see some processes executing a `qrsh -inherit ...`

Does `ps -e f` give a better output there?
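With a working tight integration the tree on the job's master node should look 
roughly like this (just an illustrative sketch, not verbatim output; the paths, 
job id and exact arguments depend on your installation):

sge_execd
 \_ sge_shepherd-264
     \_ /bin/bash /opt/sge/.../job_scripts/264
         \_ mpiexec -n 160 sleep 300
             \_ qrsh -inherit -nostdin -V exec-node03 ... orted ...
             \_ qrsh -inherit -nostdin -V exec-node05 ... orted ...
             \_ sleep 300
             \_ sleep 300
             ...

If the `qrsh -inherit` lines are missing, Open MPI is reaching the other nodes 
some other way (or not at all) instead of going through SGE.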


> all.q@exec-node01                  SLAVE
> all.q@exec-node01                  SLAVE
> all.q@exec-node01                  SLAVE
> [ ... ]
>   264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
> all.q@exec-node03                  SLAVE
> all.q@exec-node03                  SLAVE
> all.q@exec-node03                  SLAVE
> [ ... ]
>   264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
> all.q@exec-node05                  SLAVE
> all.q@exec-node05                  SLAVE
> [ ... ]

The output of SGE shows only what was granted. Whether the parallel application 
complies with it is a different matter, and is often the thing to look into and 
adjust in case it's not working as intended. Hence the checks with `ps -e f`.

-- Reuti


> Because there was only the master daemon running on the master node, and
> you were talking about child processes: was this normal behaviour of my
> cluster, or is there something wrong?
> 
> Kind regards, Ulrich
> 
> 
> 
> On 08/12/2016 07:11 PM, Reuti wrote:
>> Hi,
>> 
>>> On 12.08.2016 at 18:48, Ulrich Hiller <hil...@mpia-hd.mpg.de> wrote:
>>> 
>>> Hello,
>>> 
>>> I see a strange effect, where I am not sure whether it is "only" a
>>> misconfiguration or a bug.
>>> 
>>> First: I run Son of Grid Engine 8.1.9-1.el6.x86_64 (I installed the RHEL
>>> RPM on an openSUSE 13.1 machine. This should not matter in this case,
>>> and it is reported to run on openSUSE).
>>> 
>>> mpirun and mpiexec are from openmpi-1.10.3 (no other MPI was installed,
>>> neither on the master nor on the slaves). The installation was made with:
>>> ./configure --prefix=`pwd`/build --disable-dlopen --disable-mca-dso
>>> --with-orte --with-sge --with-x --enable-mpi-thread-multiple
>>> --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default
>>> --enable-orte-static-ports --enable-mpi-cxx --enable-mpi-cxx-seek
>>> --enable-oshmem --enable-java --enable-mpi-java
>>> make
>>> make install
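(A quick cross-check, just in case the build did not pick up what you intended: 
an Open MPI built with --with-sge should list a gridengine component in

$ ompi_info | grep gridengine

If that prints nothing for the mpiexec which is first in the PATH of the job, 
then that particular build is missing the SGE support.)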
>>> 
>>> I attached the outputs of 'qconf -ap all.q', 'qconf -sconf' and
>>> 'qconf -sp orte' as text files.
>>> 
>>> Now my problem:
>>> I asked for 20 cores, and if I run qstat -u '*' it shows that this job
>>> is running on slave07 using 20 cores, but that is not true! If I run
>>> qstat -f -u '*' I see that this job is only using 3 cores on slave07,
>>> and there are 17 cores on other nodes allocated to this job which are
>>> in fact unused!
>> 
>> qstat will list only the master node of the parallel job and the number of 
>> overall slots. You can check the granted allocation with:
>> 
>> $ qstat -g t -u '*'
>> 
>> The other issue seems to be that in fact your job is using only one 
>> machine, which means that it is essentially ignoring the granted slot 
>> allocation. While the job is running, can you please execute on the 
>> master node of the parallel job:
>> 
>> $ ps -e f
>> 
>> (f without -) and post the relevant lines belonging to sge_execd, or 
>> running as children of the init process in case they jumped out of the 
>> process tree. Maybe a good start would be to execute something like 
>> `mpiexec sleep 300` in the jobscript.
>> 
>> The next step could be an `mpihello.c` where you put an almost endless loop 
>> inside and switch off all optimizations during compilation, to check whether 
>> the slave processes are distributed in the correct way.
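Such an `mpihello.c` could look roughly like the sketch below (my example, not 
a fixed recipe; the loop length is arbitrary, and it assumes compilation 
without optimization, e.g. `mpicc -O0 mpihello.c -o mpihello`, so the busy 
loop is not optimized away):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));

    /* report where each rank ended up, to compare with the granted allocation */
    printf("rank %d of %d running on %s\n", rank, size, host);

    /* "almost endless" busy loop, so the processes stay visible in `ps -e f` */
    volatile unsigned long i;
    for (i = 0; i < 4000000000UL; i++)
        ;

    MPI_Finalize();
    return 0;
}

Running it with `mpiexec ./mpihello` in the jobscript and then comparing 
`ps -e f` on the granted nodes with `qstat -g t` should show whether the ranks 
really land where SGE placed the slots.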
>> 
>> Note that some applications will check the number of cores they are running 
>> on and start, via OpenMP (not Open MPI), as many threads as cores are found. 
>> Could this be the case for your application too?
>> 
>> -- Reuti
>> 
>> 
>>> Another example:
>>> My job took, say, 6 CPUs on slave07 and 14 on slave06, but nothing was
>>> running on 06, so resources are wasted on 06 while an overload on 07
>>> becomes highly possible (the numbers are made up).
>>> If I ran single-CPU work as many independent jobs that would not be an
>>> issue, but imagine I now request 60 CPUs on slave07; that would
>>> seriously overload the node in many cases.
>>> 
>>> Another example:
>>> If I ask for, say, 50 CPUs, the job will start on one node, e.g.
>>> slave01, but use only, say, 15 CPUs out of 64 there and reserve the rest
>>> on many other nodes (obviously wasting slots doing nothing).
>>> This has the bad consequence of allocating many more CPUs than available
>>> when many jobs are running; imagine you have 10 jobs like this one...
>>> some nodes will run maybe 3 of them even if they only have 24 CPUs...
>>> 
>>> I hope that I have made clear what the issue is.
>>> 
>>> I also see that `qstat` and `qstat -f` are in disagreement. The
>>> latter is correct; I checked the processes running on the nodes.
>>> 
>>> 
>>> Has somebody already encountered such a problem? Does somebody have an
>>> idea where to look or what to test?
>>> 
>>> With kind regards, Ulrich
>>> 
>>> 
>>> 
>>> <qhost.txt> <qconf-sconf.txt> <qconf-mp-orte.txt> <qconf-all.q>
>> 

