Hello,

I have a strange effect, and I am not sure whether it is "only" a
misconfiguration or a bug.

First: I run Son of Grid Engine 8.1.9-1.el6.x86_64 (I installed the RHEL
RPM on an openSUSE 13.1 machine; this should not matter in this case,
and it is reported to run on openSUSE).

mpirun and mpiexec are from Open MPI 1.10.3 (no other MPI was installed,
neither on the master nor on the slaves). The installation was done with:
./configure --prefix=`pwd`/build --disable-dlopen --disable-mca-dso \
    --with-orte --with-sge --with-x --enable-mpi-thread-multiple \
    --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default \
    --enable-orte-static-ports --enable-mpi-cxx --enable-mpi-cxx-seek \
    --enable-oshmem --enable-java --enable-mpi-java
make
make install
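
As far as I know, one can check that the SGE support really made it into
such a build with ompi_info (just a sanity check; the prefix is a
placeholder, and the output should list the gridengine ras component):

<prefix>/bin/ompi_info | grep gridengine
# expected to print something like:  MCA ras: gridengine (...)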

I attached the outputs of 'qhost', 'qconf -sconf', 'qconf -sp orte' and
'qconf -sq all.q' as text files (reproduced below).

Now my problem:
I ask for 20 cores, and qstat -u '*' shows this job running on slave07
with 20 cores, but that is not true! If I run qstat -f -u '*' I see that
the job has only 3 cores on slave07, while the other 17 cores allocated
to it are on other nodes and are in fact unused!
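
To make concrete what I mean by "asking for 20 cores": a job here is
submitted more or less like the following sketch (the script, job name
and program name are made up; I am assuming the usual tight integration
through the orte PE):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N mpi_test                       # hypothetical job name
#$ -pe orte 20                       # request 20 slots in the orte PE
mpirun -np $NSLOTS ./my_mpi_program  # my_mpi_program is a placeholder

and started with 'qsub job.sh'.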

Or another example:
My job took, say, 6 CPUs on slave07 and 14 on slave06, but nothing was
running on slave06, so resources are wasted on 06 while an overload of
07 becomes very likely (the numbers are made up).
If I ran many independent 1-CPU jobs this would not be an issue, but
imagine I now request 60 CPUs on slave07; that would seriously overload
the node in many cases.

Or another example:
If I ask for, say, 50 CPUs, the job will start on one node, e.g.
slave01, but reserve only, say, 15 CPUs out of 64 there and reserve the
rest on many other nodes (obviously wasting space doing nothing).
This has the bad consequence of allocating many more CPUs than are
available once many jobs are running; imagine you have 10 jobs like this
one... some nodes will maybe run 3 of them even though they only have
24 CPUs...

I hope I have made clear what the issue is.

I also see that `qstat` and `qstat -f` are in disagreement. The
latter is correct; I checked the processes running on the nodes.


Has somebody already encountered such a problem? Does somebody have an
idea where to look or what to test?

With kind regards, ulrich



qhost:

HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
slave01                 lx-amd64       64    4   64   64  0.01  504.9G    2.3G   10.0G     0.0
slave02                 lx-amd64       64    4   64   64 123.0  504.8G  171.7G   10.0G     0.0
slave03                 lx-amd64       64    4   64   64  0.01  504.9G    2.0G   10.0G     0.0
slave04                 lx-amd64       64    4   64   64  0.01  504.9G    2.0G   10.0G     0.0
slave05                 lx-amd64       64    4   64   64  2.01  504.9G   40.2G   10.0G     0.0
slave06                 lx-amd64       40    4   40   40  0.01  314.8G    1.1G   32.0G     0.0
slave07                 lx-amd64       24    2   24   24 166.9  188.8G  188.0G   30.0G   28.7G
slave08                 lx-amd64       24    2   24   24  0.01  188.8G  660.5M   30.0G    7.5M

qconf -sconf:

#global:
execd_spool_dir              /opt/sge/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,bash,ksh,csh,tcsh
min_uid                      100
min_gid                      100
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           hil...@mpia-hd.mpg.de
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w

qconf -sp orte:

pe_name            orte
slots              408
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

qconf -sq all.q:

qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=8.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make smp mpi orte
rerun                 FALSE
slots                 1,[slave01=64],[slave02=64],[slave03=64],[slave04=64], \
                      [slave05=64],[slave06=40],[slave07=24],[slave08=24]
tmpdir                /tmp
shell                 /bin/sh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY