qconf -sconf shows:

#global:
execd_spool_dir              /var/spool/gridengine/execd
mailer                       /usr/bin/mail
xterm                        /usr/bin/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 bash,sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           root
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs                100
gid_range                    65400-65500
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
rlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh
qlogin_daemon                /usr/sbin/sshd -i
qlogin_command               /usr/share/gridengine/qlogin-wrapper
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
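As a side note for anyone skimming the dump above: the parameters that can cap job or task counts are buried in the middle of it. A minimal sketch that pulls them out of a saved copy of the output (the values in the heredoc are copied verbatim from the dump; in SGE a value of 0 for max_u_jobs/max_jobs means unlimited):

```shell
# Filter the job/task limits out of a saved `qconf -sconf` dump.
# Values below are copied from the output above; 0 means "no limit".
conf='max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0'
echo "$conf" | awk '$2 == 0 {print $1, "is unlimited"; next}
                    {print $1, "caps at", $2}'
```

With the posted values this reports max_aj_instances capping at 2000 and max_jobs as unlimited, so neither limit explains only 16 concurrent tasks out of 60.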
On Thu, Dec 6, 2018 at 12:55, Reuti (<re...@staff.uni-marburg.de>) wrote:
>
> On 06.12.2018 at 15:19, Dimar Jaime González Soto <dimar.gonzalez.s...@gmail.com> wrote:
> >
> > qconf -se ubuntu-node2:
> >
> > hostname              ubuntu-node2
> > load_scaling          NONE
> > complex_values        NONE
> > load_values           arch=lx26-amd64,num_proc=16,mem_total=48201.960938M, \
> >                       swap_total=95746.996094M,virtual_total=143948.957031M, \
> >                       load_avg=3.740000,load_short=4.000000, \
> >                       load_medium=3.740000,load_long=2.360000, \
> >                       mem_free=47376.683594M,swap_free=95746.996094M, \
>
> Although it's unrelated to the main issue: the swap size can be limited to
> 2 GB nowadays (which is the default in openSUSE). Red Hat suggests a little
> bit more, e.g. here:
>
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-swapspace
>
> >                       virtual_free=143123.679688M,mem_used=825.277344M, \
> >                       swap_used=0.000000M,virtual_used=825.277344M, \
> >                       cpu=25.000000,m_topology=NONE,m_topology_inuse=NONE, \
> >                       m_socket=0,m_core=0,np_load_avg=0.233750, \
> >                       np_load_short=0.250000,np_load_medium=0.233750, \
> >                       np_load_long=0.147500
> > processors            16
> > user_lists            NONE
> > xuser_lists           NONE
> > projects              NONE
> > xprojects             NONE
> > usage_scaling         NONE
> > report_variables      NONE
> >
> > On Thu, Dec 6, 2018 at 11:17, Dimar Jaime González Soto (<dimar.gonzalez.s...@gmail.com>) wrote:
> > >
> > > qhost:
> > >
> > > HOSTNAME        ARCH       NCPU LOAD  MEMTOT  MEMUSE  SWAPTO SWAPUS
> > > -------------------------------------------------------------------------------
> > > global          -             -    -       -       -       -      -
> > > ubuntu-frontend lx26-amd64   16 4.13   31.4G    1.2G     0.0    0.0
> > > ubuntu-node11   lx26-amd64   16 4.55   47.1G  397.5M   93.5G    0.0
> > > ubuntu-node12   lx26-amd64   16 3.64   47.1G    1.0G   93.5G    0.0
> > > ubuntu-node13   lx26-amd64   16 4.54   47.1G  399.9M   93.5G    0.0
> > > ubuntu-node2    lx26-amd64   16 3.67   47.1G  818.5M   93.5G    0.0
>
> This looks fine.
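For readers unfamiliar with the np_ values in the load report: np_load_avg is simply load_avg divided by num_proc, which is why this host sits far below the queue's np_load_avg=1.75 threshold. A quick check with the numbers reported for ubuntu-node2:

```shell
# np_load_avg = load_avg / num_proc; values taken from the
# `qconf -se ubuntu-node2` output (load_avg=3.74, num_proc=16).
awk 'BEGIN { printf "np_load_avg = %.6f\n", 3.74 / 16 }'
# prints: np_load_avg = 0.233750
```

That matches the reported np_load_avg=0.233750, so the load threshold is not what's holding tasks back.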
> So we have other settings to investigate:
>
> $ qconf -sconf
> #global:
> execd_spool_dir /var/spool/sge
> ...
> max_aj_tasks 75000
>
> Is max_aj_tasks limited in your setup?
>
> -- Reuti
>
> > On Thu, Dec 6, 2018 at 11:13, Reuti (<re...@staff.uni-marburg.de>) wrote:
> >
> > > On 06.12.2018 at 15:07, Dimar Jaime González Soto <dimar.gonzalez.s...@gmail.com> wrote:
> > >
> > > qalter -w p doesn't show anything; qstat shows 16 processes and not 60:
> > >
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 1
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 2
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 3
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 4
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 5
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 6
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 7
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 8
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 9
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 10
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 11
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 12
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 13
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 14
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 15
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 16
> > > 250 0.50000 OMA cbuach qw 12/06/2018 11:04:02                        1 17-60:1
> >
> > Aha, so they are running already on remote nodes – fine. As the setting in the queue
> > configuration is per host, this should work and provide more processes per node instead of four.
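A quick way to see the running/queued split in output like the listing above is to tally the state column. A minimal sketch over an abridged copy of the listing (three sample rows copied from the thread; in this layout the state is the 5th whitespace-separated field):

```shell
# Tally job-array task states from saved `qstat` lines
# (abridged sample from the listing above; field 5 is the state).
qstat_out='250 0.50000 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node2 1 1
250 0.50000 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node12 1 2
250 0.50000 OMA cbuach qw 12/06/2018 11:04:02 1 17-60:1'
echo "$qstat_out" | awk '{count[$5]++} END {for (s in count) print s, count[s]}' | sort
```

On the full listing this would report 16 tasks in state r and one qw line covering tasks 17-60.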
> > Is there a setting for the exechosts:
> >
> > qconf -se ubuntu-node2
> >
> > limiting the slots to 4 in complex_values? Can you please also provide the `qhost` output.
> >
> > -- Reuti
> >
> > > On Thu, Dec 6, 2018 at 10:59, Reuti (<re...@staff.uni-marburg.de>) wrote:
> > >
> > > > On 06.12.2018 at 09:47, Hay, William <w....@ucl.ac.uk> wrote:
> > > >
> > > > On Wed, Dec 05, 2018 at 03:29:23PM -0300, Dimar Jaime González Soto wrote:
> > > > > the app site is https://omabrowser.org/standalone/. I tried to make a
> > > > > parallel environment but it didn't work.
> > > > The website indicates that an array job should work for this.
> > > > Has the load average spiked to the point where np_load_avg>=1.75?
> > >
> > > Yes, I noticed this too. Hence we need no parallel environment at all, as OMA
> > > will just start several serial jobs as long as slots are available, AFAICS.
> > >
> > > What does `qstat` show for a running job? There should be a line for each
> > > executing task, while the waiting ones are abbreviated into one line.
> > >
> > > -- Reuti
> > >
> > > > I would try running qalter -w p against the job id to see what it says.
> > > >
> > > > William
> > > >
> > > > > On 05.12.2018 at 19:10, Dimar Jaime Gonzalez Soto <dimar.gonzalez.s...@gmail.com> wrote:
> > > > >
> > > > > Hi everyone, I'm trying to run OMA standalone on a grid engine setup with this line:
> > > > >
> > > > > qsub -v NR_PROCESSES=60 -b y -j y -t 1-60 -cwd /usr/local/OMA/bin/OMA
> > > > >
> > > > > It works, but it only executes 4 processes per node; there are 4 nodes
> > > > > with 16 logical threads each.
> > > > > My main.q configuration is:
> > > > >
> > > > > qname                 main.q
> > > > > hostlist              @allhosts
> > > > > seq_no                0
> > > > > load_thresholds       np_load_avg=1.75
> > > > > suspend_thresholds    NONE
> > > > > nsuspend              1
> > > > > suspend_interval      00:05:00
> > > > > priority              0
> > > > > min_cpu_interval      00:05:00
> > > > > processors            UNDEFINED
> > > > > qtype                 BATCH INTERACTIVE
> > > > > ckpt_list             NONE
> > > > > pe_list               make
> > > > > rerun                 FALSE
> > > > > slots                 16

--
Regards,

Dimar González Soto
Ingeniero Civil en Informática
Universidad Austral de Chile
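Since `slots 16` in a queue configuration applies per queue instance (i.e., per host), the expected ceiling with the four exec hosts listed in `qhost` is 64 concurrent tasks, comfortably above the 60 submitted. A sanity check of that arithmetic:

```shell
# Sanity check: `slots 16` applies per host, and qhost lists 4 exec
# hosts, so main.q should accept 64 concurrent array tasks in total.
hosts=4
slots_per_host=16
tasks=60
total=$((hosts * slots_per_host))
echo "total slots: $total"
if [ "$total" -ge "$tasks" ]; then
  echo "enough capacity for all $tasks tasks"
fi
```

Seeing only 16 tasks running (4 per host) despite this therefore points at some other per-host cap, which is what the thread goes on to investigate.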
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users