qconf -sconf shows:

#global:
execd_spool_dir              /var/spool/gridengine/execd
mailer                       /usr/bin/mail
xterm                        /usr/bin/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 bash,sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           root
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs                100
gid_range                    65400-65500
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
rlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh
qlogin_daemon                /usr/sbin/sshd -i
qlogin_command               /usr/share/gridengine/qlogin-wrapper
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
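As a side note for anyone skimming the dump above: the parameters that can cap job or task counts are buried in the middle of it. A minimal sketch that pulls them out of a saved copy of the output (the values in the heredoc are copied verbatim from the dump; in SGE a value of 0 for max_u_jobs/max_jobs means unlimited):

```shell
# Filter the job/task limits out of a saved `qconf -sconf` dump.
# Values below are copied from the output above; 0 means "no limit".
conf='max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0'
echo "$conf" | awk '$2 == 0 {print $1, "is unlimited"; next}
                    {print $1, "caps at", $2}'
```

With the posted values this reports max_aj_instances capping at 2000 and max_jobs as unlimited, so neither limit explains only 16 concurrent tasks out of 60.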
On Thu, Dec 6, 2018 at 12:55, Reuti (<re...@staff.uni-marburg.de>) wrote:
>
> On 06.12.2018 at 15:19, Dimar Jaime González Soto <dimar.gonzalez.s...@gmail.com> wrote:
> >
> > qconf -se ubuntu-node2:
> >
> > hostname              ubuntu-node2
> > load_scaling          NONE
> > complex_values        NONE
> > load_values           arch=lx26-amd64,num_proc=16,mem_total=48201.960938M, \
> >                       swap_total=95746.996094M,virtual_total=143948.957031M, \
> >                       load_avg=3.740000,load_short=4.000000, \
> >                       load_medium=3.740000,load_long=2.360000, \
> >                       mem_free=47376.683594M,swap_free=95746.996094M, \
>
> Although it's unrelated to the main issue: the swap size can be limited to
> 2 GB nowadays (which is the default in openSUSE). Red Hat suggests a little
> bit more, e.g. here:
>
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-swapspace
>
> >                       virtual_free=143123.679688M,mem_used=825.277344M, \
> >                       swap_used=0.000000M,virtual_used=825.277344M, \
> >                       cpu=25.000000,m_topology=NONE,m_topology_inuse=NONE, \
> >                       m_socket=0,m_core=0,np_load_avg=0.233750, \
> >                       np_load_short=0.250000,np_load_medium=0.233750, \
> >                       np_load_long=0.147500
> > processors            16
> > user_lists            NONE
> > xuser_lists           NONE
> > projects              NONE
> > xprojects             NONE
> > usage_scaling         NONE
> > report_variables      NONE
> >
> > On Thu, Dec 6, 2018 at 11:17, Dimar Jaime González Soto (<dimar.gonzalez.s...@gmail.com>) wrote:
> > >
> > > qhost:
> > >
> > > HOSTNAME        ARCH       NCPU LOAD  MEMTOT  MEMUSE  SWAPTO SWAPUS
> > > -------------------------------------------------------------------------------
> > > global          -             -    -       -       -       -      -
> > > ubuntu-frontend lx26-amd64   16 4.13   31.4G    1.2G     0.0    0.0
> > > ubuntu-node11   lx26-amd64   16 4.55   47.1G  397.5M   93.5G    0.0
> > > ubuntu-node12   lx26-amd64   16 3.64   47.1G    1.0G   93.5G    0.0
> > > ubuntu-node13   lx26-amd64   16 4.54   47.1G  399.9M   93.5G    0.0
> > > ubuntu-node2    lx26-amd64   16 3.67   47.1G  818.5M   93.5G    0.0
>
> This looks fine.
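For readers unfamiliar with the np_ values in the load report: np_load_avg is simply load_avg divided by num_proc, which is why this host sits far below the queue's np_load_avg=1.75 threshold. A quick check with the numbers reported for ubuntu-node2:

```shell
# np_load_avg = load_avg / num_proc; values taken from the
# `qconf -se ubuntu-node2` output (load_avg=3.74, num_proc=16).
awk 'BEGIN { printf "np_load_avg = %.6f\n", 3.74 / 16 }'
# prints: np_load_avg = 0.233750
```

That matches the reported np_load_avg=0.233750, so the load threshold is not what's holding tasks back.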
> So we have other settings to investigate:
>
> $ qconf -sconf
> #global:
> execd_spool_dir /var/spool/sge
> ...
> max_aj_tasks 75000
>
> Is max_aj_tasks limited in your setup?
>
> -- Reuti
>
> > On Thu, Dec 6, 2018 at 11:13, Reuti (<re...@staff.uni-marburg.de>) wrote:
> >
> > > On 06.12.2018 at 15:07, Dimar Jaime González Soto <dimar.gonzalez.s...@gmail.com> wrote:
> > >
> > > qalter -w p doesn't show anything; qstat shows 16 processes and not 60:
> > >
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 1
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 2
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 3
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 4
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 5
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 6
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 7
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 8
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 9
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 10
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 11
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 12
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node11   1 13
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node13   1 14
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node12   1 15
> > > 250 0.50000 OMA cbuach r  12/06/2018 11:04:15 main.q@ubuntu-node2    1 16
> > > 250 0.50000 OMA cbuach qw 12/06/2018 11:04:02                        1 17-60:1
> >
> > Aha, so they are running already on remote nodes – fine. As the setting in the queue
> > configuration is per host, this should work and provide more processes per node instead of four.
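A quick way to see the running/queued split in output like the listing above is to tally the state column. A minimal sketch over an abridged copy of the listing (three sample rows copied from the thread; in this layout the state is the 5th whitespace-separated field):

```shell
# Tally job-array task states from saved `qstat` lines
# (abridged sample from the listing above; field 5 is the state).
qstat_out='250 0.50000 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node2 1 1
250 0.50000 OMA cbuach r 12/06/2018 11:04:15 main.q@ubuntu-node12 1 2
250 0.50000 OMA cbuach qw 12/06/2018 11:04:02 1 17-60:1'
echo "$qstat_out" | awk '{count[$5]++} END {for (s in count) print s, count[s]}' | sort
```

On the full listing this would report 16 tasks in state r and one qw line covering tasks 17-60.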
> > Is there a setting for the exechosts:
> >
> > qconf -se ubuntu-node2
> >
> > limiting the slots to 4 in complex_values? Can you please also provide the `qhost` output.
> >
> > -- Reuti
> >
> > > On Thu, Dec 6, 2018 at 10:59, Reuti (<re...@staff.uni-marburg.de>) wrote:
> > >
> > > > On 06.12.2018 at 09:47, Hay, William <w....@ucl.ac.uk> wrote:
> > > >
> > > > On Wed, Dec 05, 2018 at 03:29:23PM -0300, Dimar Jaime González Soto wrote:
> > > > > the app site is https://omabrowser.org/standalone/. I tried to make a
> > > > > parallel environment but it didn't work.
> > > > The website indicates that an array job should work for this.
> > > > Has the load average spiked to the point where np_load_avg>=1.75?
> > >
> > > Yes, I noticed this too. Hence we need no parallel environment at all, as OMA
> > > will just start several serial jobs as long as slots are available, AFAICS.
> > >
> > > What does `qstat` show for a running job? There should be a line for each
> > > executing task, while the waiting ones are abbreviated into one line.
> > >
> > > -- Reuti
> > >
> > > > I would try running qalter -w p against the job id to see what it says.
> > > >
> > > > William
> > > >
> > > > > On 05.12.2018 at 19:10, Dimar Jaime Gonzalez Soto <dimar.gonzalez.s...@gmail.com> wrote:
> > > > >
> > > > > Hi everyone, I'm trying to run OMA standalone on a grid engine setup with this line:
> > > > >
> > > > > qsub -v NR_PROCESSES=60 -b y -j y -t 1-60 -cwd /usr/local/OMA/bin/OMA
> > > > >
> > > > > It works, but it only executes 4 processes per node; there are 4 nodes
> > > > > with 16 logical threads each.
> > > > > My main.q configuration is:
> > > > >
> > > > > qname                 main.q
> > > > > hostlist              @allhosts
> > > > > seq_no                0
> > > > > load_thresholds       np_load_avg=1.75
> > > > > suspend_thresholds    NONE
> > > > > nsuspend              1
> > > > > suspend_interval      00:05:00
> > > > > priority              0
> > > > > min_cpu_interval      00:05:00
> > > > > processors            UNDEFINED
> > > > > qtype                 BATCH INTERACTIVE
> > > > > ckpt_list             NONE
> > > > > pe_list               make
> > > > > rerun                 FALSE
> > > > > slots                 16

--
Regards,

Dimar González Soto
Ingeniero Civil en Informática
Universidad Austral de Chile
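Since `slots 16` in a queue configuration applies per queue instance (i.e., per host), the expected ceiling with the four exec hosts listed in `qhost` is 64 concurrent tasks, comfortably above the 60 submitted. A sanity check of that arithmetic:

```shell
# Sanity check: `slots 16` applies per host, and qhost lists 4 exec
# hosts, so main.q should accept 64 concurrent array tasks in total.
hosts=4
slots_per_host=16
tasks=60
total=$((hosts * slots_per_host))
echo "total slots: $total"
if [ "$total" -ge "$tasks" ]; then
  echo "enough capacity for all $tasks tasks"
fi
```

Seeing only 16 tasks running (4 per host) despite this therefore points at some other per-host cap, which is what the thread goes on to investigate.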
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users