Hi all,
I’m new to Grid Engine and I need to run MPI jobs.
I configured a parallel environment named “mpi”:
[e4user@hactar greg]$ qconf -sp mpi
pe_name mpi
slots 336
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
[e4user@hactar greg]$
and I added it to the default queue:
[e4user@hactar greg]$ qconf -sq all.q
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpi
rerun FALSE
slots 1,[compute-1-5=24],[compute-1-9=24],[compute-1-8=24], \
[compute-1-6=24],[compute-1-1=24],[compute-1-2=24], \
[compute-1-11=24],[compute-1-14=24],[compute-1-3=24], \
[compute-1-4=24],[compute-1-7=24],[compute-1-10=24], \
[compute-1-12=24],[compute-1-13=24]
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
[e4user@hactar greg]$
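As a quick sanity check on the numbers above (just a sketch; the variable names are mine and are not part of any Grid Engine configuration): the queue lists 14 hosts with 24 slots each, which should add up to the 336 slots configured in the PE.

```shell
#!/bin/bash
# Hypothetical helper variables, taken from the all.q "slots" line above:
# 14 host-specific entries, each set to 24 slots.
hosts=14
slots_per_host=24

# Total slots the queue can offer across all listed hosts.
total=$((hosts * slots_per_host))
echo "total queue slots: $total"
```

So the PE slot count and the queue slot count agree, and a request for 2 slots is far below either limit.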
I also prepared a script:
[e4user@hactar greg]$ cat sge.sh
#!/bin/bash
mpirun -np 4 -mca ras_gridengine_verbose 100 date -v
[e4user@hactar greg]$
Then I submitted the job with:
[e4user@hactar greg]$ qsub -pe mpi 2 sge.sh
The problem is that the job doesn’t start; it always stays in the qw state:
[e4user@hactar greg]$ qstat
job-ID  prior    name        user    state  submit/start at      queue  slots  ja-task-ID
------------------------------------------------------------------------------------------
   961  0.60500  mpi_date.s  e4user  qw     06/12/2015 17:45:09         8
   962  0.50500  sge.sh      e4user  qw     06/13/2015 18:24:30         2
[e4user@hactar greg]$
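In case it helps, these are the standard Grid Engine commands that, as far as I understand, are normally used to find out why a job stays pending. This is only a sketch: the job ID is the one from the qstat output above, the `command -v` guard is my own addition so the script does nothing on a machine without Grid Engine, and I am assuming that `qstat -j` prints scheduling information only when `schedd_job_info` is enabled in the scheduler configuration.

```shell
#!/bin/bash
# Job ID of the pending sge.sh job, taken from the qstat output above.
job_id=962

if command -v qstat >/dev/null 2>&1; then
    # Detailed job information, including the scheduler's reasons for
    # not dispatching it (if scheduling info is enabled).
    qstat -j "$job_id"

    # Ask Grid Engine to verify whether the job could run at all
    # on the cluster as currently configured.
    qalter -w v "$job_id"

    # Full per-host queue instance listing, showing queue states.
    qstat -f

    # Cluster queue summary: used, available and total slots.
    qstat -g c
else
    echo "qstat not found; run this on a Grid Engine submit host"
fi
```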
Job 961 (mpi_date) is a similar test job that I submitted earlier.
Any hints on how to get the job to start?
Thank you
D.