Hi Rolf,
Thanks for answering!
Eli
Here is the output of qstat -t when I launch with 8 processors. It looks
to me like it is actually using only compute node 8. The mpirun job was
submitted from the head node 'nimbus'.

I also tried swapping mpirun out for hostname in the job script. For both
8 and 16 processors I got about the same output; the only difference was
which node ran it:
[emorris@nimbus ~/test]$ more mpi-ring.qsub.o254
compute-0-14.local
[emorris@nimbus ~/test]$ qsub -pe orte 8 mpi-ring.qsub
Your job 255 ("mpi-ring.qsub") has been submitted
[emorris@nimbus ~/test]$ qstat -t
job-ID  prior    name        user     state  submit/start at      queue                    master  ja-task-ID task-ID state cpu mem io stat failed
--------------------------------------------------------------------------------------------------------------------------------------------------
    255 0.55500  mpi-ring.q  emorris  r      08/05/2009 15:03:12  all.q@compute-0-8.local  MASTER
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
                                                                  all.q@compute-0-8.local  SLAVE
Here is the entire job script:
[emorris@nimbus ~/test]$ more mpi-ring.qsub
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
#hostname
/opt/openmpi/bin/mpirun --debug-daemons --mca plm_base_verbose 40 \
    --mca plm_rsh_agent ssh -np $NSLOTS $HOME/test/mpi-ring
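One quick way to see exactly which slots SGE granted the job (independent of what mpirun then does with them) is to dump the machine file SGE writes for every parallel-environment job. A sketch that could be dropped into the job script; $PE_HOSTFILE is the standard variable SGE sets for PE jobs, and the show_pe_hosts function name is mine:

```shell
#!/bin/bash
# Sketch: print the machine file SGE hands to a PE job, so the granted
# allocation can be compared with where mpirun actually launches ranks.
# $PE_HOSTFILE is only set inside an SGE parallel-environment job, so
# guard against running the script outside SGE.

show_pe_hosts() {
    if [ -n "$PE_HOSTFILE" ]; then
        echo "SGE granted these slots:"
        cat "$PE_HOSTFILE"          # one line per host: host slots queue processors
    else
        echo "PE_HOSTFILE is not set (not running under an SGE PE)"
    fi
}

show_pe_hosts
```

If the file lists only compute-0-8, then mpirun is doing exactly what SGE asked and the allocation rule is what to look at, not the Open MPI/SGE integration.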
[root@nimbus gridengine]# qconf -sp orte
pe_name orte
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
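For what it's worth, allocation_rule $fill_up packs slots onto as few hosts as possible, so an 8-slot job will land entirely on compute-0-8 whenever that node has 8 free slots, which matches the qstat output above. If the goal is to force ranks onto several hosts for testing, $round_robin or a fixed slots-per-host count are the usual alternatives. A sketch of the change, assuming the PE is still named orte (qconf -mp opens the PE definition in $EDITOR):

```shell
# Edit the parallel environment definition:
#   qconf -mp orte
# then change the allocation rule, e.g.
#   allocation_rule    $round_robin   # one slot per host per pass
# or
#   allocation_rule    4              # exactly 4 slots per host
```

Jobs submitted after the change pick up the new rule; running jobs keep the allocation they were granted.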