Belaid MOA wrote:
Hi everyone,
Here is another elementary question. I tried the following steps found in the FAQ section of www.open-mpi.org with a simple hello world example (with PBS/torque):
 $  qsub -l nodes=2 my_script.sh

my_script.sh is pasted below:
#!/bin/sh -l
#PBS -N helloTest
#PBS -j eo
echo `cat $PBS_NODEFILE` # shows two nodes: WN1 WN2
/usr/local/bin/mpirun hello

When the job is submitted, only one process is run. When I add the -n 2 option to the mpirun command, two processes are run, but on one node only.

Do you have a single CPU/core per node?
Or are they multi-socket/multi-core?

Check "man mpiexec" for the options that control which nodes and
slots your program will run on.
("Man mpiexec" will tell you more than I possibly can.)

The default option is "-byslot",
which will use all "slots" (actually cores
or CPUs) available on a node before it moves to the next node.
Reading your question and your surprise with the result,
I would guess what you want is "-bynode" (not the default).
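
To make the difference concrete, here is a toy shell sketch of the two
placement orders. It does not call mpirun and is not how Open MPI
implements its mapping; the node names are taken from your nodefile, and
the slot count (2 per node) and rank count (4) are made-up values for
illustration only.

```shell
#!/bin/sh
# Toy simulation only: prints which node each rank would land on.
# Assumed (hypothetical) layout: 2 nodes, 2 slots each, 4 ranks.
NODES="WN1 WN2"
SLOTS_PER_NODE=2
NP=4

# byslot: fill every slot on a node before moving to the next node
map_byslot() {
    rank=0
    for node in $NODES; do
        slot=0
        while [ "$slot" -lt "$SLOTS_PER_NODE" ] && [ "$rank" -lt "$NP" ]; do
            echo "rank $rank -> $node"
            rank=$((rank + 1))
            slot=$((slot + 1))
        done
    done
}

# bynode: round-robin, one rank per node on each pass over the node list
map_bynode() {
    rank=0
    while [ "$rank" -lt "$NP" ]; do
        for node in $NODES; do
            [ "$rank" -lt "$NP" ] || break
            echo "rank $rank -> $node"
            rank=$((rank + 1))
        done
    done
}

echo "byslot:"; map_byslot
echo "bynode:"; map_bynode
```

With "byslot" ranks 0 and 1 both land on WN1; with "bynode" rank 0 goes
to WN1 and rank 1 to WN2. In your job script that would presumably mean
something like "/usr/local/bin/mpirun -bynode -n 2 hello", but please
check the man page for your version.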

Also, if you have more than one CPU/core per node,
you need to put this information in your Torque/PBS "nodes" file
(and restart your pbs_server daemon).
Something like this (for 2 CPUs/cores per node):

WN1 np=2
WN2 np=2

I hope this helps,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Note that echo `cat $PBS_NODEFILE` outputs
the two nodes I am using: WN1 and WN2.

The output from ompi_info is shown below:

$ ompi_info | grep tm
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.3)
                 MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.3)

Any help on why Open MPI/mpirun is using only one PBS node would be much appreciated.

Thanks a lot in advance and sorry for bothering you guys with my elementary questions!


