> If you run your cmd with the hostfile option and add
> --display-allocation, what does it say?

Thank you, Ralph.

This is the command I used inside my submission script:

  mpirun --display-allocation -np 4 -hostfile hosts ./program

And this is the output I got:

 Data for node: Name: node03  Num slots: 4    Max slots: 0
 Data for node: Name: node02  Num slots: 4    Max slots: 0
 Data for node: Name: node04  Num slots: 4    Max slots: 0
 Data for node: Name: node01  Num slots: 4    Max slots: 0

If I run the same mpirun command on the cluster head node "clhead", then this is what I get:

 Data for node: Name: clhead  Num slots: 0    Max slots: 0
 Data for node: Name: node01  Num slots: 1    Max slots: 0
 Data for node: Name: node02  Num slots: 1    Max slots: 0
 Data for node: Name: node03  Num slots: 1    Max slots: 0
 Data for node: Name: node04  Num slots: 1    Max slots: 0

The content of the 'hosts' file:

 node01 slots=1
 node02 slots=1
 node03 slots=1
 node04 slots=1

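For comparison, the allocation that SGE itself hands to the job can be inspected from inside the submission script; the "Num slots: 4" values above appear to match the SGE allocation rather than the hosts file. A minimal check, using the standard $PE_HOSTFILE variable that SGE sets for parallel jobs:

  # Print the machine file that SGE generated for this job.
  echo "PE_HOSTFILE = $PE_HOSTFILE"
  cat "$PE_HOSTFILE"
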
= Serge


On Apr 6, 2010, at 12:18 PM, Serge wrote:

Hi,

Open MPI integrates with Sun Grid Engine really well, and one does not need to specify any parameters for the mpirun command to launch the processes on the compute nodes; having "mpirun ./program" in the submission script is enough, and there is no need for "-np XX" or "-hostfile file_name".
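
For instance, a minimal submission script relying on this tight integration might look like the sketch below (the parallel environment name "mpi" and the job name are placeholders; the actual PE name is site-specific):

  #!/bin/sh
  #$ -N mpi_job
  #$ -cwd
  #$ -pe mpi 16   # request 16 slots from a site-defined parallel environment

  # With tight integration, mpirun picks up the slot count and the
  # node list from SGE automatically.
  mpirun ./program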

However, there are cases when being able to specify the hostfile is important (hybrid jobs, users with MPICH jobs, etc.). For example, with Grid Engine I can request four 4-core nodes, that is, a total of 16 slots. But I also want to control how the processes are distributed across the nodes, so I create the file 'hosts':

node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1

and modify the line in the submission script to:
mpirun -hostfile hosts ./program
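
Putting it together, the relevant part of the submission script would look roughly like this (the PE name is again a placeholder, and getting exactly four 4-core nodes depends on the PE's allocation rule; since the node names are not known before the job starts, the 'hosts' file can also be derived from SGE's machine file instead of being written by hand):

  #!/bin/sh
  #$ -cwd
  #$ -pe mpi 16   # 16 slots total

  # Build a one-process-per-node hostfile from the SGE allocation.
  awk '{print $1 " slots=1"}' "$PE_HOSTFILE" > hosts

  mpirun -hostfile hosts ./program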

With Open MPI 1.2.x everything worked properly, meaning that Open MPI could count the number of slots specified in the 'hosts' file (four, effectively supplying the -np parameter to mpirun) and could properly distribute the processes across the compute nodes (one process per host).

It's different with Open MPI 1.4.1, which cannot process the 'hosts' file properly at all: all the processes get launched on just one node, the shepherd host.
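
A quick way to see the placement, without involving the application itself, is to launch a plain system command through mpirun:

  mpirun -hostfile hosts hostname

With the behavior described above, this prints the shepherd host's name four times instead of the four different node names.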

The exact format of the 'hosts' file does not matter. It can be, say,

node01
node01
node02
node02

meaning 2 slots on each node. Open MPI 1.2.x would handle that with no problem; Open MPI 1.4.x, however, would not.
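
For reference, repeating a hostname like this is equivalent to the slots= syntax:

  node01 slots=2
  node02 slots=2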

The problem appears with OMPI 1.4.1 and SGE 6.1u6; the same behavior was also observed with OMPI 1.3.4 and SGE 6.2u4.

It is important to note that if the mpirun command is run interactively, not from inside the Grid Engine script, then it interprets the content of the host file just fine.

I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents the expected behavior, and whether it is possible to get it back in OMPI 1.4.x by, say, tuning some parameters.
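
Given that the hostfile is honored interactively but not under SGE, it looks as if the allocation detected from SGE overrides the hostfile. If so, one candidate workaround (untested here, and assuming the gridengine allocation component can be excluded like any other MCA component) would be to keep mpirun from reading the SGE allocation:

  mpirun --mca ras ^gridengine -hostfile hosts ./program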

= Serge
