If you run your cmd with the hostfile option and add --display-allocation, what 
does it say?

On Apr 6, 2010, at 12:18 PM, Serge wrote:

> Hi,
> 
> Open MPI integrates with Sun Grid Engine really well: one does not need to
> pass any parameters to mpirun to launch the processes on the compute nodes.
> Having "mpirun ./program" in the submission script is enough; there is no
> need for "-np XX" or "-hostfile file_name".
> 
> However, there are cases when being able to specify the hostfile is important
> (hybrid jobs, users with MPICH jobs, etc.). For example, with Grid Engine I
> can request four 4-core nodes, that is, a total of 16 slots. But I also want
> to specify how the processes are distributed across the nodes, so I create
> the file 'hosts':
> 
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
> 
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program
> 
> With Open MPI 1.2.x everything worked properly: Open MPI could count the
> number of slots specified in the 'hosts' file (4, effectively supplying
> mpirun with the -np parameter) and properly distribute the processes across
> the compute nodes (one process per host).
> 
> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file 
> properly at all. All the processes get launched on just one node -- the 
> shepherd host.
> 
> The format of the 'hosts' file does not matter. It can be, say
> 
> node01
> node01
> node02
> node02
> 
> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
> problem; Open MPI 1.4.x would not.
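To make the expected 1.2.x behavior concrete, here is a minimal Python sketch of how each hostfile form above translates into per-host slot counts, with the total acting as the implied -np. The helper names `parse_hostfile` and `place_by_slot` are hypothetical and this is not Open MPI's actual parser; it assumes by-slot placement (fill each host's slots before moving to the next), understands only the `slots=N` keyword and repeated hostnames, and ignores any allocation supplied by Grid Engine.

```python
from typing import Dict, List


def parse_hostfile(text: str) -> Dict[str, int]:
    """Map each host to its slot count (hypothetical helper, not Open MPI code).

    Handles only the two forms from the message: "host slots=N" lines and
    a bare hostname repeated once per slot. Counts for a host that appears
    on several lines are summed.
    """
    slots: Dict[str, int] = {}  # dicts keep insertion order (Python 3.7+)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host, *options = line.split()
        count = 1  # a bare hostname stands for one slot
        for opt in options:
            if opt.startswith("slots="):
                count = int(opt[len("slots="):])
        slots[host] = slots.get(host, 0) + count
    return slots


def place_by_slot(slots: Dict[str, int]) -> List[str]:
    """Fill each host's slots in file order; len() of the result is the implied -np."""
    placement: List[str] = []
    for host, count in slots.items():
        placement.extend([host] * count)
    return placement


if __name__ == "__main__":
    one_per_node = "node01 slots=1\nnode02 slots=1\nnode03 slots=1\nnode04 slots=1\n"
    repeated = "node01\nnode01\nnode02\nnode02\n"
    print(place_by_slot(parse_hostfile(one_per_node)))
    # ['node01', 'node02', 'node03', 'node04']
    print(place_by_slot(parse_hostfile(repeated)))
    # ['node01', 'node01', 'node02', 'node02']
```

Comparing this expected placement with what `--display-allocation` reports inside the Grid Engine job is one way to see where the two diverge.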
> 
> The problem appears with OMPI 1.4.1 and SGE 6.1u6. It was also tested with
> OMPI 1.3.4 and SGE 6.2u4.
> 
> It's important to note that if the mpirun command is run interactively, not
> from inside the Grid Engine script, then it interprets the contents of the
> host file just fine.
> 
> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents the
> expected behavior, and whether it is possible to restore it in OMPI 1.4.x
> by, say, tuning some parameters.
> 
> = Serge
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
