If you run your cmd with the hostfile option and add --display-allocation, what does it say?

On Apr 6, 2010, at 12:18 PM, Serge wrote:

> Hi,
>
> OpenMPI integrates with Sun Grid Engine really well, and one does not need to
> specify any parameters for the mpirun command to launch the processes on the
> compute nodes, that is having in the submission script "mpirun ./program" is
> enough; there is no need for "-np XX" or "-hostfile file_name".
>
> However, there are cases when being able to specify the hostfile is important
> (hybrid jobs, users with MPICH jobs, etc.). For example, with Grid Engine I
> can request four 4-core nodes, that is total of 16 slots. But I also want to
> specify how to distribute processes on the nodes, so I create the file 'hosts'
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program
>
> With Open MPI 1.2.x everything worked properly, meaning that Open MPI could
> count the number of slots specified in the 'hosts' file - 4 (i.e. effectively
> supplying the mpirun command with the -np parameter), as well as properly
> distribute processes on the compute nodes (one process per host).
>
> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file
> properly at all. All the processes get launched on just one node -- the
> shepherd host.
>
> The format of the 'hosts' file does not matter. It can be, say
>
> node01
> node01
> node02
> node02
>
> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
> problem, however Open MPI 1.4.x would not.
>
> The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with OMPI
> 1.3.4 and SGE 6.2u4.
>
> It's important to notice that if the mpirun command is run interactively, not
> from inside the Grid Engine script, then it interprets the content of the
> host file just fine.
>
> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents
> expected behavior, and is it possible to get it from OMPI 1.4.x by, say,
> tuning some parameters?
>
> = Serge
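
For reference, the slot counts that the quoted message expects from its two 'hosts' formats can be tallied independently of mpirun. The following is only a minimal sketch: `count_slots` is a hypothetical helper written for this illustration, not part of Open MPI or Grid Engine, and it assumes a hostfile where each line is a hostname optionally followed by `slots=N`, with a bare hostname counting as one slot per occurrence.

```shell
#!/bin/sh
# count_slots: hypothetical helper (not part of Open MPI or SGE).
# Tallies the slots a hostfile requests, per host and in total.
# Reads the files named as arguments, or standard input if none are given.
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
        {
            n = 1                          # a bare hostname counts as one slot
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) { split($i, kv, "="); n = kv[2] + 0 }
            if (!($1 in slots)) order[++hosts] = $1   # remember first-seen order
            slots[$1] += n
            total += n
        }
        END {
            for (i = 1; i <= hosts; i++) printf "%s %d\n", order[i], slots[order[i]]
            printf "total %d\n", total
        }
    ' "$@"
}

# First format from the message: one slot on each of four nodes.
printf 'node01 slots=1\nnode02 slots=1\nnode03 slots=1\nnode04 slots=1\n' | count_slots

# Second format from the message: each hostname repeated once per slot.
printf 'node01\nnode01\nnode02\nnode02\n' | count_slots
```

Running `count_slots hosts` inside the job script gives a plain tally that can be compared by eye with whatever `mpirun -hostfile hosts --display-allocation ./program` reports in the same job.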