I asked this question on the TorqueUsers list. Responses there suggested 
that I ask it on this list:

The situation is:

I submit my jobs like this:
> qsub -l nodes=6:ppn=2 /path/to/mpi_program

where "mpi_program" is:
/path/to/mpirun -np 12 /path/to/my_program

However, all the processes run on the head node (once, they all ran on the 
first compute node instead). The jobs still complete.
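To see which nodes Torque actually allocated to the job, the job script can print the contents of $PBS_NODEFILE before calling mpirun. The following is a minimal sketch, assuming Bash; the function name "show_slots" is hypothetical. Torque writes one line per allocated processor to that file, so nodes=6:ppn=2 should produce 6 distinct hosts with 2 slots each.

```shell
#!/bin/bash
# Hypothetical helper: summarize a Torque node file as "host slots" lines.
# Torque lists each host once per allocated processor, so counting
# repeated lines gives the number of slots per host.
show_slots() {
    # Use the file given as $1, or fall back to the one Torque provides.
    local nodefile="${1:-$PBS_NODEFILE}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "no readable node file" >&2
        return 1
    fi
    # uniq -c prints "count host"; swap the columns to "host count".
    sort "$nodefile" | uniq -c | awk '{ print $2, $1 }'
}
```

Calling show_slots at the top of the job script, just before "/path/to/mpirun -np 12 /path/to/my_program", shows in the job's output file whether Torque handed the job all six nodes or only one.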

Although mpirun can distribute the processes itself when given a 
"-machinefile", Glen and others pointed out, as does this web page 
http://wiki.hpc.ufl.edu/index.php/Common_Problems (I got the same error as the 
last example on that page), that it is not a good idea to provide a machinefile, 
since this is "already handled by OpenMPI and Torque".

My question is: why are OpenMPI and Torque not distributing the processes to 
all the nodes?

ps 1:
OpenMPI was configured and installed with the "--with-tm" option, and 
"ompi_info" does show these lines:

 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.7)
 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
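These lines confirm tm support only for the installation whose ompi_info was run. A quick way to check that the mpirun called in the job script belongs to that same installation is to run the ompi_info next to it. The following is a minimal sketch, assuming Bash and that ompi_info is installed in the same bin directory as mpirun; the function name "has_tm_support" is hypothetical, and the component names matched are the ones from the v1.2.7 output above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the ompi_info installed next to the
# given mpirun reports both tm components (ras = allocation, pls = launch).
# Assumes ompi_info lives in the same directory as mpirun.
has_tm_support() {
    local mpirun_path="$1"
    local bindir out
    bindir="$(dirname "$mpirun_path")"
    # Capture the output; fail if ompi_info is missing or errors out.
    out="$("$bindir/ompi_info")" || return 1
    grep -q 'MCA ras: tm' <<<"$out" && grep -q 'MCA pls: tm' <<<"$out"
}
```

For example, "has_tm_support /path/to/mpirun && echo tm ok" should print "tm ok" for the mpirun used in the job script; if it does not, that mpirun was probably built without tm.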

ps 2:
"/path/to/mpirun -np 12 -machinefile /path/to/machinefile /path/to/my_program"
works normally (it sends processes to all nodes).

Thanks,

Zhiliang 
