Hi Kelly

Each instance of mpirun is independent - there is no cross-mpirun coordination. 
So they will indeed trip over each other as you describe.

In more recent versions, you can restrict the available cores for each mpirun 
execution by having the external system "bind" OMPI to some subset of the 
available cores. However, I don't believe Torque provides that capability.

You can also set the default cpu set to be used: try adding -mca orte_cpu_set 
1,2 to the mpirun command line, where 1,2 are the cores you want that execution 
to use.

I can't guarantee it will work as I'm not sure it has been robustly tested, but 
it is supposed to do what you described (I added it for some other folks at 
LANL). Let me know and I'll fix it if required.
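
As a rough, untested sketch of how a test driver might use this: give each 
mpirun a disjoint block of cores and pass it via orte_cpu_set. The helper names 
(assign_core_sets, mpirun_command), the ./unit_test executable, and the 
single 8-core node are all assumptions for illustration. The only mpirun 
options used are -np and the -mca orte_cpu_set parameter mentioned above, and 
the script only prints the command lines rather than launching anything.

```python
def assign_core_sets(rank_counts, total_cores):
    """Give each job a contiguous, non-overlapping block of core ids."""
    if sum(rank_counts) > total_cores:
        raise ValueError("jobs need more cores than are available")
    core_sets, next_core = [], 0
    for ranks in rank_counts:
        core_sets.append(list(range(next_core, next_core + ranks)))
        next_core += ranks
    return core_sets


def mpirun_command(ranks, cores, executable):
    """Build an argv list for one mpirun restricted to the given cores."""
    cpu_set = ",".join(str(c) for c in cores)
    return ["mpirun", "-np", str(ranks),
            "-mca", "orte_cpu_set", cpu_set, executable]


if __name__ == "__main__":
    # Hypothetical workload: three tests sharing one 8-core node.
    jobs = [2, 2, 4]
    for ranks, cores in zip(jobs, assign_core_sets(jobs, total_cores=8)):
        print(" ".join(mpirun_command(ranks, cores, "./unit_test")))
```

This does not address how the cpu set interacts with jobs that span more than 
one node; that would need checking on a real allocation.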

Alternatively, you can leave the procs unbound as you are doing and they'll run 
just fine, albeit a little slower.

Ralph

On Jan 9, 2012, at 8:24 AM, Thompson, Kelly G wrote:

> Hi,
>  
> I am interested in running a handful of mpirun jobs in a single allocation.  
> For example, my allocation is 2 nodes with 8 cores on each node (total of 16 
> cores).  I want to run 2 five-rank jobs and 3 two-rank jobs simultaneously 
> (total of 16 cores) and w/o oversubscribing any single core.  I am currently 
> using ‘--mca mpi_paffinity_alone 0’ and that appears to work, but it looks 
> like recent versions (1.4+) of OpenMPI have better controls for processor 
> affinity.  Is there a better choice of flags for my situation?
>  
> The bigger picture is that I am running 400-600 small unit tests in a single 
> Torque allocation.  My testing framework is aware of total available cores 
> and the cores required per test so that the total simultaneous core count 
> never exceeds my allocation.  However, if I use any option other than ‘--mca 
> mpi_paffinity_alone 0’, mpirun will place multiple jobs on the same cores and 
> leave many cores with nothing to do.  Is there a good description for how 
> mpirun assigns jobs to cores – particularly in the situation where there are 
> multiple mpirun jobs running on the same allocation?
>  
> TIA
>  
> -kt
> ---
> Kelly Thompson
> k...@lanl.gov
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
