Ralph,

Thank you for the fast response! That sounds very good; unfortunately, I get an error:

$ mpirun --map-by core:pe=4 ./affinity
--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that cannot support that
directive.

Please specify a mapping level that has more than one cpu, or
else let us define a default mapping that will allow multiple
cpus-per-proc.
--------------------------------------------------------------------------

I have allocated my SLURM job as

salloc --ntasks=2 --cpus-per-task=4

I have checked with 1.10.0 and 1.10.1rc1.
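
For reference, the ./affinity test I run is nothing fancy, roughly along these
lines (a sketch, not the exact code; each rank just reports the cores it is
allowed to run on, assuming Linux):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, c;
    cpu_set_t mask;
    char buf[8192] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ask the kernel which cores this process may run on */
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    for (c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &mask))
            snprintf(buf + strlen(buf), sizeof(buf) - strlen(buf), "%d ", c);

    printf("rank %d: allowed cores: %s\n", rank, buf);

    MPI_Finalize();
    return 0;
}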




On 10/05/2015 09:58 PM, Ralph Castain wrote:
You would presently do:

mpirun --map-by core:pe=4

to get what you are seeking. If we don't already set that qualifier when we see 
"cpus_per_task", then we probably should, as there isn't any reason to make you 
set it twice (well, other than trying to track which envar SLURM is using now).
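
For example (assuming some executable ./app; --report-bindings just makes mpirun
print the binding it computed for each rank):

mpirun --map-by core:pe=4 --report-bindings ./app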


On Oct 5, 2015, at 12:38 PM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> 
wrote:

Yet another question about CPU binding in a SLURM environment...

Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the purpose of CPU 
binding?


Full version: When you allocate a job, e.g., like this:

salloc --ntasks=2 --cpus-per-task=4

SLURM will allocate 8 cores in total, 4 for each 'assumed' MPI task. This is 
useful for hybrid jobs, where each MPI process spawns some internal worker 
threads (e.g., OpenMP). The intention is that 2 MPI processes are started, each 
of them 'bound' to 4 cores. SLURM will also set an environment variable

SLURM_CPUS_PER_TASK=4

which should (probably?) be taken into account by whatever launches the MPI 
processes when figuring out the cpuset. In the case of OpenMPI + mpirun, I think 
this should happen in orte/mca/ras/slurm/ras_slurm_module.c, where the 
variable _is_ actually parsed. Unfortunately, it is never really used...
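
What I would expect (just an illustration of the idea in plain C, not OpenMPI's
actual code) is that the parsed value ends up driving the per-rank binding
width, something like:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* default: one core per rank */
    int cpus_per_task = 1;
    const char *env = getenv("SLURM_CPUS_PER_TASK");

    if (env != NULL && atoi(env) > 0)
        cpus_per_task = atoi(env);   /* "4" for --cpus-per-task=4 */

    /* a launcher could then bind each rank to this many consecutive cores */
    printf("would bind each rank to %d cores\n", cpus_per_task);
    return 0;
}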

As a result, the cpuset of every task started on a given compute node includes 
all CPU cores of all MPI tasks on that node, just as provided by SLURM (in the 
above example, 8). In general, there is no simple way for the user code in the 
MPI processes to 'split' the cores between themselves. I imagine the original 
intention to support this in OpenMPI was something like

mpirun --bind-to subtask_cpuset

with an artificial bind target that would cause OpenMPI to divide the allocated 
cores between the MPI tasks. Is this right? If so, it seems that at this point 
it is not implemented. Are there plans to do this? If not, does anyone know 
another way to achieve this?
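
One manual work-around I can imagine (only a sketch; it assumes Linux and that
every rank on a node inherits the same union cpuset from SLURM, as described
above) is for each rank to re-bind itself to its own slice of that set:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int lrank, c, idx = 0, width;
    const char *env;
    cpu_set_t cur, mine;
    MPI_Comm node;

    MPI_Init(&argc, &argv);

    /* node-local rank via an MPI-3 shared-memory split */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    MPI_Comm_rank(node, &lrank);

    env = getenv("SLURM_CPUS_PER_TASK");
    width = (env != NULL && atoi(env) > 0) ? atoi(env) : 1;

    /* the cores currently allowed, i.e. the union set provided by SLURM */
    CPU_ZERO(&cur);
    sched_getaffinity(0, sizeof(cur), &cur);

    /* keep only allowed cores number [lrank*width, (lrank+1)*width) */
    CPU_ZERO(&mine);
    for (c = 0; c < CPU_SETSIZE; c++) {
        if (!CPU_ISSET(c, &cur))
            continue;
        if (idx >= lrank * width && idx < (lrank + 1) * width)
            CPU_SET(c, &mine);
        idx++;
    }
    sched_setaffinity(0, sizeof(mine), &mine);   /* re-bind this rank */

    printf("local rank %d re-bound to %d cores\n", lrank, width);

    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}

But this feels fragile (it ignores the NUMA/hyperthread layout), which is why I
am asking whether OpenMPI itself can do the splitting.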

Thanks a lot!

Marcin


