See https://github.com/open-mpi/ompi/pull/2365

Let me know if that solves it for you.


> On Nov 3, 2016, at 9:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
> 
> Getting that support into 2.1 would be terrific -- and might save us from 
> having to write some Slurm prolog scripts to effect that.
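> 
> (For reference, the sort of prolog we had in mind is roughly the sketch
> below -- untested, and it assumes both that SLURM_CPU_BIND is visible in
> the TaskProlog environment and that the hwloc_base_binding_policy MCA
> parameter is honored by the Open MPI release we're running.)
> 
> #!/bin/bash
> # Slurm TaskProlog sketch: if the user asked srun for --cpu_bind=none,
> # ask Open MPI not to apply its own binding either. Lines echoed as
> # "export VAR=value" are added to the task's environment by Slurm.
> case "$SLURM_CPU_BIND" in
>   *none*) echo "export OMPI_MCA_hwloc_base_binding_policy=none" ;;
> esac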
> 
> Thanks Ralph!
> 
> On 11/01/2016 11:36 PM, r...@open-mpi.org wrote:
>> Ah crumby!! We already solved this on master, but it cannot be backported to 
>> the 1.10 series without considerable pain. For some reason, the support for 
>> it has been removed from the 2.x series as well. I’ll try to resolve that 
>> issue and get the support reinstated there (probably not until 2.1).
>> 
>> Can you manage until then? I think the v2 RMs are thinking Dec/Jan for 
>> 2.1.
>> Ralph
>> 
>> 
>>> On Nov 1, 2016, at 11:38 AM, Riebs, Andy <andy.ri...@hpe.com> wrote:
>>> 
>>> To close the thread here… I got the following information:
>>>  
>>> Looking at SLURM_CPU_BIND is the right idea, but there are quite a few more 
>>> options. It misses map_cpu and rank, plus the NUMA-based options: 
>>> rank_ldom, map_ldom, and mask_ldom. See the srun man page for the 
>>> documentation.
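>>>  
>>> As a rough sketch of that classification (my reading of the man page
>>> only, not tested code):
>>>  
>>> # hypothetical grouping of SLURM_CPU_BIND_TYPE values
>>> case "$SLURM_CPU_BIND_TYPE" in
>>>   *none*)                             echo "user said: do not bind" ;;
>>>   *rank_ldom*|*map_ldom*|*mask_ldom*) echo "user chose a NUMA (ldom) binding" ;;
>>>   *rank*|*map_cpu*|*mask_cpu*)        echo "user chose a CPU-level binding" ;;
>>>   *)                                  echo "nothing recognizable was specified" ;;
>>> esac
>>>  
>>> Note that, as the srun output quoted further down shows, a default
>>> mask_cpu can appear even when no --cpu_bind was given at all, so
>>> mask_cpu alone does not prove an explicit request.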
>>>  
>>>  
>>> From: Riebs, Andy 
>>> Sent: Thursday, October 27, 2016 1:53 PM
>>> To: users@lists.open-mpi.org
>>> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs
>>>  
>>> Hi Ralph,
>>> 
>>> I haven't played around in this code, so I'll flip the question over to the 
>>> Slurm list, and report back here when I learn anything.
>>> 
>>> Cheers
>>> Andy
>>> 
>>> On 10/27/2016 01:44 PM, r...@open-mpi.org wrote:
>>> Sigh - of course it wouldn’t be simple :-( 
>>>  
>>> All right, let’s suppose we look for SLURM_CPU_BIND:
>>>  
>>> * If it includes the word “none”, then we know the user specified that 
>>> they don’t want us to bind.
>>>  
>>> * If it includes the word mask_cpu, then we have to check the value of 
>>> that option:
>>>  
>>>   * If it is all F’s, then they didn’t specify a binding and we should 
>>>   do our thing.
>>>  
>>>   * If it is anything else, then we assume they _did_ specify a binding, 
>>>   and we leave it alone.
>>>  
>>> Would that make sense? Is there anything else that could be in that envar 
>>> which would trip us up?
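>>>  
>>> In shell terms, the check I have in mind is roughly the following -- an
>>> illustration only, since the real test would of course live in our C
>>> code:
>>>  
>>> bind=${SLURM_CPU_BIND:-}
>>> if [[ $bind == *none* ]]; then
>>>     echo "user said do-not-bind: leave the procs unbound"
>>> elif [[ $bind == *mask_cpu:* ]]; then
>>>     mask=${bind#*mask_cpu:}               # e.g. 0xFFFF or 0x1111,0x2222
>>>     digits=${mask//0x/}; digits=${digits//,/}
>>>     if [[ -n $digits && -z ${digits//[Ff]/} ]]; then
>>>         echo "mask is all F's: nothing was specified, apply our default binding"
>>>     else
>>>         echo "user supplied a real mask: leave it alone"
>>>     fi
>>> else
>>>     echo "nothing recognizable: apply our default binding"
>>> fi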
>>>  
>>>  
>>> On Oct 27, 2016, at 10:37 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>  
>>> Yes, they still exist:
>>> $ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
>>> SLURM_CPU_BIND_LIST=0xFFFF
>>> SLURM_CPU_BIND=quiet,mask_cpu:0xFFFF
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> Here are the relevant Slurm configuration options that could conceivably 
>>> change the behavior from system to system:
>>> SelectType              = select/cons_res
>>> SelectTypeParameters    = CR_CPU
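>>> 
>>> (For anyone comparing against their own cluster, those values can be
>>> pulled with something along these lines -- a sketch, adjust as needed:)
>>> 
>>> scontrol show config | grep SelectType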
>>> 
>>>  
>>> On 10/27/2016 01:17 PM, r...@open-mpi.org wrote:
>>> And if there is no --cpu_bind on the cmd line? Do these not exist?
>>>  
>>> On Oct 27, 2016, at 10:14 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>  
>>> Hi Ralph,
>>> 
>>> I think I've found the magic keys...
>>> 
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x1111,0x2222
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x1111,0x2222
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
>>> 
>>> Andy
>>> 
>>> On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
>>> 
>>> Hey Andy
>>> 
>>> Is there a SLURM envar that would tell us the binding option from the 
>>> srun cmd line? We automatically bind when direct launched, due to user 
>>> complaints of poor performance if we don’t. If the user specifies a 
>>> binding option, then we detect that we were already bound and don’t do 
>>> it.
>>> 
>>> However, if the user specifies that they not be bound, then we think 
>>> they simply didn’t specify anything - and that isn’t the case. If we 
>>> can see something that tells us “they explicitly said not to do it”, 
>>> then we can avoid the situation.
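>>> 
>>> To make the ambiguity concrete, here is a crude, Linux-only
>>> illustration of the only signal we currently have -- not our actual
>>> implementation, just the gist of it:
>>> 
>>> # run one copy of this under srun to see what a task can observe
>>> allowed=$(awk '/Cpus_allowed_list/ {print $2}' /proc/self/status)
>>> online=$(cat /sys/devices/system/cpu/online)
>>> if [ "$allowed" != "$online" ]; then
>>>     echo "already bound to $allowed: we leave it alone"
>>> else
>>>     echo "unbound: looks the same whether the user said --cpu_bind=none or nothing at all"
>>> fi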
>>> 
>>> Ralph
>>> 
>>> 
>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> We are running Open MPI version 1.10.2, built with support for Slurm 
>>> version 16.05.0. When a user specifies "--cpu_bind=none", Open MPI still 
>>> tries to bind by core, which segfaults if there are more processes than 
>>> cores.
>>> 
>>> The user reports:
>>> 
>>> What I found is that
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>> 
>>> will have the problem, but:
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>>> 
>>> will run as expected and print out the usage message because I 
>>> didn’t provide the right arguments to the code.
>>> 
>>> So, it appears that the binding has something to do with the issue. My 
>>> binding script is as follows:
>>> 
>>> % cat bindit.sh
>>> #!/bin/bash
>>> # Bind each task to one CPU, chosen from its Slurm-assigned local rank.
>>> 
>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>> 
>>> stride=1
>>> 
>>> if [ -n "$SLURM_LOCALID" ]; then
>>>   let bindCPU=$SLURM_LOCALID*$stride
>>>   # Pin memory to NUMA node 0 and the task to its computed CPU.
>>>   exec numactl --membind=0 --physcpubind=$bindCPU "$@"
>>> fi
>>> 
>>> # Not launched under Slurm: just run the command unbound.
>>> "$@"
>>> 
>>> %
>>> 
>>> 
>>> -- 
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett-Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>>    May the source be with you!
>>> 
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
