And if there is no --cpu_bind on the cmd line? Do these envars simply not exist in that case?
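
(A quick way to check would be the same 2-task run as below, just without the flag:

  $ srun --ntasks-per-node=2 -N1 env | grep BIND

If those variables only appear when --cpu_bind is given, that alone would be something we could key off.)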

> On Oct 27, 2016, at 10:14 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
> 
> Hi Ralph,
> 
> I think I've found the magic keys...
> 
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x1111,0x2222
> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x1111,0x2222
> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
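> 
> So, assuming Slurm leaves these unset when no --cpu_bind is given at all (I
> haven't verified that case), a check along these lines - just a shell sketch
> of the logic, not the actual OMPI code - should separate the three cases:
> 
>   if [ -z "${SLURM_CPU_BIND_TYPE+x}" ]; then
>     echo "no binding request on the srun line"       # variable not set
>   elif [ "$SLURM_CPU_BIND_TYPE" = "none" ]; then
>     echo "user explicitly asked for no binding"      # --cpu_bind=none
>   else
>     echo "user asked for binding: $SLURM_CPU_BIND"   # e.g. mask_cpu:0x1111,0x2222
>   fi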
> 
> Andy
> 
> On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
>> Hey Andy
>> 
>> Is there a SLURM envar that would tell us the binding option from the srun
>> cmd line? We automatically bind when direct-launched, because users complained
>> of poor performance when we didn’t. If the user specifies a binding option, we
>> detect that we were already bound and don’t bind again.
>> 
>> However, if the user explicitly asks not to be bound, we currently can’t tell
>> that apart from the user not specifying anything at all - and those aren’t the
>> same thing. If we can see something that tells us “they explicitly said not to
>> bind”, then we can avoid the problem.
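>> 
>> (What we look at today is essentially the process’s own affinity at startup -
>> the moral equivalent of checking, say,
>> 
>>   srun -N1 --ntasks-per-node=2 --cpu_bind=core sh -c 'taskset -cp $$'
>> 
>> for a restricted CPU list - so “slurmd bound us” is detectable, but “the user
>> said don’t bind” looks exactly like “the user said nothing”.)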
>> 
>> Ralph
>> 
>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> We are running Open MPI version 1.10.2, built with support for Slurm
>>> version 16.05.0. When a user specifies "--cpu_bind=none", Open MPI still
>>> tries to bind by core, which segfaults if there are more processes than cores.
>>> 
>>> The user reports:
>>> 
>>> What I found is that
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>> 
>>> will have the problem, but:
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>>> 
>>> will run as expected, printing just the usage message because I didn’t
>>> provide the right arguments to the code.
>>> 
>>> So, it appears that the binding has something to do with the issue. My 
>>> binding script is as follows:
>>> 
>>> % cat bindit.sh
>>> #!/bin/bash
>>> 
>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>> 
>>> # spacing between ranks; rank N gets physical CPU N*stride
>>> stride=1
>>> 
>>> if [ ! -z "$SLURM_LOCALID" ]; then
>>>   # launched by srun: pin this rank to one physical CPU, memory on NUMA node 0
>>>   let bindCPU=$SLURM_LOCALID*$stride
>>>   exec numactl --membind=0 --physcpubind=$bindCPU "$@"
>>> fi
>>> 
>>> # no SLURM_LOCALID: run the command unbound
>>> "$@"
>>> 
>>> %
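>>> 
>>> (For example, with stride=1 and SLURM_LOCALID=3, the script ends up exec'ing
>>> "numactl --membind=0 --physcpubind=3 <the real binary>" - one rank per
>>> physical CPU, with memory on NUMA node 0.)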
>>> 
>>> 
>>> -- 
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett-Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>>    May the source be with you!
>>> 
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
