See https://github.com/open-mpi/ompi/pull/2365
Let me know if that solves it for you.

> On Nov 3, 2016, at 9:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>
> Getting that support into 2.1 would be terrific -- and might save us from
> having to write some Slurm prolog scripts to effect that.
>
> Thanks Ralph!
>
> On 11/01/2016 11:36 PM, r...@open-mpi.org wrote:
>
>> Ah crumby!! We already solved this on master, but it cannot be backported
>> to the 1.10 series without considerable pain. For some reason, the support
>> for it has been removed from the 2.x series as well. I’ll try to resolve
>> that issue and get the support reinstated there (probably not until 2.1).
>>
>> Can you manage until then? I think the v2 RMs are thinking Dec/Jan for 2.1.
>>
>> Ralph
>>
>>> On Nov 1, 2016, at 11:38 AM, Riebs, Andy <andy.ri...@hpe.com> wrote:
>>>
>>> To close the thread here… I got the following information:
>>>
>>> Looking at SLURM_CPU_BIND is the right idea, but there are quite a few
>>> more options. It misses map_cpu and rank, plus the NUMA-based options
>>> rank_ldom, map_ldom, and mask_ldom. See the srun man pages for
>>> documentation.
>>>
>>> From: Riebs, Andy
>>> Sent: Thursday, October 27, 2016 1:53 PM
>>> To: users@lists.open-mpi.org
>>> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs
>>>
>>> Hi Ralph,
>>>
>>> I haven't played around in this code, so I'll flip the question over to
>>> the Slurm list, and report back here when I learn anything.
>>>
>>> Cheers
>>> Andy
>>>
>>> On 10/27/2016 01:44 PM, r...@open-mpi.org wrote:
>>>
>>> Sigh - of course it wouldn’t be simple :-(
>>>
>>> All right, let’s suppose we look for SLURM_CPU_BIND:
>>>
>>> * If it includes the word "none", then we know the user specified that
>>>   they don’t want us to bind.
>>> * If it includes the word mask_cpu, then we have to check the value of
>>>   that option:
>>>   * If it is all F’s, then they didn’t specify a binding and we should
>>>     do our thing.
>>>   * If it is anything else, then we assume they _did_ specify a binding,
>>>     and we leave it alone.
>>>
>>> Would that make sense? Is there anything else that could be in that envar
>>> which would trip us up?
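(For reference: the decision procedure above, extended with the extra bind
types from Andy's Nov 1 note -- map_cpu, rank, rank_ldom, map_ldom, and
mask_ldom -- amounts to roughly the following. This is an illustrative shell
sketch only, not the actual Open MPI code; the real check would live in Open
MPI's launch code, and the all-F heuristic is the one proposed above.)

    #!/bin/bash
    # Sketch only: classify the user's srun binding request from SLURM_CPU_BIND.
    bind="$SLURM_CPU_BIND"
    case "$bind" in
        *none*)
            echo "explicit: user asked for no binding - leave processes unbound" ;;
        *map_cpu*|*rank*|*map_ldom*|*mask_ldom*)
            echo "explicit: user chose a binding - leave it alone" ;;
        *mask_cpu:*)
            # An all-F mask list (e.g. 0xFFFF,0xFFFF) is treated as
            # "no binding actually specified", per the proposal above.
            masks=$(echo "${bind#*mask_cpu:}" | sed -e 's/0x//g' -e 's/,//g')
            if [[ "$masks" =~ ^[fF]+$ ]]; then
                echo "unspecified: apply Open MPI's default binding"
            else
                echo "explicit: user chose a CPU mask - leave it alone"
            fi ;;
        *)
            echo "unspecified: apply Open MPI's default binding" ;;
    esac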
>>> On Oct 27, 2016, at 10:37 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>
>>> Yes, they still exist:
>>>
>>> $ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
>>> SLURM_CPU_BIND_LIST=0xFFFF
>>> SLURM_CPU_BIND=quiet,mask_cpu:0xFFFF
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>>
>>> Here are the relevant Slurm configuration options that could conceivably
>>> change the behavior from system to system:
>>>
>>> SelectType           = select/cons_res
>>> SelectTypeParameters = CR_CPU
>>>
>>> On 10/27/2016 01:17 PM, r...@open-mpi.org wrote:
>>>
>>> And if there is no --cpu_bind on the cmd line? Do these not exist?
>>>
>>> On Oct 27, 2016, at 10:14 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>
>>> Hi Ralph,
>>>
>>> I think I've found the magic keys...
>>>
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>>
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x1111,0x2222
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x1111,0x2222
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
>>>
>>> Andy
>>>
>>> On 10/27/2016 11:57 AM, r...@open-mpi.org wrote:
>>>
>>> Hey Andy
>>>
>>> Is there a SLURM envar that would tell us the binding option from the
>>> srun cmd line? We automatically bind when direct launched, due to user
>>> complaints of poor performance if we don’t. If the user specifies a
>>> binding option, then we detect that we were already bound and don’t
>>> do it.
>>>
>>> However, if the user specifies that they not be bound, then we think they
>>> simply didn’t specify anything - and that isn’t the case. If we can see
>>> something that tells us "they explicitly said not to do it", then we can
>>> avoid the situation.
>>>
>>> Ralph
>>>
>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs <andy.ri...@hpe.com> wrote:
>>>
>>> Hi All,
>>>
>>> We are running Open MPI version 1.10.2, built with support for Slurm
>>> version 16.05.0. When a user specifies "--cpu_bind=none", Open MPI tries
>>> to bind by core, which segfaults if there are more processes than cores.
>>>
>>> The user reports: What I found is that
>>>
>>> % srun --ntasks-per-node=8 --cpu_bind=none \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>>
>>> will have the problem, but
>>>
>>> % srun --ntasks-per-node=8 --cpu_bind=none \
>>>     env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>>>
>>> will run as expected and print the usage message, because I didn’t
>>> provide the right arguments to the code.
>>>
>>> So it appears that the binding has something to do with the issue. My
>>> binding script is as follows:
>>>
>>> % cat bindit.sh
>>> #!/bin/bash
>>>
>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>>
>>> stride=1
>>>
>>> if [ ! -z "$SLURM_LOCALID" ]; then
>>>     # Pin this rank to one CPU, stride apart, with memory on NUMA node 0.
>>>     let bindCPU=$SLURM_LOCALID*$stride
>>>     exec numactl --membind=0 --physcpubind=$bindCPU "$@"
>>> fi
>>>
>>> "$@"
>>> %
>>>
>>> --
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>> May the source be with you!
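(To see at a glance what such a check would key on, the srun experiments
above can be repeated for several modes in one pass. This is a sketch: the
--cpu_bind spelling matches Slurm 16.05 as used in this thread, "map_cpu:0,1"
is just an example placement, and the output will vary with the SelectType
configuration noted earlier.)

    #!/bin/bash
    # Sketch: show what SLURM_CPU_BIND* reports for each binding mode.
    for mode in none core "map_cpu:0,1"; do
        echo "== --cpu_bind=$mode =="
        srun -N1 --ntasks-per-node=1 --cpu_bind="$mode" \
            env | grep '^SLURM_CPU_BIND' | sort -u
    done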
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users