Thank you for the clarification, Gilles. That has saved me from trying to debug something that is not expected to work.
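For anyone who finds this thread later, my reading of the suggestion is to have Slurm confine each job to its allocated cores through cgroup cpusets, so that a launcher which ignores the allocation still cannot place ranks outside it. A minimal sketch of the relevant settings (these are standard Slurm options, but the exact configuration depends on the site and would need to be confirmed by the admins):

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/affinity,task/cgroup

    # cgroup.conf
    ConstrainCores=yes

As I understand it, with cores constrained this way each job only sees its own cpuset, so the overlap described below should not be possible even if mpirun picks the same logical core numbers in both jobs.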
Can this limitation be added to the OpenMPI documentation at https://docs.open-mpi.org/en/main/launching-apps/slurm.html please? I feel it should be called out as a reason to use srun instead of mpirun.

Thanks,
Chris
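P.S. For completeness, the binding each launcher actually applies can be checked from inside a job; illustrative commands only, with ./mpi_app standing in for the real application:

    mpirun --report-bindings ./mpi_app        # Open MPI reports the cores each rank is bound to
    srun --cpu-bind=cores,verbose ./mpi_app   # Slurm reports the CPU mask applied to each task

Comparing that output from two jobs sharing a node makes the overlap (or its absence) easy to confirm.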
On Sat, 24 Feb 2024 at 11:11, <users-requ...@lists.open-mpi.org> wrote:

> Today's Topics:
>
>    1. Subject: Clarification about mpirun behavior in Slurm jobs
>       (Christopher Daley)
>    2. Re: Subject: Clarification about mpirun behavior in Slurm jobs
>       (Andrew Reid)
>    3. Re: Subject: Clarification about mpirun behavior in Slurm jobs
>       (Gilles Gouaillardet)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 23 Feb 2024 14:03:06 -0800
> From: Christopher Daley <chrisdale...@gmail.com>
> To: users@lists.open-mpi.org
> Subject: [OMPI users] Subject: Clarification about mpirun behavior in Slurm jobs
>
> Dear Support,
>
> I'm seeking clarification about the expected behavior of mpirun in Slurm
> jobs.
>
> Our setup uses Slurm for resource allocation and OpenMPI mpirun to launch
> MPI applications. We have found that when two Slurm jobs are allocated
> different cores on the same compute node, the MPI ranks in Slurm job 1 map
> to the same cores as the MPI ranks in Slurm job 2. It appears that OpenMPI
> mpirun is not considering the details of the Slurm allocation. We get the
> expected behavior when srun is employed as the MPI launcher instead of
> mpirun, i.e. the MPI ranks in Slurm job 1 use different cores than the MPI
> ranks in Slurm job 2.
>
> We have observed this with OpenMPI-4.1.6 and OpenMPI-5.0.2. Should we
> expect that the mpirun in each job will only use the exact cores that were
> allocated by Slurm?
>
> Thanks,
> Chris
>
> ------------------------------
>
> Message: 2
> Date: Fri, 23 Feb 2024 19:38:16 -0500
> From: Andrew Reid <andrew.ce.r...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] Subject: Clarification about mpirun behavior in Slurm jobs
>
> I had something like this happen on a test cluster of Raspberry Pis several
> years ago, and in my case I was able to isolate it to being an MPI issue
> unrelated to SLURM. If you can run directly on the nodes, that might be a
> useful distinction for you to try to make. (Running "directly" might mean
> manually doing "mpirun -n <n>" inside an srun-dispatched shell, if you
> can't bypass SLURM, which you shouldn't anyway.)
>
> In my case, on the four-core RPi, running e.g. two two-way jobs just
> oversubscribed the first two cores, whether run from SLURM or directly.
> The work-around I found was to use the "--map-by socket" argument to
> mpirun.
>
> I don't think I ever figured it out -- it was a short-lifetime test cluster
> that I was using to explore SLURM config options. I also don't recall which
> version of OpenMPI it was, but I'd guess it was the one that's packaged for
> Debian/Raspbian 11, which is 4.1.0.
>
> -- A.
>
> ------------------------------
>
> Message: 3
> Date: Sat, 24 Feb 2024 17:29:42 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] Subject: Clarification about mpirun behavior in Slurm jobs
>
> Christopher,
>
> I do not think Open MPI explicitly asks SLURM which cores have been
> assigned on each node. So if you are planning to run multiple jobs on the
> same node, your best bet is probably to have SLURM use cpusets.
>
> Cheers,
>
> Gilles