Thank you for the clarification, Gilles. That saves me from trying to debug
behavior that is not expected to work.

Can this be added to the Open MPI documentation at
https://docs.open-mpi.org/en/main/launching-apps/slurm.html please? I feel
it should be called out as a reason to use srun instead of mpirun.
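
For reference, a minimal sketch of the srun-based launch that behaves as
expected for us (this assumes Slurm and Open MPI were built with PMIx
support, otherwise the --mpi flag will differ, and the application name is
just a placeholder):

    #!/bin/bash
    #SBATCH --ntasks=4
    #SBATCH --cpus-per-task=1

    # srun starts the ranks inside the job's own cpuset/cgroup, so they
    # stay on the cores Slurm allocated to this job
    srun --mpi=pmix ./my_mpi_app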

Thanks,
Chris

On Sat, 24 Feb 2024 at 11:11, <users-requ...@lists.open-mpi.org> wrote:

>
> Today's Topics:
>
>    1. Subject: Clarification about mpirun behavior in Slurm jobs
>       (Christopher Daley)
>    2. Re: Subject: Clarification about mpirun behavior in Slurm
>       jobs (Andrew Reid)
>    3. Re: Subject: Clarification about mpirun behavior in Slurm
>       jobs (Gilles Gouaillardet)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 23 Feb 2024 14:03:06 -0800
> From: Christopher Daley <chrisdale...@gmail.com>
> To: users@lists.open-mpi.org
> Subject: [OMPI users] Subject: Clarification about mpirun behavior in
>         Slurm jobs
>
> Dear Support,
>
> I'm seeking clarification about the expected behavior of mpirun in Slurm
> jobs.
>
> Our setup uses Slurm for resource allocation and Open MPI's mpirun to
> launch MPI applications. We have found that when two Slurm jobs are
> allocated different cores on the same compute node, the MPI ranks in Slurm
> job 1 map to the same cores as the ranks in Slurm job 2. It appears that
> mpirun is not taking the details of the Slurm allocation into account. We
> get the expected behavior when srun is employed as the MPI launcher instead
> of mpirun, i.e., the ranks in Slurm job 1 use different cores than the
> ranks in Slurm job 2.
>
> We have observed this with Open MPI 4.1.6 and 5.0.2. Should we expect the
> mpirun in each job to use only the exact cores that were allocated by
> Slurm?
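>
> (For context, a minimal way to see the overlap is to compare the binding
> each job reports, e.g. with mpirun's --report-bindings option, or by having
> every launched process print its allowed cores; the command below is just a
> sketch and assumes Linux compute nodes:)
>
>     mpirun -n 2 --report-bindings grep Cpus_allowed_list /proc/self/status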
>
> Thanks,
> Chris
> ------------------------------
>
> Message: 2
> Date: Fri, 23 Feb 2024 19:38:16 -0500
> From: Andrew Reid <andrew.ce.r...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Cc: Christopher Daley <chrisdale...@gmail.com>
> Subject: Re: [OMPI users] Subject: Clarification about mpirun behavior
>         in Slurm jobs
>
> I had something like this happen on a test cluster of Raspberry Pis several
> years ago, and in my case I was able to isolate it as an MPI issue
> unrelated to SLURM. If you can run directly on the nodes, that might be a
> useful distinction for you to try to make. (Running "directly" might mean
> manually doing "mpirun -n <n>" inside an srun-dispatched shell, if you
> can't bypass SLURM, which you shouldn't anyway.)
>
> In my case, on the four-core RPi, running e.g. two two-way jobs just
> oversubscribed the first two cores, whether run from SLURM or directly.
>
> The work-around I found was to use the "--map-by socket" argument to mpirun.
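>
> (For illustration, that launch might look something like the line below;
> the executable name is a placeholder:)
>
>     mpirun --map-by socket -n 2 ./my_mpi_app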
>
> I don't think I ever figured it out -- it was a short-lived test cluster
> that I was using to explore SLURM config options. I also don't recall which
> version of Open MPI it was, but I'd guess it was the one packaged for
> Debian/Raspbian 11, which is 4.1.0.
>
>   -- A.
>
>
> --
> Andrew Reid / andrew.ce.r...@gmail.com
> ------------------------------
>
> Message: 3
> Date: Sat, 24 Feb 2024 17:29:42 +0900
> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] Subject: Clarification about mpirun behavior
>         in Slurm jobs
>
> Christopher,
>
> I do not think Open MPI explicitly asks SLURM which cores have been
> assigned on each node. So if you are planning to run multiple jobs on the
> same node, your best bet is probably to have SLURM use cpusets.
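>
> A rough sketch of that kind of setup, assuming a cgroup-based
> configuration (the exact settings are site specific):
>
>     # slurm.conf
>     TaskPlugin=task/cgroup
>
>     # cgroup.conf
>     ConstrainCores=yes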
>
> Cheers,
>
> Gilles
>
