As an aside, with Slurm you can use: sbatch --ntasks-per-socket=<N>
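For illustration, a minimal batch script using that option might look like
the following. This is a sketch, not from the thread: it assumes a
two-socket node, Slurm's task/affinity plugin, and made-up script and
binary names.

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    #SBATCH --ntasks-per-socket=2   # place at most 2 tasks on each socket

    # Launch through Slurm so the per-socket task layout is honored.
    srun ./app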
I would hazard a guess that this uses the Open MPI syntax as above to
perform the binding to core!

On 27 July 2015 at 09:47, Ralph Castain <r...@open-mpi.org> wrote:

> As you say, it all depends on your kernel :-)
>
> If the numactl libraries are available, we will explicitly set the
> memory policy to follow the bindings we apply. So doing as I suggested
> will cause the first process to have its memory "bound" to the first
> socket, even though the process will also be using a core from the
> other region. If your process spawns a few threads to ensure that core
> exercises the memory, you'll get plenty of cross-NUMA behavior to test
> against.
>
> Which is why we recommend that users "don't do that" :-)
>
>> On Jul 27, 2015, at 1:25 AM, Davide Cesari <dces...@arpa.emr.it> wrote:
>>
>> Hi Bill and Ralph,
>>
>> The Linux kernel does its best to allocate memory on the local NUMA
>> node when memory is available there, so it is difficult to convince it
>> to do something harmful in this sense. One way to test such a
>> situation would be to start MPI processes on a node in the usual way
>> (so that they are bound to a socket or a core), wait for them to
>> allocate their working memory, and then either migrate the processes
>> to the other NUMA node (usually == socket) or migrate their memory
>> pages. The command-line tools distributed with the numactl package
>> (numactl or migratepages) should let you perform such vandalism; this
>> would put your system into a worst-case scenario from the NUMA point
>> of view.
>>
>> On our system I have in the past noticed strong NUMA-related slowdowns
>> in parallel jobs when a single MPI process doing much more I/O than
>> the others filled all of the local memory with disk cache; the
>> processes on that NUMA node were then forced to allocate memory on the
>> other NUMA node rather than reclaim cache memory on the local node. I
>> solved this in a brutal way by regularly dropping the disk cache on
>> the compute nodes. In my view this is the only case where a (recent)
>> Linux kernel does not behave in a NUMA-aware way; I wonder whether
>> there are HPC-optimized patches, or whether something has changed in
>> this direction in recent kernel development.
>>
>> Best regards, Davide
>>
>>> Date: Fri, 24 Jul 2015 13:36:55 -0700
>>> From: Ralph Castain <r...@open-mpi.org>
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Subject: Re: [OMPI users] NUMA: Non-local memory access and
>>>   performance effects on OpenMPI
>>>
>>> Hi Bill
>>>
>>> You actually can get OMPI to split a process across sockets. Let's
>>> say there are 4 cores/socket and 2 sockets/node. You could run two
>>> procs on the same node, one split across sockets, by:
>>>
>>>   mpirun -n 1 --map-by core:pe=5 ./app : -n 1 --map-by core:pe=3 ./app
>>>
>>> The first proc will run on all cores of the 1st socket plus the 1st
>>> core of the 2nd socket. The second proc will run on the remaining 3
>>> cores of the 2nd socket.
>>>
>>> HTH
>>> Ralph
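As an aside, mpirun can print the bindings it actually applies, which is
handy for checking a split like the one Ralph describes. A minimal
sketch (the exact output format varies by Open MPI version; ./app is a
placeholder):

    # --report-bindings prints each rank's core binding to stderr,
    # confirming that the first proc really spans both sockets.
    mpirun --report-bindings -n 1 --map-by core:pe=5 ./app \
        : -n 1 --map-by core:pe=3 ./app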
>>>> On Jul 24, 2015, at 12:48 PM, Lane, William <william.l...@cshs.org> wrote:
>>>>
>>>> I'm just curious: if we run an Open MPI job and it makes use of
>>>> non-local memory (i.e. memory tied to another socket), what kind of
>>>> effects are seen on performance?
>>>>
>>>> How would you go about testing the above? I can't think of any
>>>> command-line parameter that would allow one to split an Open MPI
>>>> process across sockets.
>>>>
>>>> I'd imagine it would be pretty bad, since you can't cache non-local
>>>> memory locally, both the request and the data have to flow through
>>>> an IOH, the local CPU would have to compete with the non-local CPU
>>>> for access to its own memory, and doing this would have to be
>>>> implemented with some sort of software semaphore locks (which would
>>>> add even more overhead).
>>>>
>>>> Bill L.
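For reference, the worst-case test discussed in this thread can be
provoked directly from the command line with the numactl tools Davide
mentions. A minimal sketch, assuming a two-socket machine whose sockets
are NUMA nodes 0 and 1; ./membench stands in for any memory-bandwidth
benchmark (e.g. STREAM), and <pid> for a running process ID:

    # Baseline: CPUs and memory both on NUMA node 0.
    numactl --cpunodebind=0 --membind=0 ./membench

    # Worst case: run on node 0 but force all allocations onto node 1,
    # so every memory access crosses the inter-socket link.
    numactl --cpunodebind=0 --membind=1 ./membench

    # Or migrate the pages of an already-running process from node 0
    # to node 1 after it has touched its working set, as Davide
    # suggests ("vandalism"):
    migratepages <pid> 0 1

Comparing the first two runs gives a rough upper bound on the
remote-access penalty Bill asks about.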