As you say, it all depends on your kernel :-) If the numactl libraries are available, we will explicitly set the memory policy to follow the bindings we apply. So doing as I suggested will cause the first process to have its memory "bound" to the first socket, even though the process will also be using a core from the other NUMA region. If your process spawns a few threads so that the remote core also exercises that memory, you'll get plenty of cross-NUMA behavior to test against.
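The binding described above can be checked from outside the process. A minimal sketch, assuming a Linux kernel that exposes `/proc/<pid>/numa_maps`; the node numbers and the `migratepages`/`numactl` invocations in the trailing comments are illustrative only and assume the numactl package is installed:

```shell
# Sum the per-node resident page counts for a process; numa_maps lines
# carry fields like "N0=512 N1=8192" (pages on node 0 and node 1).
pid=$$   # inspect this shell itself as a stand-in for an MPI rank

if [ -r "/proc/$pid/numa_maps" ]; then
    grep -o 'N[0-9][0-9]*=[0-9]*' "/proc/$pid/numa_maps" |
        awk -F'[N=]' '{pages[$2] += $3}
                      END {for (n in pages) printf "node %s: %d pages\n", n, pages[n]}'
else
    echo "no numa_maps on this kernel"
fi

# To deliberately force the worst case Davide describes below, one could then
# (commands illustrative; both tools ship with the numactl package):
#   migratepages $pid 0 1                      # move pages from node 0 to node 1
#   numactl --cpunodebind=1 --membind=0 ./app  # run on node 1, memory kept on node 0
```

Re-running the check while the threads are working shows whether the page counts stay pinned to the bound node or drift to the local one.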
Which is why we recommend that users "don't do that" :-)

> On Jul 27, 2015, at 1:25 AM, Davide Cesari <dces...@arpa.emr.it> wrote:
>
> Hi Bill and Ralph,
> well, the Linux kernel does its best to allocate memory on the local NUMA
> node if it is available, so it is difficult to convince it to do something
> harmful in this sense. I think one way to test such a situation would be to
> start the MPI processes on a node in the usual way (reasonably, the
> processes will be bound to a socket or a core), wait for the processes to
> allocate their working memory, then either migrate the processes to the
> other NUMA node (usually == socket) or migrate their memory pages. The
> command-line tools distributed with the numactl package (numactl or
> migratepages) can probably perform such vandalism; this would put your
> system into a worst-case scenario from the NUMA point of view.
>
> On our system, I noticed in the past some strong NUMA-related slowdowns in
> parallel processes when a single MPI process doing much more I/O than the
> others tended to occupy all the local memory as disk cache; the processes
> on that NUMA node were then forced to allocate memory on the other NUMA
> node rather than reclaiming cache memory on the local node. I solved this
> in a brutal way, by cleaning the disk cache regularly on the computing
> nodes. In my view this is the only case where a (recent) Linux kernel does
> not behave in a NUMA-aware way; I wonder whether there are HPC-optimized
> patches, or whether something has changed in this direction in recent
> kernel development.
>
> Best regards, Davide
>
>> Date: Fri, 24 Jul 2015 13:36:55 -0700
>> From: Ralph Castain <r...@open-mpi.org>
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] NUMA: Non-local memory access and
>> performance effects on OpenMPI
>>
>> Hi Bill
>>
>> You actually can get OMPI to split a process across sockets. Let's say
>> there are 4 cores/socket and 2 sockets/node.
>> You could run two procs on the same node, one split across sockets, by:
>>
>> mpirun -n 1 --map-by core:pe=5 ./app : -n 1 --map-by core:pe=3 ./app
>>
>> The first proc will run on all cores of the 1st socket plus the 1st core
>> of the 2nd socket. The second proc will run on the remaining 3 cores of
>> the 2nd socket.
>>
>> HTH
>> Ralph
>>
>>
>>> On Jul 24, 2015, at 12:48 PM, Lane, William <william.l...@cshs.org> wrote:
>>>
>>> I'm just curious: if we run an OpenMPI job and it makes use of non-local
>>> memory (i.e. memory tied to another socket), what kind of effects are
>>> seen on performance?
>>>
>>> How would you go about testing the above? I can't think of any
>>> command-line parameter that would allow one to split an OpenMPI process
>>> across sockets.
>>>
>>> I'd imagine it would be pretty bad, since you can't cache non-local
>>> memory locally, both the request and the data have to flow through an
>>> IOH, the local CPU would have to compete with the non-local CPU for
>>> access to its own memory, and this would have to be implemented with
>>> some sort of software semaphore locks (which would add even more
>>> overhead).
>>>
>>> Bill L.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/07/27322.php
>
> --
> ============================= Davide Cesari ============================
>  Dott**(0.5) Davide Cesari
>  ARPA-Emilia Romagna, Servizio IdroMeteoClima
>  NWP modelling - Modellistica numerica previsionale
>  Tel. +39 051525926
> ========================================================================
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/07/27331.php