As you say, it all depends on your kernel  :-)

If the numactl libraries are available, we will explicitly set the memory 
policy to follow the bindings we apply. So doing as I suggested will cause the 
first process to have its memory “bound” to the first socket, even though the 
process will also be using a core from the other region. If your process spawns 
a few threads to ensure that core exercises the memory, you’ll get plenty of 
cross-NUMA behavior to test against.
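
To make that layout concrete, here is a small Python model of the example from earlier in the thread (2 sockets, 4 cores per socket, one proc with pe=5 and one with pe=3). It is only a sketch: it assumes cores are numbered consecutively socket by socket and that each proc takes the next `pe` cores in order, and the function names are made up for illustration. It is not how Open MPI computes the mapping internally.

```python
from typing import List


def assign_cores(pes_per_proc: List[int], cores_per_socket: int, sockets: int) -> List[List[int]]:
    """Give each process the next `pe` consecutive cores (assumed packing order)."""
    total_cores = cores_per_socket * sockets
    if sum(pes_per_proc) > total_cores:
        raise ValueError("more processing elements requested than cores available")
    layout = []
    next_core = 0
    for pe in pes_per_proc:
        layout.append(list(range(next_core, next_core + pe)))
        next_core += pe
    return layout


def remote_cores(cores: List[int], cores_per_socket: int, memory_socket: int) -> List[int]:
    """Return the cores that sit on a different socket than the one holding the memory."""
    return [c for c in cores if c // cores_per_socket != memory_socket]


# Hypothetical node from the thread: 2 sockets x 4 cores, procs with pe=5 and pe=3.
layout = assign_cores([5, 3], cores_per_socket=4, sockets=2)
for rank, cores in enumerate(layout):
    print(f"rank {rank}: cores {cores}")

# Rank 0 has its memory bound to socket 0, so any core it owns on socket 1 is remote.
print("rank 0 cores remote from socket 0:", remote_cores(layout[0], 4, memory_socket=0))
```

Under these assumptions, rank 0 ends up with exactly one core on the second socket, and any thread running there reaches socket-0 memory across the interconnect.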

Which is why we recommend that users “don’t do that” :-)
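
If you do set up such a layout deliberately for testing, it may help to confirm from inside the process that the binding really spans two NUMA nodes. A minimal Python sketch, assuming Linux (it relies on `os.sched_getaffinity` and the sysfs `cpulist` files under `/sys/devices/system/node`). The function names are hypothetical and none of this is part of Open MPI:

```python
import glob
import os
import re
from typing import Dict, List, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a kernel cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_node_cpus(sysfs_root: str = "/sys/devices/system/node") -> Dict[int, Set[int]]:
    """Map each NUMA node id to its CPUs, as reported by sysfs (Linux only)."""
    nodes: Dict[int, Set[int]] = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        match = re.search(r"node(\d+)", os.path.basename(os.path.dirname(path)))
        if match is None:
            continue
        with open(path) as handle:
            nodes[int(match.group(1))] = parse_cpulist(handle.read())
    return nodes


def nodes_spanned(cpus: Set[int], node_cpus: Dict[int, Set[int]]) -> List[int]:
    """Return the NUMA nodes that contain at least one of the given CPUs."""
    return sorted(node for node, members in node_cpus.items() if members & cpus)


def report_binding() -> List[int]:
    """NUMA nodes covered by the calling process's CPU affinity mask (Linux only)."""
    return nodes_spanned(os.sched_getaffinity(0), read_node_cpus())
```

Calling `report_binding()` from a process launched under mpirun should return more than one node id when the process has been split across sockets, and a single id otherwise.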


> On Jul 27, 2015, at 1:25 AM, Davide Cesari <dces...@arpa.emr.it> wrote:
> 
> Hi Bill and Ralph,
>       well, the Linux kernel does its best to allocate memory on the local 
> NUMA node when it is available, so it is difficult to convince it to do 
> something harmful in this sense. I think one way to test such a situation 
> would be to start the MPI processes on a node in the usual way (presumably 
> the processes will be bound to a socket or a core), wait for them to 
> allocate their working memory, and then either migrate the processes to the 
> other NUMA node (usually == socket) or migrate their memory pages. The 
> command-line tools distributed with the numactl package (numactl or 
> migratepages) can probably be used to perform such vandalism; this would put 
> your system into a worst-case scenario from the NUMA point of view.
>       On our system, I noticed in the past some strong NUMA-related slowdowns 
> in parallel jobs when a single MPI process doing much more I/O than the 
> others tended to fill all the local memory with disk cache; the processes 
> on that NUMA node were then forced to allocate memory on the other NUMA 
> node rather than reclaiming cache memory on the local node. I solved this in 
> a brutal way by cleaning the disk cache regularly on the computing nodes. In 
> my view this is the only case where a (recent) Linux kernel does not behave 
> in a NUMA-aware way. I wonder whether there are HPC-optimized patches, or 
> whether something has changed in this direction in recent kernel development.
> 
>       Best regards, Davide
> 
>> Date: Fri, 24 Jul 2015 13:36:55 -0700
>> From: Ralph Castain <r...@open-mpi.org>
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] NUMA: Non-local memory access and
>>      performance effects on OpenMPI
>> Hi Bill
>> 
>> You actually can get OMPI to split a process across sockets. Let's say there 
>> are 4 cores/socket and 2 sockets/node. You could run two procs on the same 
>> node, one split across sockets, by:
>> 
>> mpirun -n 1 --map-by core:pe=5 ./app : -n 1 --map-by core:pe=3 ./app
>> 
>> The first proc will run on all cores of the 1st socket plus the 1st core of 
>> the 2nd socket. The second proc will run on the remaining 3 cores of the 2nd 
>> socket.
>> 
>> HTH
>> Ralph
>> 
>> 
>>> On Jul 24, 2015, at 12:48 PM, Lane, William <william.l...@cshs.org> wrote:
>>> 
>>> I'm just curious: if we run an OpenMPI job and it makes use of non-local 
>>> memory (i.e. memory tied to another socket), what kind of effects are seen 
>>> on performance?
>>> 
>>> How would you go about testing the above? I can't think of any command-line 
>>> parameter that would allow one to split an OpenMPI process across sockets.
>>> 
>>> I'd imagine it would be pretty bad, since you can't cache non-local memory 
>>> locally, both the request and the data have to flow through an IOH, the 
>>> local CPU would have to compete w/ the non-local CPU for access to its own 
>>> memory, and doing this would have to be implemented w/ some sort of 
>>> software semaphore locks (which would add even more overhead).
>>> 
>>> Bill L.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
>>> <http://www.open-mpi.org/mailman/listinfo.cgi/users>
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2015/07/27322.php 
>>> <http://www.open-mpi.org/community/lists/users/2015/07/27322.php>
> 
> 
> -- 
> ============================= Davide Cesari ============================
> Dott**(0.5) Davide Cesari
> ARPA-Emilia Romagna, Servizio IdroMeteoClima
> NWP modelling - Modellistica numerica previsionale
> Tel. +39 051525926
> ========================================================================
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/07/27331.php
