On 9/27/2010 2:50 PM, David Singleton wrote:
On 09/28/2010 06:52 AM, Tim Prince wrote:
On 9/27/2010 12:21 PM, Gabriele Fatigati wrote:
Hi Tim,

I have read that link, but I haven't understood whether enabling
processor affinity also enables memory affinity, because it says:

"Note that memory affinity support is enabled only when processor
affinity is enabled"

Can I set processor affinity without memory affinity? That is my
question.


2010/9/27 Tim Prince <n...@aol.com>
On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:
If OpenMPI is compiled with NUMA support, is memory affinity enabled by
default? I ask because I couldn't find a standalone memory-affinity (or
similar) parameter to set to 1.


The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
has a useful introduction to affinity. It's available in a default
build, but not enabled by default.

Memory affinity is implied by processor affinity. Your system libraries
are set up so that any memory you allocate is placed local to the
allocating processor, if possible; that's one of the primary benefits of
processor affinity. Not being an expert in openmpi, and in the absence
of further easily accessible documentation, I assume there's no useful
explicit way to disable maffinity while using paffinity on platforms
other than the specified legacy platforms.
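
As a minimal standalone sketch of that first-touch behavior (assuming
Linux with libnuma installed; link with -lnuma; this is not Open MPI
code): pin the process to one CPU, touch freshly allocated memory, then
ask the kernel which NUMA node the page landed on.

/* first-touch sketch: pin to CPU 0, allocate, touch, query placement */
#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity, CPU_SET */
#include <numa.h>       /* numa_available, numa_node_of_cpu */
#include <numaif.h>     /* move_pages */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Processor affinity: pin this process to CPU 0. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Allocate one page and touch it; under the default first-touch
     * policy the kernel tries to place it on the node local to CPU 0. */
    long pagesize = sysconf(_SC_PAGESIZE);
    void *buf;
    if (posix_memalign(&buf, pagesize, pagesize) != 0)
        return 1;
    memset(buf, 0, pagesize);           /* the "first touch" */

    /* Ask where the page actually ended up (nodes=NULL queries only). */
    int status = -1;
    void *pages[1] = { buf };
    if (move_pages(0, 1, pages, NULL, &status, 0) == 0)
        printf("CPU 0 is on node %d; page was placed on node %d\n",
               numa_node_of_cpu(0), status);

    free(buf);
    return 0;
}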


Memory allocation policy really needs to be independent of processor
binding policy.  The default memory policy (memory affinity) of "attempt
to allocate on the NUMA node of the cpu that made the allocation
request, but fall back as needed" is flawed in a number of situations,
even when MPI jobs are given dedicated access to processors. A common
one is where the local NUMA node is full of pagecache pages (from the
checkpoint of the last job to complete). For those sites that support
suspend/resume based scheduling, NUMA nodes will generally contain pages
from suspended jobs. Ideally, the new (suspending) job should pay a
little paging overhead up front (pushing out the suspended job's pages)
to get ideal memory placement for the next 6 or however many hours of
execution.
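
As a rough diagnostic sketch of that situation (assuming Linux and
libnuma; link with -lnuma; not taken from any scheduler), one can report
how much memory each NUMA node really has free before a job starts. A
node whose free memory is far below its total is likely holding
pagecache or a suspended job's pages, so first-touch allocations from it
will spill to other nodes.

/* report total vs. free memory on every NUMA node */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int maxnode = numa_max_node();
    for (int node = 0; node <= maxnode; node++) {
        long long freebytes = 0;
        long long total = numa_node_size64(node, &freebytes);
        if (total < 0)
            continue;   /* node not present */
        printf("node %d: %lld MB total, %lld MB free\n",
               node, total >> 20, freebytes >> 20);
    }
    return 0;
}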

An mbind (MPOL_BIND) policy of binding to the one local NUMA node will
not work when one process requires more memory than that local NUMA
node holds.  One scenario is a master-slave setup where you might want:

  master (rank 0): bound to processor 0, but not memory bound
  slave (rank i):  bound to processor i, and memory bound to the local
                   memory of processor i

They really are independent requirements.
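
A sketch of that layout (assuming Linux and libnuma; link with -lnuma;
the rank argument here is just a stand-in for whatever the launcher
provides, not a real Open MPI hook):

#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

static int bind_rank(int rank)
{
    /* Processor binding for everyone: rank i -> CPU i. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(rank, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    if (rank == 0)
        return 0;   /* master: CPU-bound, memory policy left at default */

    /* Slaves: bind memory strictly (MPOL_BIND, via numa_set_membind)
     * to the node local to their CPU, so allocations reclaim or swap
     * rather than spill to remote nodes. */
    int node = numa_node_of_cpu(rank);
    if (node < 0)
        return -1;
    struct bitmask *nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, node);
    numa_set_membind(nodes);
    numa_free_nodemask(nodes);
    return 0;
}

int main(int argc, char **argv)
{
    if (numa_available() < 0 || argc < 2) {
        fprintf(stderr, "usage: %s <rank> (needs NUMA support)\n",
                argv[0]);
        return 1;
    }
    return bind_rank(atoi(argv[1])) == 0 ? 0 : 1;
}

In a real MPI program this would run right after MPI_Init, using the
rank obtained from MPI_Comm_rank.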

Cheers,
David

Interesting; I agree with those of your points on which I have enough
experience to hold an opinion. However, the original question was not
whether independent memory affinity would be desirable, but whether it
is currently possible within openmpi to keep memory placement from being
influenced by processor affinity. I have seen the case you mention,
where a long job's performance suffers because the state of memory left
by a previous job causes an abnormal number of allocations to fall over
to other NUMA nodes, but I don't know the practical solution.

--
Tim Prince
