On 9/27/2010 2:50 PM, David Singleton wrote:
On 09/28/2010 06:52 AM, Tim Prince wrote:
On 9/27/2010 12:21 PM, Gabriele Fatigati wrote:
Hi Tim,

I have read that link, but I haven't understood whether enabling
processor affinity also enables memory affinity, because it says:

"Note that memory affinity support is enabled only when processor
affinity is enabled"

Can I set processor affinity without memory affinity? That is my
question.


2010/9/27 Tim Prince <n...@aol.com>
On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:
If OpenMPI is compiled with NUMA support, is memory affinity enabled by
default? I ask because I couldn't find a standalone memory-affinity (or
similar) parameter to set to 1.


The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
has a useful introduction to affinity. It's available in a default
build, but not enabled by default.

Memory affinity is implied by processor affinity. Your system libraries
are set up so that any memory you allocate is placed local to the
allocating processor, if possible; that's one of the primary benefits of
processor affinity. Not being an expert in openmpi, and in the absence
of further easily accessible documentation, I assume there's no useful
explicit way to disable maffinity while using paffinity on platforms
other than the specified legacy platforms.
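
As a minimal standalone sketch of that first-touch behavior (assuming
Linux with libnuma installed; link with -lnuma; this is not Open MPI
code): pin the process to one CPU, touch freshly allocated memory, then
ask the kernel which NUMA node the page landed on.

/* first-touch sketch: pin to CPU 0, allocate, touch, query placement */
#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity, CPU_SET */
#include <numa.h>       /* numa_available, numa_node_of_cpu */
#include <numaif.h>     /* move_pages */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Processor affinity: pin this process to CPU 0. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Allocate one page and touch it; under the default first-touch
     * policy the kernel tries to place it on the node local to CPU 0. */
    long pagesize = sysconf(_SC_PAGESIZE);
    void *buf;
    if (posix_memalign(&buf, pagesize, pagesize) != 0)
        return 1;
    memset(buf, 0, pagesize);           /* the "first touch" */

    /* Ask where the page actually ended up (nodes=NULL queries only). */
    int status = -1;
    void *pages[1] = { buf };
    if (move_pages(0, 1, pages, NULL, &status, 0) == 0)
        printf("CPU 0 is on node %d; page was placed on node %d\n",
               numa_node_of_cpu(0), status);

    free(buf);
    return 0;
}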


Memory allocation policy really needs to be independent of processor
binding policy.  The default memory policy (memory affinity) of "attempt
to allocate on the NUMA node of the cpu that made the allocation
request, but fall back as needed" is flawed in a number of situations,
even when MPI jobs are given dedicated access to processors. A common
one is where the local NUMA node is full of pagecache pages (from the
checkpoint of the last job to complete). For those sites that support
suspend/resume based scheduling, NUMA nodes will generally contain pages
from suspended jobs. Ideally, the new (suspending) job should pay a
little paging overhead up front (pushing out the suspended job's pages)
to get ideal memory placement for the next 6 or however many hours of
execution.
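
As a rough diagnostic sketch of that situation (assuming Linux and
libnuma; link with -lnuma; not taken from any scheduler), one can report
how much memory each NUMA node really has free before a job starts. A
node whose free memory is far below its total is likely holding
pagecache or a suspended job's pages, so first-touch allocations from it
will spill to other nodes.

/* report total vs. free memory on every NUMA node */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int maxnode = numa_max_node();
    for (int node = 0; node <= maxnode; node++) {
        long long freebytes = 0;
        long long total = numa_node_size64(node, &freebytes);
        if (total < 0)
            continue;   /* node not present */
        printf("node %d: %lld MB total, %lld MB free\n",
               node, total >> 20, freebytes >> 20);
    }
    return 0;
}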

An mbind (MPOL_BIND) policy of binding to the one local NUMA node will
not work when one process requires more memory than that local NUMA
node holds.  One scenario is a master-slave setup where you might want:

  master (rank 0): bound to processor 0, but not memory bound
  slave (rank i):  bound to processor i, and memory bound to the local
                   memory of processor i

They really are independent requirements.
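
A sketch of that layout (assuming Linux and libnuma; link with -lnuma;
the rank argument here is just a stand-in for whatever the launcher
provides, not a real Open MPI hook):

#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

static int bind_rank(int rank)
{
    /* Processor binding for everyone: rank i -> CPU i. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(rank, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;

    if (rank == 0)
        return 0;   /* master: CPU-bound, memory policy left at default */

    /* Slaves: bind memory strictly (MPOL_BIND, via numa_set_membind)
     * to the node local to their CPU, so allocations reclaim or swap
     * rather than spill to remote nodes. */
    int node = numa_node_of_cpu(rank);
    if (node < 0)
        return -1;
    struct bitmask *nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, node);
    numa_set_membind(nodes);
    numa_free_nodemask(nodes);
    return 0;
}

int main(int argc, char **argv)
{
    if (numa_available() < 0 || argc < 2) {
        fprintf(stderr, "usage: %s <rank> (needs NUMA support)\n",
                argv[0]);
        return 1;
    }
    return bind_rank(atoi(argv[1])) == 0 ? 0 : 1;
}

In a real MPI program this would run right after MPI_Init, using the
rank obtained from MPI_Comm_rank.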

Cheers,
David

Interesting; I agree with those of your points on which I have enough
experience to hold an opinion. However, the original question was not
whether independent memory affinity would be desirable, but whether it
is currently possible within openmpi to keep memory placement from being
influenced by processor affinity. I have seen the case you mention,
where a long job's performance suffers because the state of memory left
by a previous job causes an abnormal number of allocations to fall over
to other NUMA nodes, but I don't know the practical solution.

--
Tim Prince
