On Mar 24, 2008, at 6:29 AM, Mark Kosmowski wrote:
> I have a successful ompi installation and my software runs across my
> humble cluster of three dual-Opteron (single-core) nodes on OpenSUSE
> 10.2. I'm planning to upgrade some RAM soon and have been thinking of
> playing with affinity, since each CPU will have its own DIMMs after
> the upgrade. I have read the FAQ and know to use "--mca
> mpi_paffinity_alone 1" to enable affinity.
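For reference, a minimal sketch of what that looks like on the command
line (the application name ./myapp and the process count are just
placeholders):

  # launch 6 ranks with processor affinity enabled
  mpirun -np 6 --mca mpi_paffinity_alone 1 ./myapp

The same parameter can also be set persistently by adding the line
"mpi_paffinity_alone = 1" to $HOME/.openmpi/mca-params.conf.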
> It looks like I am running ompi 1.1.4 (see below).
>
>   mark@LT:~> ompi_info | grep affinity
>        MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.4)
>        MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.4)
>        MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1.4)
> Does this old version of ompi do a good job of implementing affinity,
> or would it behoove me to use the current version if I am interested
> in trying affinity?
It's the same level of affinity support as in the 1.2 series.
There are a few affinity upgrades in development, some of which will
hit for the v1.3 series, some of which will be later:
- upgrade to a newer embedded version of PLPA; this probably won't
affect you much (will be in v1.3)
- allow assigning MPI processes to specific socket/core combinations
via a file specification (will be in v1.3; a sketch follows this list)
- have some "better" launch support such that resource managers that
implement their own affinity controls (e.g., SLURM) can directly set
the affinity for MPI processes (some future version; probably won't be
ready for v1.3).
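Purely as an illustration of that file-based mapping (the syntax is not
final, so the file name, the host names, and the "rank N=host
slot=socket:core" format below are all assumptions):

  # hypothetical rankfile: pin each rank to a specific socket:core pair
  rank 0=node1 slot=0:0
  rank 1=node1 slot=1:0
  rank 2=node2 slot=0:0
  rank 3=node2 slot=1:0

  # launched with something like:
  mpirun -np 4 --rankfile my_rankfile ./myapp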
> What sorts of time gains do people typically see with affinity? (I'm
> a chemistry student running plane-wave solid-state calculation
> software, if that helps with the question.)
As with everything, it depends. :-)
- If you're just running one MPI process per core and you only have
one core per socket, you might simply see a "smoothing" of results --
meaning that multiple runs of the same job will have slightly more
consistent timing results (e.g., less "jitter" in the timings)
- If you have a NUMA architecture (e.g., AMD) and have multiple NICs,
you can play games to get the MPI processes that are actually doing the
communicating to be "close" to the NIC in the internal host topology.
If your app is using a lot of short messages over low-latency
interconnects, this can make a difference. If you're using TCP, it
likely won't make much of a difference. :-)
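Once affinity is on, it's easy to sanity check that the binding actually
took effect; on Linux, taskset can report a running process's CPU
affinity (the PID below is just a placeholder):

  # show the CPU list an MPI process is bound to
  taskset -cp 12345

With mpi_paffinity_alone working, each rank should report a single CPU
rather than the machine's full set.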
> Lastly, two of the three machines will have all of their DIMM slots
> populated by equal-sized DIMMs. However, one of my machines has two
> processors, each of which has four DIMM slots. This machine will
> be getting 4 @ 1 GB DIMMs and 2 @ 2 GB DIMMs. I am assuming that the
> best thing for affinity would be to put all of the 1 GB DIMMs on one
> processor and the 2 GB DIMMs on the other, and to put the 2 GB DIMMs
> in slots 0 and 1. Does it matter which processor gets which set of
> DIMMs?
It depends on what your application is doing. You generally want to
have enough "local" RAM for the [MPI] processes that will be running
on each socket.
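If you want to double check how the RAM ends up split between the two
sockets after the upgrade, numactl (if it's installed) will show the
memory attached to each NUMA node:

  # list each NUMA node's CPUs, total memory, and free memory
  numactl --hardware

That makes it easy to confirm how much RAM is actually local to each
processor once the DIMMs are in place.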
--
Jeff Squyres
Cisco Systems