This is to help anyone else having this problem, as it doesn't seem to
be mentioned anywhere I can find, rather surprisingly.

Core binding is broken on Interlagos with Open MPI 1.5.4.  I guess it
also bites on Magny-Cours, but all our systems are currently busy and I
can't check.

Binding does work, at least in basic tests, in 1.5.5rc1, but the release
notes for that version don't mention the fix.  Perhaps someone could
mention Interlagos in the notes, and any other hardware that might be
affected (presumably Magny-Cours and some Power if it's confusion
introduced by the extra NUMA level).

As an example of the error, with 1.5.4 on a 32-core Interlagos node,
invoking mpirun like

  mpirun -np 32 --bind-to-core --bycore --report-bindings ...

you get

  ...
  [compute002:18153] [[14894,0],0] odls:default:fork binding child [[14894,1],15] to cpus 40000000
  --------------------------------------------------------------------------
  An invalid physical processor id was returned when attempting to
  set processor affinity - please check to ensure that your system
  supports such functionality. If so, then this is probably something
  that should be reported to the OMPI developers.
  --------------------------------------------------------------------------
  ...

Binding works for runs that use up to 16 cores.
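
In case it helps anyone reading the --report-bindings output above, here
is a rough Python sketch for decoding the "to cpus" value.  It assumes the
value is a hexadecimal bitmask in which bit n stands for processor n; that
is my reading of the output, not something I have confirmed in the Open MPI
documentation.  The helper name cpus_in_mask is just made up for this
example.

```python
def cpus_in_mask(mask_hex):
    """Return the bit positions set in a hexadecimal CPU mask string.

    Assumption: bit n of the mask corresponds to processor n.
    """
    mask = int(mask_hex, 16)
    # Walk every bit up to the highest one set and keep those that are 1.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Mask reported for child [[14894,1],15] in the log above.
print(cpus_in_mask("40000000"))  # prints [30]
```

Under that assumption, the mask reported for rank 15 has only bit 30 set.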

We seem to have issues even with 1.5.5rc1, but I'll try to get bug
reports into the tracker.  I hope the heads-up here is useful though.
