On Feb 24, 2009, at 1:07 PM, Jeff Squyres wrote:
The minimpi(.py) Python module loads the minimpiext(.c) module and
calls its minimpiext.init() method (defined in minimpiext.c), which in
turn calls MPI_Init(). "minimpiext.c" is linked against libmpi. Libmpi
is loaded as soon as Python evaluates "import minimpi".
Ah, ok. I wonder if you're not building properly. -lmpi is not
usually sufficient to build an Open MPI application; we hide a bunch
of flags inside mpicc that you can see via mpicc --showme.
How does one add more ldflags to your setup.py script?
Never mind; a few quick Google searches and I found the
extra_link_args Python param. That turned out to be a red herring,
anyway.
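For reference, here is a minimal sketch of handing the wrapper compiler's link flags to an extension in setup.py. It assumes setuptools is installed and that mpicc is Open MPI's wrapper compiler, which accepts --showme:link. The helper names (split_flags, mpi_link_flags, make_extension) are invented for illustration; the module and source names are the ones mentioned above.

```python
import shlex
import subprocess


def split_flags(showme_output):
    """Split the text printed by the wrapper compiler into a list of flags."""
    return shlex.split(showme_output)


def mpi_link_flags(compiler="mpicc"):
    """Ask Open MPI's wrapper compiler which flags it adds at link time."""
    # Assumption: `compiler` is Open MPI's wrapper and is on the PATH.
    out = subprocess.check_output([compiler, "--showme:link"], text=True)
    return split_flags(out)


def make_extension():
    """Describe the C extension, linked with whatever mpicc would use."""
    # Imported lazily so the helpers above work without setuptools.
    from setuptools import Extension
    return Extension(
        "minimpiext",
        sources=["minimpiext.c"],
        extra_link_args=mpi_link_flags(),
    )
```

In setup.py one would then pass ext_modules=[make_extension()] to setup().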
The issue is that Python is apparently dlopen'ing libmpi in a private
scope. Specifically, if I "strace python" and type in "import
minimpi", I see the following go by:
-----
open("/home/jsquyres/bogus/lib/libmpi.so.0", O_RDONLY) = 5
read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\241\1\0"..., 832) = 832
fstat(5, {st_mode=S_IFREG|0755, st_size=799503, ...}) = 0
mmap(NULL, 1690760, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x2a987f6000
-----
Which I think corresponds to calling dlopen() without RTLD_GLOBAL.
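One way to check this from inside the interpreter, rather than inferring it from strace, is to look at the flags Python passes to dlopen() when it imports an extension module. A minimal sketch, assuming a Unix build of Python 3 where sys.getdlopenflags() and os.RTLD_GLOBAL are available; the helper name is made up for illustration:

```python
import os
import sys


def imports_use_global_scope():
    """Return True if extension modules are dlopen()ed with RTLD_GLOBAL."""
    # sys.getdlopenflags() reports the flags the interpreter passes to
    # dlopen() when it imports a C extension (Unix only).
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)


print("RTLD_GLOBAL set for imports:", imports_use_global_scope())
```

If this prints False, the extension (and the libmpi it pulls in) is being opened in a private scope.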
The problem is that Open MPI is itself built upon plugins, and OMPI's
plugins use symbols in OMPI's libraries. When we dynamically load
OMPI's plugins, they need to resolve those symbols against libraries
already visible in the process. So if libmpi is loaded in a private
scope, and libmpi then turns around and calls dlopen() to open a
plugin, libmpi's symbols (and those of libopen-rte and libopen-pal)
are not available to the plugin, and the plugin fails to load. This
error propagates up the stack and MPI_INIT eventually fails.
You have some possible workarounds:
- We recommended to the PyMPI author a while ago that he add his own
dlopen() of libmpi before calling MPI_INIT, specifically using
RTLD_GLOBAL, so that the library is opened in the global process space
(not a private space in the process). Then the symbols of libmpi (and
friends) will be available to its plugins. If you're unhappy with the
non-portability of dlopen(), try lt_dlopenadvise() -- it's a portable
version that is linked inside Open MPI.
- Another option is to configure/compile Open MPI with the
"--disable-dlopen" or "--enable-static --disable-shared" configure
options. Either of these options will cause Open MPI to slurp all of
its plugins up into libmpi (etc.) and not dynamically open them at
run-time, thereby avoiding the problem of Python opening libmpi in a
private scope.
- Get Python to give you the option of opening dependent
libraries in the global scope. This may be somewhat controversial;
there are good reasons to open plugins in private scopes. But I have
to imagine that OMPI is not the only Python extension out there that
wants to open plugins of its own; other such projects should be
running into similar issues.
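The first and third workarounds can be sketched from the Python side. This is only an illustration under stated assumptions: a Unix build of Python 3, ctypes standing in for a hand-written dlopen() call, and helper names (preload_global, global_dlopen_scope) that are made up here. The library name in the usage comment is the soname seen in the strace output above and may differ on other installations.

```python
import contextlib
import ctypes
import os
import sys


def preload_global(path):
    """dlopen() a shared library with RTLD_GLOBAL.

    Returns the ctypes handle, or None if the library cannot be loaded.
    """
    try:
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None


@contextlib.contextmanager
def global_dlopen_scope():
    """Temporarily make Python import C extensions with RTLD_GLOBAL."""
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the previous flags so later imports stay private.
        sys.setdlopenflags(saved)


# Hypothetical usage, first workaround (open libmpi globally up front):
#     preload_global("libmpi.so.0")
#     import minimpi
#
# Hypothetical usage, third workaround (import the extension globally):
#     with global_dlopen_scope():
#         import minimpi
```

Either way the goal is the same: libmpi's symbols end up in the global scope before MPI_INIT starts opening plugins. Restoring the flags afterwards keeps unrelated extensions in private scopes.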
--
Jeff Squyres
Cisco Systems