I am seeing the same error, but I am using mpi4py (Lisandro Dalcin's
Python MPI bindings).  I don't think libmpi.so is being dlopen'd
directly at runtime, but the shared library that is linked against
libmpi.so at compile time is probably being loaded at runtime.  The
odd thing is that mpi4py has been tested extensively with Open MPI,
and this is the first version of Open MPI in which we have seen this
issue.  I tried 1.2.3 again yesterday and it works fine.  What changed
in 1.2.4?

The problem in our case is that the code doing the dlopen is deep
inside Python itself (not just mpi4py).  It is the same code that is
responsible for loading _everything_ into Python, and I am pretty
sure people would not be willing to change it.  I am cc'ing this to
Lisandro; maybe he has some ideas on this front.
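
One avenue that might avoid touching Python's loader at all: Python
lets a program change the flags its loader passes to dlopen, through
sys.setdlopenflags().  Below is a minimal sketch, assuming a Unix
system and a Python recent enough to expose os.RTLD_GLOBAL.  The
helper name import_with_global_symbols is made up for illustration,
and I have not verified that this cures the pt2pt failure with 1.2.4.

```python
import importlib
import os
import sys


def import_with_global_symbols(module_name):
    """Import module_name with RTLD_GLOBAL added to Python's dlopen flags.

    Hypothetical helper for illustration only; not part of mpi4py.
    """
    old_flags = sys.getdlopenflags()
    # Ask the loader to dlopen extension modules with RTLD_GLOBAL so
    # that their symbols can be seen by libraries dlopen'd later.
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(module_name)
    finally:
        # Restore the previous flags so unrelated imports are unaffected.
        sys.setdlopenflags(old_flags)
```

Usage would look like MPI = import_with_global_symbols("mpi4py.MPI"),
done before anything else has imported the module.  The flags only
matter the first time an extension is loaded, so a module that is
already imported is returned unchanged.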

Thanks

Brian

On 10/10/07, Brian Barrett <brbar...@open-mpi.org> wrote:
> On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
> > | Does this happen for all MPI programs (potentially only those that
> > | use the MPI-2 one-sided stuff), or just your R environment?
> >
> > This is the likely winner.
> >
> > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
> > shows no error message. We will look at the Rmpi initialization to
> > see what could cause this.
>
> Does Rmpi link in libmpi.so or dynamically load it at run-time?  The
> pt2pt one-sided component uses the MPI-1 point-to-point calls for
> communication (hence, the pt2pt name). If those symbols were
> unavailable (say, because libmpi.so was dynamically loaded) I could
> see how this would cause problems.
>
> The pt2pt component (rightly) does not have a -lmpi in its link
> line.  The other components that use symbols in libmpi.so (wrongly)
> do have a -lmpi in their link line.  This can cause some problems on
> some platforms (Linux tends to do dynamic linking / dynamic loading
> better than most).  That's why only the pt2pt component fails.
>
> My guess is that Rmpi is dynamically loading libmpi.so, but not
> specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not
> available to the components the way it should be, and all goes
> downhill from there.  It only mostly works because we do something
> silly with how we link most of our components, and Linux is just
> smart enough to cover our rears (thankfully).
>
> Solutions:
>
>    - Someone could make the pt2pt osc component link in libmpi.so
>      like the rest of the components and hope that no one ever
>      tries this on a non-friendly platform.
>    - Debian (and all Rmpi users) could configure Open MPI with the
>      --disable-dlopen flag and ignore the problem.
>    - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
>      flag and fix the problem properly.
>
> I think it's clear I'm in favor of Option 3.
>
> Brian
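
For reference, the RTLD_GLOBAL fix suggested in the quoted message has
a rough Python-side analogue: load libmpi.so into the global symbol
namespace with ctypes before the MPI extension module is imported.
This is only a sketch under assumptions.  The helper name
preload_global is hypothetical, the library is assumed to be
discoverable as "mpi", and whether this resolves the pt2pt component
failure is untested.

```python
import ctypes
import ctypes.util


def preload_global(lib_name):
    """Try to dlopen a shared library with RTLD_GLOBAL.

    Returns the ctypes handle, or None if the library cannot be
    found or loaded.  Hypothetical helper for illustration only.
    """
    # find_library("mpi") looks for something like libmpi.so
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        # RTLD_GLOBAL makes the library's symbols available to
        # shared objects that are dlopen'd afterwards.
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None
```

A caller would run preload_global("mpi") first, keep the returned
handle alive, and only then import the MPI bindings.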
