On 10 October 2007 at 15:27, Brian Granger wrote:
| I am seeing the same error, but I am using mpi4py (Lisandro Dalcin's
| Python MPI bindings).  I don't think that libmpi.so is being dlopen'd
| directly at runtime, but, the shared library that is linked at compile
| time to libmpi.so is probably being loaded at runtime.  The odd thing
| is that mpi4py has been tested extensively with openmpi and this is
| the first version of openmpi that we have seen this issue.  I tried
| 1.2.3 again yesterday and it works fine.  What changed with 1.2.4?
| 
| The problem with our case is that the code that is doing the dlopen is
| deep inside Python itself (not just mpi4py).  It is the same code that

That's the same for R. We don't touch the inner guts of module loading for
this. What Hao realized after looking at the corresponding FAQ item was that
right before calling MPI_Init, one can load libmpi explicitly, and -- that's
the important bit -- set the proper RTLD_GLOBAL argument.

So you could adapt the patch we used:

   a) add an include for dlfcn.h

   b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL

That should be reasonably easy to test as you only need to rebuild mpi4py.


--- rmpi-0.5-4.orig/src/Rmpi.c
+++ rmpi-0.5-4/src/Rmpi.c
@@ -16,6 +16,7 @@
  */

 #include "Rmpi.h"
+#include <dlfcn.h>

 static MPI_Comm        *comm;
 static MPI_Status *status;
@@ -32,7 +33,9 @@
 if (flag)
                return AsInt(1);
        else {  
-               MPI_Init((void *)0,(void *)0);
+               char *libm="libmpi.so";
+               dlopen(libm,RTLD_GLOBAL);
+               MPI_Init((void *)0,(void *)0);
                MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
                MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
                comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm); 
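
For anyone adapting this, here is a slightly more defensive sketch of step b).
It is not what went into Rmpi; the helper name preload_global is made up for
illustration. The patch above passes RTLD_GLOBAL on its own, whereas the
dlopen documentation asks for one of RTLD_LAZY or RTLD_NOW as well, so the
sketch ORs in RTLD_NOW and also checks the return value:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not part of Rmpi or mpi4py): load a shared library
 * so that its symbols become visible to libraries loaded afterwards, such
 * as plugin components that are themselves dlopen'ed later. */
void *preload_global(const char *libname)
{
    /* One of RTLD_LAZY / RTLD_NOW selects the binding mode;
     * RTLD_GLOBAL is OR'ed in to export the symbols. */
    void *handle = dlopen(libname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                libname, err ? err : "unknown error");
    }
    return handle;
}
```

The intended use would be to call preload_global("libmpi.so") right before
MPI_Init, as in the patch. The bare name "libmpi.so" is taken from the patch;
whether that exact file name exists on a given system (rather than a
versioned name) is an assumption to check locally.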


| is responsible for loading _everything_ into Python, and I am pretty
| sure that  there is no way that people would be willing to change it.
| I am cc'ing this to Lisandro - maybe he has some ideas on this front.

Actually, it looks like you didn't CC him.

Hth, Dirk

| 
| Thanks
| 
| Brian
| 
| On 10/10/07, Brian Barrett <brbar...@open-mpi.org> wrote:
| > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| > > | Does this happen for all MPI programs (potentially only those that
| > > | use the MPI-2 one-sided stuff), or just your R environment?
| > >
| > > This is the likely winner.
| > >
| > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
| > > shows no error message. We will look at the Rmpi initialization to
| > > see what could cause this.
| >
| > Does rmpi link in libmpi.so or dynamically load it at run-time?  The
| > pt2pt one-sided component uses the MPI-1 point-to-point calls for
| > communication (hence, the pt2pt name). If those symbols were
| > unavailable (say, because libmpi.so was dynamically loaded) I could
| > see how this would cause problems.
| >
| > The pt2pt component (rightly) does not have a -lmpi in its link
| > line.  The other components that use symbols in libmpi.so (wrongly)
| > do  have a -lmpi in their link line.  This can cause some problems on
| > some platforms (Linux tends to do dynamic linking / dynamic loading
| > better than most).  That's why only the pt2pt component fails.
| >
| > My guess is that Rmpi is dynamically loading libmpi.so, but not
| > specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not
| > available to the components the way it should be, and all goes
| > downhill from there.  It only mostly works because we do something
| > silly with how we link most of our components, and Linux is just
| > smart enough to cover our rears (thankfully).
| >
| > Solutions:
| >
| >    - Someone could make the pt2pt osc component link in libmpi.so
| >      like the rest of the components and hope that no one ever
| >      tries this on a non-friendly platform.
| >    - Debian (and all Rmpi users) could configure Open MPI with the
| >       --disable-dlopen flag and ignore the problem.
| >    - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
| >      flag and fix the problem properly.
| >
| > I think it's clear I'm in favor of Option 3.
| >
| > Brian
| > _______________________________________________
| > users mailing list
| > us...@open-mpi.org
| > http://www.open-mpi.org/mailman/listinfo.cgi/users
| >

-- 
Three out of two people have difficulties with fractions.
