On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| Does this happen for all MPI programs (potentially only those that
| use the MPI-2 one-sided stuff), or just your R environment?

This is the likely winner.

It does indeed seem to be due to R's Rmpi package. Running a simple mpitest.c shows no error message. We will look at the Rmpi initialization to see what could cause this.

Does Rmpi link in libmpi.so, or does it dynamically load it at run-time? The pt2pt one-sided component uses the MPI-1 point-to-point calls for communication (hence the pt2pt name). If those symbols were unavailable (say, because libmpi.so was dynamically loaded), I could see how this would cause problems.

The pt2pt component (rightly) does not have a -lmpi in its link line. The other components that use symbols in libmpi.so (wrongly) do have a -lmpi in their link line. This can cause some problems on some platforms (Linux tends to do dynamic linking / dynamic loading better than most). That's why only the pt2pt component fails.

My guess is that Rmpi is dynamically loading libmpi.so, but not specifying the RTLD_GLOBAL flag. This means that libmpi.so is not available to the components the way it should be, and all goes downhill from there. It only mostly works because we do something silly with how we link most of our components, and Linux is just smart enough to cover our rears (thankfully).

Solutions:

  - Someone could make the pt2pt osc component link in libmpi.so
    like the rest of the components and hope that no one ever
    tries this on a non-friendly platform.
  - Debian (and all Rmpi users) could configure Open MPI with the
    --disable-dlopen flag and ignore the problem.
  - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
    flag and fix the problem properly.

I think it's clear I'm in favor of the third option.
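
For illustration, here is a minimal sketch of what loading libmpi.so with RTLD_GLOBAL could look like. I have not checked how Rmpi actually loads the library, so treat this as an assumption: the helper name load_global() is made up, and the library path is whatever the caller passes in.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from Rmpi): dlopen a shared library
 * so that its symbols are added to the global namespace. Libraries
 * loaded afterwards, such as plugin components that were not linked
 * with -lmpi, can then resolve their references against it.
 */
void *load_global(const char *path)
{
    /* RTLD_NOW resolves all symbols immediately; RTLD_GLOBAL makes
     * them available for symbol resolution of later-loaded objects. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL) {
        /* dlerror() describes the most recent dlopen failure. */
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}
```

A caller would do something like load_global("libmpi.so") before calling MPI_Init. Without RTLD_GLOBAL, dlopen defaults to RTLD_LOCAL, and the symbols stay private to the handle, which matches the failure I am guessing at above.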

Brian
