Nope - we don't currently support cross-job shared memory operations. Nathan has talked about adding that capability to vader, but it isn't available at this time.
> On Jun 14, 2016, at 12:38 PM, Louis Williams <louis.willi...@gatech.edu> wrote:
>
> Hi,
>
> I am attempting to use the sm and vader BTLs between a client and server
> process, but it seems impossible to use fast transports (i.e. not TCP)
> between two independent groups started with two separate mpirun invocations.
> Am I correct, or is there a way to communicate using shared memory between a
> client and server like this? It seems this might be the case:
> https://github.com/open-mpi/ompi/blob/master/ompi/dpm/dpm.c#L495
>
> The server calls MPI::COMM_WORLD.Accept() and the client calls
> MPI::COMM_WORLD.Connect(). Each program is started with "mpirun --np 1 --mca
> btl self,sm,vader <executable>", where the executable is either the client or
> server program. When no BTL is specified, both establish a TCP connection
> just fine. But when the sm and vader BTLs are specified, immediately after
> the Connect() call, both client and server exit with the message copied at
> the end. It seems as though intergroup communication can't use fast transports
> and only uses TCP.
>
> Also, as expected, running the Accept() and Connect() calls within a single
> program with "mpirun -np 2 --mca btl self,sm,vader ..." uses shared memory as
> transport.
>
> $> mpirun --ompi-server "3414491136.0;tcp://10.4.131.16:49775" -np 1 --mca btl self,vader ./server
>
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[50012,1],0]) is on host: MacBook-Pro-80
> Process 2 ([[50010,1],0]) is on host: MacBook-Pro-80
> BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> [MacBook-Pro-80.local:57315] [[50012,1],0] ORTE_ERROR_LOG: Unreachable in
> file dpm_orte.c at line 523
> [MacBook-Pro-80:57315] *** An error occurred in MPI_Comm_accept
> [MacBook-Pro-80:57315] *** reported by process [7572553729,4294967296]
> [MacBook-Pro-80:57315] *** on communicator MPI_COMM_WORLD
> [MacBook-Pro-80:57315] *** MPI_ERR_INTERN: internal error
> [MacBook-Pro-80:57315] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [MacBook-Pro-80:57315] *** and potentially your MPI job)
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
> Process name: [[50012,1],0]
> Exit code: 17
> --------------------------------------------------------------------------
>
> Thanks,
> Louis
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29441.php
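For readers arriving at this thread: the Accept/Connect pattern Louis describes can be sketched with the plain C API (the MPI::* C++ bindings were deprecated in MPI-2.2 and later removed). This is a minimal sketch under stated assumptions, not the poster's actual code: the RUN_AS_SERVER compile-time switch and the passing of the port name as a command-line argument are illustrative choices of mine. In Louis's setup the port exchange is instead handled via the --ompi-server rendezvous.

```c
/* Minimal client/server connect sketch (assumed names: RUN_AS_SERVER,
 * port-name-via-argv). Build, e.g.:
 *   mpicc -DRUN_AS_SERVER -o server cs.c
 *   mpicc -o client cs.c
 * Run each side under its own mpirun invocation, as in the thread. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;   /* intercommunicator joining the two jobs */

    MPI_Init(&argc, &argv);

#ifdef RUN_AS_SERVER
    /* Server side: open a port and block until one client connects. */
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("server port: %s\n", port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
#else
    /* Client side: the port name printed by the server is passed as
     * argv[1] (an assumption for this sketch). */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <port-name>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
    port[MPI_MAX_PORT_NAME - 1] = '\0';
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
#endif

    /* ... communicate over 'inter' here; any traffic between the two
     * jobs is routed by whichever BTL the jobs share (per the reply
     * above, sm/vader do not work across separate mpirun jobs). */

    MPI_Comm_disconnect(&inter);
#ifdef RUN_AS_SERVER
    MPI_Close_port(port);
#endif
    MPI_Finalize();
    return 0;
}
```

Note that MPI_Comm_connect/MPI_Comm_accept yield an intercommunicator, so the transport used between the two jobs is negotiated at connect time, which is exactly where the "unable to reach each other" error in the quoted log is raised.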