Nope - we don’t currently support cross-job shared memory operations. Nathan 
has talked about adding it for vader, but it isn’t implemented at this time.
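
If the two jobs need to talk to each other, the usual workaround is to leave a 
transport in the BTL list that can reach across jobs - e.g., something like:

  mpirun -np 1 --mca btl self,vader,tcp ./server

Intra-job traffic can still go over vader; the inter-job connection will fall 
back to TCP.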


> On Jun 14, 2016, at 12:38 PM, Louis Williams <louis.willi...@gatech.edu> 
> wrote:
> 
> Hi,
> 
> I am attempting to use the sm and vader BTLs between a client and a server 
> process, but it seems impossible to use fast transports (i.e., anything other 
> than TCP) between two independent jobs started with separate mpirun 
> invocations. Am I correct, or is there a way to communicate over shared 
> memory between a client and server like this? This check in dpm.c seems to 
> suggest as much: 
> https://github.com/open-mpi/ompi/blob/master/ompi/dpm/dpm.c#L495
> 
> The server calls MPI::COMM_WORLD.Accept() and the client calls 
> MPI::COMM_WORLD.Connect(). Each program is started with "mpirun --np 1 --mca 
> btl self,sm,vader <executable>", where the executable is either the client or 
> the server program. When no BTL is specified, the two establish a TCP 
> connection just fine. But when the sm and vader BTLs are specified, both 
> client and server exit immediately after the Connect() call with the error 
> message copied at the end. It seems as though inter-job communication cannot 
> use the fast transports and is limited to TCP.
> 
> Also, as expected, when the Accept() and Connect() calls run within a single 
> job launched with "mpirun -np 2 --mca btl self,sm,vader ...", shared memory 
> is used as the transport.
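> 
> For reference, here is a minimal sketch of the pattern I'm using, shown with 
> the standard C API (the service name "my-server" and the publish/lookup 
> rendezvous are just for illustration - the port string could also be passed 
> out of band):
> 
> /* server.c: open a port, publish it, and accept one client connection */
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm client;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Open_port(MPI_INFO_NULL, port);
>     /* "my-server" is an arbitrary service name used by the client's lookup */
>     MPI_Publish_name("my-server", MPI_INFO_NULL, port);
>     /* blocks until the client connects; 'client' is an intercommunicator */
>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
>     /* ... communicate over 'client' ... */
>     MPI_Comm_disconnect(&client);
>     MPI_Unpublish_name("my-server", MPI_INFO_NULL, port);
>     MPI_Close_port(port);
>     MPI_Finalize();
>     return 0;
> }
> 
> /* client.c: look up the published port and connect to it */
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm server;
> 
>     MPI_Init(&argc, &argv);
>     /* the lookup needs a rendezvous point, e.g. the one given via --ompi-server */
>     MPI_Lookup_name("my-server", MPI_INFO_NULL, port);
>     /* 'server' is an intercommunicator to the accepting job */
>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
>     /* ... communicate over 'server' ... */
>     MPI_Comm_disconnect(&server);
>     MPI_Finalize();
>     return 0;
> }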
> 
> $> mpirun --ompi-server "3414491136.0;tcp://10.4.131.16:49775" -np 1 --mca btl self,vader ./server
> 
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>   Process 1 ([[50012,1],0]) is on host: MacBook-Pro-80
>   Process 2 ([[50010,1],0]) is on host: MacBook-Pro-80
>   BTLs attempted: self
> 
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> [MacBook-Pro-80.local:57315] [[50012,1],0] ORTE_ERROR_LOG: Unreachable in 
> file dpm_orte.c at line 523
> [MacBook-Pro-80:57315] *** An error occurred in MPI_Comm_accept
> [MacBook-Pro-80:57315] *** reported by process [7572553729,4294967296]
> [MacBook-Pro-80:57315] *** on communicator MPI_COMM_WORLD
> [MacBook-Pro-80:57315] *** MPI_ERR_INTERN: internal error
> [MacBook-Pro-80:57315] *** MPI_ERRORS_ARE_FATAL (processes in this 
> communicator will now abort,
> [MacBook-Pro-80:57315] ***    and potentially your MPI job)
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[50012,1],0]
>   Exit code:    17
> -------------------------------------------------------------------------- 
> 
> Thanks,
> Louis
