Received from Lev Givon on Thu, May 21, 2015 at 11:32:33AM EDT:
> Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT:
>
> (snip)
>
> > I see that you mentioned you are starting 4 MPS daemons. Are you following
> > the instructions here?
> >
> > http://cudamusing.blogspot.de/2013/07/enabling-cuda-multi-process-service-mps.html
>
> Yes - also
> https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
>
> > This relies on setting CUDA_VISIBLE_DEVICES which can cause problems for
> > CUDA IPC. Since you are using CUDA 7 there is no more need to start
> > multiple daemons. You simply leave CUDA_VISIBLE_DEVICES untouched and
> > start a single MPS control daemon which will handle all GPUs. Can you try
> > that?
>
> I assume that this means that only one CUDA_MPS_PIPE_DIRECTORY value should
> be passed to all MPI processes.
>
> Several questions related to your comment above:
>
> - Should the MPI processes select and initialize the GPUs they respectively
>   need to access as they normally would when MPS is not in use?
> - Can CUDA_VISIBLE_DEVICES be used to control what GPUs are visible to MPS
>   (and hence the client processes)? I ask because SLURM uses
>   CUDA_VISIBLE_DEVICES to control GPU resource allocation, and I would like
>   to run my program (and the MPS control daemon) on a cluster via SLURM.
> - Does the clash between setting CUDA_VISIBLE_DEVICES and CUDA IPC imply
>   that MPS and CUDA IPC cannot reliably be used simultaneously in a
>   multi-GPU setting with CUDA 6.5 even when one starts multiple MPS control
>   daemons as described in the aforementioned blog post?
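For the record, the working configuration amounts to starting one
nvidia-cuda-mps-control daemon with CUDA_VISIBLE_DEVICES left unset and
handing the same CUDA_MPS_PIPE_DIRECTORY to the daemon and to all of the MPI
processes, which then select their GPUs with cudaSetDevice() as they would
without MPS. A minimal sketch of such a launcher follows (single node and
Python used for brevity; the directory paths, the rank count, and the my_prog
executable name are placeholders rather than the exact values from my job):

#!/usr/bin/env python
# Minimal sketch of a single-daemon MPS launch; paths, rank count, and
# program name are placeholders.

import os
import subprocess

PIPE_DIR = '/tmp/mps-pipe'   # placeholder; any directory writable by all ranks
LOG_DIR = '/tmp/mps-log'     # placeholder

env = os.environ.copy()
# Leave CUDA_VISIBLE_DEVICES unset so the single control daemon sees all GPUs:
env.pop('CUDA_VISIBLE_DEVICES', None)
# One pipe directory shared by the daemon and every MPI process:
env['CUDA_MPS_PIPE_DIRECTORY'] = PIPE_DIR
env['CUDA_MPS_LOG_DIRECTORY'] = LOG_DIR

for d in (PIPE_DIR, LOG_DIR):
    if not os.path.isdir(d):
        os.makedirs(d)

# Start a single MPS control daemon for all GPUs (it forks into the background):
subprocess.check_call(['nvidia-cuda-mps-control', '-d'], env=env)

try:
    # Every rank gets the same CUDA_MPS_PIPE_DIRECTORY (exported via Open
    # MPI's -x flag) and selects its GPUs with cudaSetDevice() as usual:
    subprocess.check_call(['mpirun', '-np', '4',
                           '-x', 'CUDA_MPS_PIPE_DIRECTORY',
                           './my_prog'], env=env)
finally:
    # Shut the control daemon down (same as `echo quit | nvidia-cuda-mps-control`):
    quit_proc = subprocess.Popen(['nvidia-cuda-mps-control'],
                                 stdin=subprocess.PIPE, env=env)
    quit_proc.communicate(b'quit\n')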
Using a single control daemon with CUDA_VISIBLE_DEVICES unset appears to solve
the problem when IPC is enabled.

--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/