Received from Lev Givon on Thu, May 21, 2015 at 11:32:33AM EDT:
> Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT:
>
> (snip)
>
> > I see that you mentioned you are starting 4 MPS daemons. Are you following
> > the instructions here?
> >
> > http://cudamusing.blogspot.de/2013/07/enabling-cuda-multi-process-service-mps.html
>
> Yes - also
> https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
>
> > This relies on setting CUDA_VISIBLE_DEVICES which can cause problems for
> > CUDA IPC. Since you are using CUDA 7 there is no more need to start
> > multiple daemons. You simply leave CUDA_VISIBLE_DEVICES untouched and
> > start a single MPS control daemon which will handle all GPUs. Can you try
> > that?
>
> I assume that this means that only one CUDA_MPS_PIPE_DIRECTORY value should
> be passed to all MPI processes.
>
> Several questions related to your comment above:
>
> - Should the MPI processes select and initialize the GPUs they respectively
>   need to access as they normally would when MPS is not in use?
> - Can CUDA_VISIBLE_DEVICES be used to control what GPUs are visible to MPS
>   (and hence the client processes)? I ask because SLURM uses
>   CUDA_VISIBLE_DEVICES to control GPU resource allocation, and I would like
>   to run my program (and the MPS control daemon) on a cluster via SLURM.
> - Does the clash between setting CUDA_VISIBLE_DEVICES and CUDA IPC imply
>   that MPS and CUDA IPC cannot reliably be used simultaneously in a
>   multi-GPU setting with CUDA 6.5 even when one starts multiple MPS control
>   daemons as described in the aforementioned blog post?
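For the record, the working configuration amounts to starting one
nvidia-cuda-mps-control daemon with CUDA_VISIBLE_DEVICES left unset and
handing the same CUDA_MPS_PIPE_DIRECTORY to the daemon and to all of the MPI
processes, which then select their GPUs with cudaSetDevice() as they would
without MPS. A minimal sketch of such a launcher follows (single node and
Python used for brevity; the directory paths, the rank count, and the my_prog
executable name are placeholders rather than the exact values from my job):

#!/usr/bin/env python
# Minimal sketch of a single-daemon MPS launch; paths, rank count, and
# program name are placeholders.

import os
import subprocess

PIPE_DIR = '/tmp/mps-pipe'   # placeholder; any directory writable by all ranks
LOG_DIR = '/tmp/mps-log'     # placeholder

env = os.environ.copy()
# Leave CUDA_VISIBLE_DEVICES unset so the single control daemon sees all GPUs:
env.pop('CUDA_VISIBLE_DEVICES', None)
# One pipe directory shared by the daemon and every MPI process:
env['CUDA_MPS_PIPE_DIRECTORY'] = PIPE_DIR
env['CUDA_MPS_LOG_DIRECTORY'] = LOG_DIR

for d in (PIPE_DIR, LOG_DIR):
    if not os.path.isdir(d):
        os.makedirs(d)

# Start a single MPS control daemon for all GPUs (it forks into the background):
subprocess.check_call(['nvidia-cuda-mps-control', '-d'], env=env)

try:
    # Every rank gets the same CUDA_MPS_PIPE_DIRECTORY (exported via Open
    # MPI's -x flag) and selects its GPUs with cudaSetDevice() as usual:
    subprocess.check_call(['mpirun', '-np', '4',
                           '-x', 'CUDA_MPS_PIPE_DIRECTORY',
                           './my_prog'], env=env)
finally:
    # Shut the control daemon down (same as `echo quit | nvidia-cuda-mps-control`):
    quit_proc = subprocess.Popen(['nvidia-cuda-mps-control'],
                                 stdin=subprocess.PIPE, env=env)
    quit_proc.communicate(b'quit\n')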
Using a single control daemon with CUDA_VISIBLE_DEVICES unset appears to solve
the problem when IPC is enabled.

--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/