Hi!

In my current application, MPI_Send/MPI_Recv hangs when the buffers are in the device memory of an NVIDIA GPU. I traced this to OpenMPI using the synchronous cuMemcpy rather than the asynchronous cuMemcpyAsync (see the stack trace at the bottom). However, in my application, synchronous copies cannot be used.

I scanned through the source and saw that support for asynchronous memcpys is available. It is controlled by 'mca_common_cuda_cumemcpy_async' in
./ompi/mca/common/cuda/common_cuda.c
However, I can't find a way to enable it. It does not show up in 'ompi_info' (although it seems to be registered?). How can I enforce the use of cuMemcpyAsync in OpenMPI? I am using OpenMPI 1.8.5.

Thank you,
Jeremia

(gdb) bt
#0  0x00002aaaaaaaba11 in clock_gettime ()
#1  0x00000039e5803e46 in clock_gettime () from /lib64/librt.so.1
#2  0x00002aaaab58a7ae in ?? () from /usr/lib64/libcuda.so.1
#3  0x00002aaaaaf41dfb in ?? () from /usr/lib64/libcuda.so.1
#4  0x00002aaaaaf1f623 in ?? () from /usr/lib64/libcuda.so.1
#5  0x00002aaaaaf17361 in ?? () from /usr/lib64/libcuda.so.1
#6  0x00002aaaaaf180b6 in ?? () from /usr/lib64/libcuda.so.1
#7  0x00002aaaaae860c2 in ?? () from /usr/lib64/libcuda.so.1
#8  0x00002aaaaae8621a in ?? () from /usr/lib64/libcuda.so.1
#9  0x00002aaaaae69d85 in cuMemcpy () from /usr/lib64/libcuda.so.1
#10 0x00002aaaaf0a7dea in mca_common_cuda_cu_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmca_common_cuda.so.1
#11 0x00002aaaac992544 in opal_cuda_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
#12 0x00002aaaac98adf7 in opal_convertor_pack () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
#13 0x00002aaab167c611 in mca_pml_ob1_send_request_start_copy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
#14 0x00002aaab167353f in mca_pml_ob1_send () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
#15 0x00002aaaabf4f322 in PMPI_Send () from /users/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmpi.so.1
