Dear Rolf,

Thanks for looking into this. Here is the complete backtrace for an execution using 2 GPUs on the same node:
    (cuda-gdb) bt
    #0 0x00007ffff711d885 in raise () from /lib64/libc.so.6
    #1 0x00007ffff711f065 in abort () from /lib64/libc.so.6
    #2 0x00007ffff0387b8d in psmi_errhandler_psm (ep=<value optimized out>, err=PSM_INTERNAL_ERR, error_string=<value optimized out>, token=<value optimized out>) at psm_error.c:76
    #3 0x00007ffff0387df1 in psmi_handle_error (ep=0xfffffffffffffffe, error=PSM_INTERNAL_ERR, buf=<value optimized out>) at psm_error.c:154
    #4 0x00007ffff0382f6a in psmi_am_mq_handler_rtsmatch (toki=0x7fffffffc6a0, args=0x7fffed0461d0, narg=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at ptl.c:200
    #5 0x00007ffff037a832 in process_packet (ptl=0x737818, pkt=0x7fffed0461c0, isreq=<value optimized out>) at am_reqrep_shmem.c:2164
    #6 0x00007ffff037d90f in amsh_poll_internal_inner (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1756
    #7 amsh_poll (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1810
    #8 0x00007ffff03a0329 in __psmi_poll_internal (ep=0x737538, poll_amsh=<value optimized out>) at psm.c:465
    #9 0x00007ffff039f0af in psmi_mq_wait_inner (ireq=0x7fffffffc848) at psm_mq.c:299
    #10 psmi_mq_wait_internal (ireq=0x7fffffffc848) at psm_mq.c:334
    #11 0x00007ffff037db21 in amsh_mq_send_inner (ptl=0x737818, mq=<value optimized out>, epaddr=0x6eb418, flags=<value optimized out>, tag=844424930131968, ubuf=0x1308350000, len=32768) at am_reqrep_shmem.c:2339
    #12 amsh_mq_send (ptl=0x737818, mq=<value optimized out>, epaddr=0x6eb418, flags=<value optimized out>, tag=844424930131968, ubuf=0x1308350000, len=32768) at am_reqrep_shmem.c:2387
    #13 0x00007ffff039ed71 in __psm_mq_send (mq=<value optimized out>, dest=<value optimized out>, flags=<value optimized out>, stag=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at psm_mq.c:413
    #14 0x00007ffff05c4ea8 in ompi_mtl_psm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_mtl_psm.so
    #15 0x00007ffff1eeddea in mca_pml_cm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_pml_cm.so
    #16 0x00007ffff79253da in PMPI_Sendrecv () from /gpfslocal/pub/openmpi/1.7.3/lib/libmpi.so.1
    #17 0x00000000004045ef in ExchangeHalos (cartComm=0x715460, devSend=0x1308350000, hostSend=0x7b8710, hostRecv=0x7c0720, devRecv=0x1308358000, neighbor=1, elemCount=4096) at CUDA_Aware_MPI.c:70
    #18 0x00000000004033d8 in TransferAllHalos (cartComm=0x715460, domSize=0x7fffffffcd80, topIndex=0x7fffffffcd60, neighbors=0x7fffffffcd90, copyStream=0xaa4450, devBlocks=0x7fffffffcd30, devSideEdges=0x7fffffffcd20, devHaloLines=0x7fffffffcd10, hostSendLines=0x7fffffffcd00, hostRecvLines=0x7fffffffccf0) at Host.c:400
    #19 0x000000000040363c in RunJacobi (cartComm=0x715460, rank=0, size=2, domSize=0x7fffffffcd80, topIndex=0x7fffffffcd60, neighbors=0x7fffffffcd90, useFastSwap=0, devBlocks=0x7fffffffcd30, devSideEdges=0x7fffffffcd20, devHaloLines=0x7fffffffcd10, hostSendLines=0x7fffffffcd00, hostRecvLines=0x7fffffffccf0, devResidue=0x1310480000, copyStream=0xaa4450, iterations=0x7fffffffcd44, avgTransferTime=0x7fffffffcd48) at Host.c:466
    #20 0x0000000000401ccb in main (argc=4, argv=0x7fffffffcea8) at Jacobi.c:60

Pierre.

________________________________
From: KESTENER Pierre
Sent: Wednesday, 30 October 2013 16:34
To: us...@open-mpi.org
Cc: KESTENER Pierre
Subject: OpenMPI-1.7.3 - cuda support

Hello,

I'm having problems running a simple CUDA-aware MPI application, the one found at https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example

I have changed the symbol ENV_LOCAL_RANK to OMPI_COMM_WORLD_LOCAL_RANK. My cluster has 2 K20m GPUs per node, with a QLogic IB stack.
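For reference, the GPU-selection step that depends on ENV_LOCAL_RANK can be sketched in plain C. This is a minimal sketch, not the example's actual code: it assumes that ENV_LOCAL_RANK names the environment variable carrying the node-local rank (OMPI_COMM_WORLD_LOCAL_RANK when launched by Open MPI's mpirun) and that ranks are assigned round-robin to the node's GPUs. The helper names parse_local_rank, local_rank_from_env and device_for_local_rank are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a node-local rank from a string such as the
 * value of OMPI_COMM_WORLD_LOCAL_RANK. Returns -1 if the string is missing
 * or is not a non-negative integer. */
int parse_local_rank(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > INT_MAX)
        return -1;
    return (int)v;
}

/* Hypothetical helper: read the local rank from the named environment
 * variable. An environment variable can be read before MPI_Init, which is
 * why the example uses one instead of an MPI call. */
int local_rank_from_env(const char *var)
{
    return parse_local_rank(getenv(var));
}

/* Hypothetical helper: map a local rank onto one of num_devices GPUs,
 * round-robin. Falls back to device 0 when the rank is unknown (-1). */
int device_for_local_rank(int local_rank, int num_devices)
{
    assert(num_devices > 0);
    if (local_rank < 0)
        return 0;
    return local_rank % num_devices;
}
```

In the CUDA program, the result would be passed to cudaSetDevice() before MPI_Init(), with num_devices obtained from cudaGetDeviceCount(). On a node with 2 K20m GPUs, local ranks 0 and 1 map to devices 0 and 1.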
The normal CUDA/MPI application works fine, but the CUDA-aware MPI app crashes when using 2 MPI processes on the 2 GPUs of the same node. The error message is:

    Assertion failure at ptl.c:200: nbytes == msglen

I can send the complete backtrace from cuda-gdb if needed.

The same app running on 2 GPUs on 2 different nodes gives another error:

    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8. Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]

Can someone give me hints on where to look to track down this problem?

Thank you.
Pierre Kestener.
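As a sanity check on the `nbytes == msglen` assertion, the sizes printed in the cuda-gdb backtrace can be compared with each other. The C sketch below only encodes numbers copied from frames #11, #12 and #17; the macro and function names are hypothetical. It shows that elemCount=4096 and len=32768 imply 8-byte elements (consistent with double, although the element type is inferred here, not stated in the post), and that devRecv starts exactly 32768 bytes after devSend. Frame #11 also shows PSM's shared-memory send (amsh_mq_send_inner) receiving ubuf=0x1308350000, the same address as devSend. Together this suggests, without proving it, that the sizes the application passes to MPI_Sendrecv are self-consistent and that the mismatch arises lower down, in the PSM shared-memory path that is handed the device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied verbatim from the cuda-gdb backtrace. */
#define ELEM_COUNT 4096u            /* ExchangeHalos(... elemCount=4096), frame #17 */
#define PSM_LEN    32768u           /* amsh_mq_send(... len=32768), frames #11-#12 */
#define DEV_SEND   0x1308350000ull  /* devSend in frame #17, ubuf in frame #11 */
#define DEV_RECV   0x1308358000ull  /* devRecv in frame #17 */

/* The PSM message length should be a whole number of halo elements (C11). */
static_assert(PSM_LEN % ELEM_COUNT == 0, "len is a whole number of elements");

/* Bytes per halo element implied by the two sizes: 32768 / 4096. */
size_t implied_elem_size(void)
{
    return PSM_LEN / ELEM_COUNT;
}

/* Distance between the device send and receive halo buffers: 0x8000 bytes. */
uint64_t send_recv_gap(void)
{
    return DEV_RECV - DEV_SEND;
}
```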