I am getting an error and a crash when trying to use PRRTE to run a containerized instance of OSU Micro-Benchmarks built against Open MPI. The same container works when launched using PMI2 support in Slurm. Full details are available at https://github.com/openpmix/prrte/issues/1635, where it was suggested that I reach out to OMPI.
Error output follows. Can anyone point me in the right direction to understand what I'm doing wrong?

$ prterun -n 2 --map-by=ppr:1:node --hostfile ~/janderson/workflows/util/prrte/hostfile.txt ./osu-micro-benchmarks.sif osu_init
--------------------------------------------------------------------------
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.

Note: This message is displayed only when the OFI component's verbosity level is
1851085648 or higher.
--------------------------------------------------------------------------
c5.190935map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad00040b0000) size 262144 failed: Resource temporarily unavailable
c5.190935osu_init: An unrecoverable error occurred while communicating with the driver
[c5:190935] *** Process received signal ***
[c5:190935] Signal: Aborted (6)
[c5:190935] Signal code: (-6)
[c5:190935] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f8c6ec62cf0]
[c5:190935] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f8c6e8d9acf]
[c5:190935] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f8c6e8acea5]
[c5:190935] [ 3] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x47804)[0x7f8c6c5af804]
[c5:190935] [ 4] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xde3e)[0x7f8c6c575e3e]
[c5:190935] [ 5] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xecdb)[0x7f8c6c576cdb]
[c5:190935] [ 6] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x11353)[0x7f8c6c579353]
[c5:190935] [ 7] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(psm2_ep_open+0x209)[0x7f8c6c57aa49]
[c5:190935] [ 8] /opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0x9cb14)[0x7f8c6dfdfb14]
[c5:190935] [ 9] /opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0xa62be)[0x7f8c6dfe92be]
[c5:190935] [10] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(+0x8cd2d)[0x7f8c6e2d0d2d]
[c5:190935] [11] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(mca_btl_base_select+0xe3)[0x7f8c6e2c0b83]
[c5:190935] [12] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_r2_component_init+0x12)[0x7f8c6ef47f42]
[c5:190935] [13] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_base_init+0x94)[0x7f8c6ef46084]
[c5:190935] [14] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(ompi_mpi_init+0x64c)[0x7f8c6f1105cc]
[c5:190935] [15] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(MPI_Init+0x5e)[0x7f8c6ef1fa4e]
[c5:190935] [16] /opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x4015be]
[c5:190935] [17] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f8c6e8c5d85]
[c5:190935] [18] /opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x40176e]
[c5:190935] *** End of error message ***
--------------------------------------------------------------------------
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.

Note: This message is displayed only when the OFI component's verbosity level is
-1891646640 or higher.
--------------------------------------------------------------------------
c6.191679map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad00040b0000) size 262144 failed: Resource temporarily unavailable
c6.191679osu_init: An unrecoverable error occurred while communicating with the driver
[c6:191679] *** Process received signal ***
[c6:191679] Signal: Aborted (6)
[c6:191679] Signal code: (-6)
[c6:191679] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f518fb09cf0]
[c6:191679] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f518f780acf]
[c6:191679] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f518f753ea5]
[c6:191679] [ 3] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x47804)[0x7f518d456804]
[c6:191679] [ 4] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xde3e)[0x7f518d41ce3e]
[c6:191679] [ 5] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xecdb)[0x7f518d41dcdb]
[c6:191679] [ 6] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x11353)[0x7f518d420353]
[c6:191679] [ 7] /opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(psm2_ep_open+0x209)[0x7f518d421a49]
[c6:191679] [ 8] /opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0x9cb14)[0x7f518ee86b14]
[c6:191679] [ 9] /opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0xa62be)[0x7f518ee902be]
[c6:191679] [10] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(+0x8cd2d)[0x7f518f177d2d]
[c6:191679] [11] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(mca_btl_base_select+0xe3)[0x7f518f167b83]
[c6:191679] [12] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_r2_component_init+0x12)[0x7f518fdeef42]
[c6:191679] [13] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_base_init+0x94)[0x7f518fded084]
[c6:191679] [14] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(ompi_mpi_init+0x64c)[0x7f518ffb75cc]
[c6:191679] [15] /opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(MPI_Init+0x5e)[0x7f518fdc6a4e]
[c6:191679] [16] /opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x4015be]
[c6:191679] [17] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f518f76cd85]
[c6:191679] [18] /opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x40176e]
[c6:191679] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 0 on node c5 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

--
Jonathon Anderson
HPC Engineer, Sr.
E jander...@ciq.co | W ciq.co <http://www.ciq.co/>
C https://calendly.com/janderson-ciq <http://www.ciq.co/>