I am getting an error and a crash when using PRRTE to run a containerized
instance of the OSU Micro-Benchmarks built against Open MPI. The same
container works under Slurm with PMI2 support. Full details are
available at https://github.com/openpmix/prrte/issues/1635, but I was
advised there to reach out to OMPI.

Error output follows. Can anyone point me in the right direction to
understand what I'm doing wrong?

$ prterun -n 2 --map-by=ppr:1:node \
    --hostfile ~/janderson/workflows/util/prrte/hostfile.txt \
    ./osu-micro-benchmarks.sif osu_init
--------------------------------------------------------------------------
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.

Note: This message is displayed only when the OFI component's verbosity level is
1851085648 or higher.
--------------------------------------------------------------------------
c5.190935map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad00040b0000) size 262144 failed: Resource temporarily unavailable
c5.190935osu_init: An unrecoverable error occurred while communicating with the driver
[c5:190935] *** Process received signal ***
[c5:190935] Signal: Aborted (6)
[c5:190935] Signal code:  (-6)
[c5:190935] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f8c6ec62cf0]
[c5:190935] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f8c6e8d9acf]
[c5:190935] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f8c6e8acea5]
[c5:190935] [ 3]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x47804)[0x7f8c6c5af804]
[c5:190935] [ 4]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xde3e)[0x7f8c6c575e3e]
[c5:190935] [ 5]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xecdb)[0x7f8c6c576cdb]
[c5:190935] [ 6]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x11353)[0x7f8c6c579353]
[c5:190935] [ 7]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(psm2_ep_open+0x209)[0x7f8c6c57aa49]
[c5:190935] [ 8]
/opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0x9cb14)[0x7f8c6dfdfb14]
[c5:190935] [ 9]
/opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0xa62be)[0x7f8c6dfe92be]
[c5:190935] [10]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(+0x8cd2d)[0x7f8c6e2d0d2d]
[c5:190935] [11]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(mca_btl_base_select+0xe3)[0x7f8c6e2c0b83]
[c5:190935] [12]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_r2_component_init+0x12)[0x7f8c6ef47f42]
[c5:190935] [13]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_base_init+0x94)[0x7f8c6ef46084]
[c5:190935] [14]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(ompi_mpi_init+0x64c)[0x7f8c6f1105cc]
[c5:190935] [15]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(MPI_Init+0x5e)[0x7f8c6ef1fa4e]
[c5:190935] [16]
/opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x4015be]
[c5:190935] [17] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f8c6e8c5d85]
[c5:190935] [18]
/opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x40176e]
[c5:190935] *** End of error message ***
--------------------------------------------------------------------------
Open MPI's OFI driver detected multiple equidistant NICs from the current process,
but had insufficient information to ensure MPI processes fairly pick a NIC for use.
This may negatively impact performance. A more modern PMIx server is necessary to
resolve this issue.

Note: This message is displayed only when the OFI component's verbosity level is
-1891646640 or higher.
--------------------------------------------------------------------------
c6.191679map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad00040b0000) size 262144 failed: Resource temporarily unavailable
c6.191679osu_init: An unrecoverable error occurred while communicating with the driver
[c6:191679] *** Process received signal ***
[c6:191679] Signal: Aborted (6)
[c6:191679] Signal code:  (-6)
[c6:191679] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7f518fb09cf0]
[c6:191679] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f518f780acf]
[c6:191679] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f518f753ea5]
[c6:191679] [ 3]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x47804)[0x7f518d456804]
[c6:191679] [ 4]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xde3e)[0x7f518d41ce3e]
[c6:191679] [ 5]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0xecdb)[0x7f518d41dcdb]
[c6:191679] [ 6]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(+0x11353)[0x7f518d420353]
[c6:191679] [ 7]
/opt/software/linux-centos8-zen/gcc-8.5.0/opa-psm2-11.2.230-k66aykcpei5ijztxoafbzaqmplh3pu42/lib/libpsm2.so.2(psm2_ep_open+0x209)[0x7f518d421a49]
[c6:191679] [ 8]
/opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0x9cb14)[0x7f518ee86b14]
[c6:191679] [ 9]
/opt/software/linux-centos8-zen/gcc-8.5.0/libfabric-1.16.1-apf5ltuppxfa5sbg4vjtv7xv3gpj6gpj/lib/libfabric.so.1(+0xa62be)[0x7f518ee902be]
[c6:191679] [10]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(+0x8cd2d)[0x7f518f177d2d]
[c6:191679] [11]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libopen-pal.so.40(mca_btl_base_select+0xe3)[0x7f518f167b83]
[c6:191679] [12]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_r2_component_init+0x12)[0x7f518fdeef42]
[c6:191679] [13]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(mca_bml_base_init+0x94)[0x7f518fded084]
[c6:191679] [14]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(ompi_mpi_init+0x64c)[0x7f518ffb75cc]
[c6:191679] [15]
/opt/software/linux-centos8-zen/gcc-8.5.0/openmpi-4.1.4-u2e2bpyhubhxg7tq5j3tctorf4ep4xiv/lib/libmpi.so.40(MPI_Init+0x5e)[0x7f518fdc6a4e]
[c6:191679] [16]
/opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x4015be]
[c6:191679] [17] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f518f76cd85]
[c6:191679] [18]
/opt/view/libexec/osu-micro-benchmarks/mpi/startup/osu_init[0x40176e]
[c6:191679] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 0 on node c5 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------



-- 

Jonathon Anderson HPC Engineer, Sr.

E jander...@ciq.co  |  W ciq.co <http://www.ciq.co/>

C https://calendly.com/janderson-ciq