Hi Howard,

I had tried to send the config.log of my 2.1.0 build, but I guess it was too big for the list. I'm trying again with a compressed file. The build is based on the OpenHPC package. Unfortunately, it still crashes even with the vader btl disabled, using this command line:

    mpirun --mca btl "^vader" IMB-MPI1
[pax11-10:44753] *** Process received signal ***
[pax11-10:44753] Signal: Bus error (7)
[pax11-10:44753] Signal code: Non-existant physical address (2)
[pax11-10:44753] Failing at address: 0x2b3989e27a00
[pax11-10:44753] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b3976f44370]
[pax11-10:44753] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(+0x559a)[0x2b398545259a]
[pax11-10:44753] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x1df)[0x2b39777bb78f]
[pax11-10:44753] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2b3985450562]
[pax11-10:44753] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2b3985d78a3f]
[pax11-10:44753] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2b3985d79ad7]
[pax11-10:44753] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2b3976cda620]
[pax11-10:44753] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2b3976cdb8f0]
[pax11-10:44753] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b3976ca36ab]
[pax11-10:44753] [ 9] IMB-MPI1[0x40b2ff]
[pax11-10:44753] [10] IMB-MPI1[0x402646]
[pax11-10:44753] [11] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3977172b35]
[pax11-10:44753] [12] IMB-MPI1[0x401f79]
[pax11-10:44753] *** End of error message ***
[pax11-10:44752] *** Process received signal ***
[pax11-10:44752] Signal: Bus error (7)
[pax11-10:44752] Signal code: Non-existant physical address (2)
[pax11-10:44752] Failing at address: 0x2ab0d270d3e8
[pax11-10:44752] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab0bf7ec370]
[pax11-10:44752] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_allocator_bucket.so(mca_allocator_bucket_alloc_align+0x89)[0x2ab0c2eed1c9]
[pax11-10:44752] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmca_common_sm.so.20(+0x1495)[0x2ab0cde8d495]
[pax11-10:44752] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x277)[0x2ab0c0063827]
[pax11-10:44752] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2ab0cdc87562]
[pax11-10:44752] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2ab0ce630a3f]
[pax11-10:44752] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2ab0ce631ad7]
[pax11-10:44752] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2ab0bf582620]
[pax11-10:44752] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2ab0bf5838f0]
[pax11-10:44752] [ 9] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2ab0bf54b6ab]
[pax11-10:44752] [10] IMB-MPI1[0x40b2ff]
[pax11-10:44752] [11] IMB-MPI1[0x402646]
[pax11-10:44752] [12] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab0bfa1ab35]
[pax11-10:44752] [13] IMB-MPI1[0x401f79]
[pax11-10:44752] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 340 with PID 44753 on node pax11-10
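One thing I notice in the backtraces above: even with vader excluded, the crash happens inside mca_btl_sm, i.e. the older "sm" shared-memory BTL is still being selected. If it helps narrow things down, I could also try excluding both shared-memory BTLs with the usual "^" exclusion syntax (I have not run this variant yet):

    mpirun --mca btl "^vader,sm" IMB-MPI1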
config.log.xz
Description: application/xz
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users