Can you please try
mpirun --mca btl tcp,self ...
And if it works, try
mpirun --mca btl openib,self ...

Then can you try
mpirun --mca coll ^tuned --mca btl tcp,self ...

That will help figure out whether the error is in the pml or the coll
framework/module.
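
If it is easier to run the three variants in one go, something like the
following sketch might do. It is only a convenience wrapper around the
commands above, not an Open MPI tool: the run_step helper and the APP and
RUN variables are hypothetical names, and IMB-MPI1 is assumed as the
program only because it is the one in the quoted report (put your usual
hostfile/-np options wherever the "..." goes). By default it only prints
the commands; set RUN=1 to actually launch them.

```sh
#!/bin/sh
# Sketch: run the suggested mpirun variants in order.
# Assumptions (not from the original mail): APP, RUN and run_step are
# illustrative names; IMB-MPI1 is taken from the quoted command line.
APP="${APP:-IMB-MPI1}"
RUN="${RUN:-0}"   # 0 = dry run (print only), 1 = really call mpirun

# run_step LABEL MCA-ARGS...
# Prints the command and, when RUN=1, executes it and returns its status.
run_step() {
    label=$1
    shift
    echo "+ [$label] mpirun $* $APP"
    if [ "$RUN" = "1" ]; then
        mpirun "$@" "$APP"
    fi
}

# Step 1: tcp only. Step 2 (openib) is attempted only if step 1 works.
if run_step tcp --mca btl tcp,self; then
    run_step openib --mca btl openib,self
fi

# Step 3: tcp again, with the tuned coll component excluded.
run_step no-tuned --mca coll '^tuned' --mca btl tcp,self
```

The '^tuned' is quoted only so that no shell tries to interpret the caret;
it is the same option as in the command above.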

Cheers,

Gilles

On Thursday, March 23, 2017, Götz Waschk <goetz.was...@gmail.com> wrote:

> Hi Howard,
>
> I have attached my config.log file for version 2.1.0. I have based it
> on the OpenHPC package. Unfortunately, it still crashes when the vader
> btl is disabled with this command line:
> mpirun --mca btl "^vader" IMB-MPI1
>
>
> [pax11-10:44753] *** Process received signal ***
> [pax11-10:44753] Signal: Bus error (7)
> [pax11-10:44753] Signal code: Non-existant physical address (2)
> [pax11-10:44753] Failing at address: 0x2b3989e27a00
> [pax11-10:44753] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b3976f44370]
> [pax11-10:44753] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(+0x559a)[0x2b398545259a]
> [pax11-10:44753] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x1df)[0x2b39777bb78f]
> [pax11-10:44753] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2b3985450562]
> [pax11-10:44753] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2b3985d78a3f]
> [pax11-10:44753] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2b3985d79ad7]
> [pax11-10:44753] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2b3976cda620]
> [pax11-10:44753] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2b3976cdb8f0]
> [pax11-10:44753] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2b3976ca36ab]
> [pax11-10:44753] [ 9] IMB-MPI1[0x40b2ff]
> [pax11-10:44753] [10] IMB-MPI1[0x402646]
> [pax11-10:44753] [11] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b3977172b35]
> [pax11-10:44753] [12] IMB-MPI1[0x401f79]
> [pax11-10:44753] *** End of error message ***
> [pax11-10:44752] *** Process received signal ***
> [pax11-10:44752] Signal: Bus error (7)
> [pax11-10:44752] Signal code: Non-existant physical address (2)
> [pax11-10:44752] Failing at address: 0x2ab0d270d3e8
> [pax11-10:44752] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab0bf7ec370]
> [pax11-10:44752] [ 1] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_allocator_bucket.so(mca_allocator_bucket_alloc_align+0x89)[0x2ab0c2eed1c9]
> [pax11-10:44752] [ 2] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmca_common_sm.so.20(+0x1495)[0x2ab0cde8d495]
> [pax11-10:44752] [ 3] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libopen-pal.so.20(opal_free_list_grow_st+0x277)[0x2ab0c0063827]
> [pax11-10:44752] [ 4] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_btl_sm.so(mca_btl_sm_sendi+0x272)[0x2ab0cdc87562]
> [pax11-10:44752] [ 5] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(+0x8a3f)[0x2ab0ce630a3f]
> [pax11-10:44752] [ 6] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4a7)[0x2ab0ce631ad7]
> [pax11-10:44752] [ 7] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_sendrecv_nonzero_actual+0x110)[0x2ab0bf582620]
> [pax11-10:44752] [ 8] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(ompi_coll_base_allreduce_intra_ring+0x860)[0x2ab0bf5838f0]
> [pax11-10:44752] [ 9] /opt/ohpc/pub/mpi/openmpi-gnu/2.1.0/lib/libmpi.so.20(PMPI_Allreduce+0x17b)[0x2ab0bf54b6ab]
> [pax11-10:44752] [10] IMB-MPI1[0x40b2ff]
> [pax11-10:44752] [11] IMB-MPI1[0x402646]
> [pax11-10:44752] [12] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab0bfa1ab35]
> [pax11-10:44752] [13] IMB-MPI1[0x401f79]
> [pax11-10:44752] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 340 with PID 44753 on node pax11-10
> exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
