Hi Adam,

As a sanity check: if you run with --mca btl self,vader,tcp instead, do you
still see the segmentation fault?
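
That is, something like this, keeping the rest of your original command line
(adjust the hostfile path if needed):

mpirun --mca btl self,vader,tcp --mca pml ob1 --map-by node -np 6 \
    -hostfile /home/aleblanc/ib-mpi-hosts IMB-MPI1

That takes the openib BTL out of the picture, which would tell us whether the
crash is specific to openib.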

Howard


On Wed, Feb 20, 2019 at 08:50 Adam LeBlanc <alebl...@iol.unh.edu> wrote:

> Hello,
>
> When I run Open MPI v4.0.0 over InfiniBand with this command:
>
> mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node --mca
> orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca
> btl_openib_allow_ib 1 -np 6 -hostfile /home/aleblanc/ib-mpi-hosts IMB-MPI1
>
> I get this error:
>
> #----------------------------------------------------------------
> # Benchmarking Reduce_scatter
> # #processes = 4
> # ( 2 additional processes waiting in MPI_Barrier)
> #----------------------------------------------------------------
>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>             0         1000         0.14         0.15         0.14
>             4         1000         5.00         7.58         6.28
>             8         1000         5.13         7.68         6.41
>            16         1000         5.05         7.74         6.39
>            32         1000         5.43         7.96         6.75
>            64         1000         6.78         8.56         7.69
>           128         1000         7.77         9.55         8.59
>           256         1000         8.28        10.96         9.66
>           512         1000         9.19        12.49        10.85
>          1024         1000        11.78        15.01        13.38
>          2048         1000        17.41        19.51        18.52
>          4096         1000        25.73        28.22        26.89
>          8192         1000        47.75        49.44        48.79
>         16384         1000        81.10        90.15        84.75
>         32768         1000       163.01       178.58       173.19
>         65536          640       315.63       340.51       333.18
>        131072          320       475.48       528.82       510.85
>        262144          160       979.70      1063.81      1035.61
>        524288           80      2070.51      2242.58      2150.15
>       1048576           40      4177.36      4527.25      4431.65
>       2097152           20      8738.08      9340.50      9147.89
> [pandora:04500] *** Process received signal ***
> [pandora:04500] Signal: Segmentation fault (11)
> [pandora:04500] Signal code: Address not mapped (1)
> [pandora:04500] Failing at address: 0x7f310ebffff0
> [pandora:04499] *** Process received signal ***
> [pandora:04499] Signal: Segmentation fault (11)
> [pandora:04499] Signal code: Address not mapped (1)
> [pandora:04499] Failing at address: 0x7f28b11ffff0
> [pandora:04500] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f3126bef680]
> [pandora:04500] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f312695c4a0]
> [pandora:04500] [ 2]
> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f312628be55]
> [pandora:04500] [ 3] [pandora:04499] [ 0]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f3126ea798b]
> [pandora:04500] [ 4] /usr/lib64/libpthread.so.0(+0xf680)[0x7f28c91ef680]
> [pandora:04499] [ 1]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f3126e7eda7]
> [pandora:04500] [ 5] IMB-MPI1[0x40b83b]
> [pandora:04500] [ 6] IMB-MPI1[0x407155]
> [pandora:04500] [ 7] IMB-MPI1[0x4022ea]
> [pandora:04500] [ 8] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f28c8f5c4a0]
> [pandora:04499] [ 2]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f31268323d5]
> [pandora:04500] [ 9] IMB-MPI1[0x401d49]
> [pandora:04500] *** End of error message ***
> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f28c888be55]
> [pandora:04499] [ 3]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f28c94a798b]
> [pandora:04499] [ 4]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f28c947eda7]
> [pandora:04499] [ 5] IMB-MPI1[0x40b83b]
> [pandora:04499] [ 6] IMB-MPI1[0x407155]
> [pandora:04499] [ 7] IMB-MPI1[0x4022ea]
> [pandora:04499] [ 8]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f28c8e323d5]
> [pandora:04499] [ 9] IMB-MPI1[0x401d49]
> [pandora:04499] *** End of error message ***
> [phoebe:03779] *** Process received signal ***
> [phoebe:03779] Signal: Segmentation fault (11)
> [phoebe:03779] Signal code: Address not mapped (1)
> [phoebe:03779] Failing at address: 0x7f483d6ffff0
> [phoebe:03779] [ 0] /usr/lib64/libpthread.so.0(+0xf680)[0x7f48556c7680]
> [phoebe:03779] [ 1] /usr/lib64/libc.so.6(+0x14c4a0)[0x7f48554344a0]
> [phoebe:03779] [ 2]
> /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x4be55)[0x7f4854d63e55]
> [phoebe:03779] [ 3]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x23b)[0x7f485597f98b]
> [phoebe:03779] [ 4]
> /opt/openmpi/4.0.0/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1c7)[0x7f4855956da7]
> [phoebe:03779] [ 5] IMB-MPI1[0x40b83b]
> [phoebe:03779] [ 6] IMB-MPI1[0x407155]
> [phoebe:03779] [ 7] IMB-MPI1[0x4022ea]
> [phoebe:03779] [ 8]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f485530a3d5]
> [phoebe:03779] [ 9] IMB-MPI1[0x401d49]
> [phoebe:03779] *** End of error message ***
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 3779 on node phoebe-ib exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
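>
> For reference, the backtrace points at
> ompi_coll_base_reduce_scatter_intra_ring, called from PMPI_Reduce_scatter,
> and the run had just completed the 2097152-byte size. A standalone program
> that makes the equivalent call (a minimal sketch of my own, not part of IMB,
> just to illustrate the call that is failing) would look roughly like this:
>
> /* reduce_scatter_repro.c - hypothetical minimal sketch of the failing call.
>  * Each rank contributes ~2 MiB of doubles, roughly the size range where the
>  * IMB run above died. Build with: mpicc reduce_scatter_repro.c -o repro */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     MPI_Init(&argc, &argv);
>
>     int rank, size;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     /* Each rank receives per_rank doubles; the send buffer holds
>        per_rank * size doubles, as the Reduce_scatter pattern requires. */
>     const int per_rank = 262144;   /* 262144 doubles = 2 MiB per rank */
>     double *sendbuf = malloc((size_t)per_rank * size * sizeof(double));
>     double *recvbuf = malloc((size_t)per_rank * sizeof(double));
>     int *recvcounts = malloc(size * sizeof(int));
>
>     for (int i = 0; i < per_rank * size; i++) sendbuf[i] = 1.0;
>     for (int i = 0; i < size; i++) recvcounts[i] = per_rank;
>
>     MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts,
>                        MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
>
>     if (rank == 0)
>         printf("recvbuf[0] = %.1f (expected %d)\n", recvbuf[0], size);
>
>     free(sendbuf); free(recvbuf); free(recvcounts);
>     MPI_Finalize();
>     return 0;
> }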
>
> Also, if I reinstall Open MPI 3.1.2, I do not see this issue at all.
>
> Any thoughts on what could be the issue?
>
> Thanks,
> Adam LeBlanc