Hi Open MPI Users,

Just wanted to report a bug we have seen with Open MPI 3.1.3 and 4.0.0 when using the Intel 2019 Update 1 compilers on our Skylake/Omni-Path-1 cluster. The bug occurs when running the GitHub master src_c variant of the Intel MPI Benchmarks (IMB).
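Before the build details and full output: in case anyone wants to poke at this without building IMB, here is a minimal sketch of the kind of call IMB-MPI1's Reduce_scatter test is making when it dies. This is my own distillation, not the IMB source; the buffer sizes, iteration count, and file name (reproducer.c) are illustrative, though as far as I recall IMB does use MPI_FLOAT with MPI_SUM for its reductions. It exercises the same PMPI_Reduce_scatter -> ompi_coll_base_reduce_scatter_intra_ring path that shows up in the backtraces below.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Per-rank receive count: 512K floats (2 MB) is illustrative, chosen
     * to sit in the message-size range where the crash shows up for us. */
    const int count = 512 * 1024;

    int *recvcounts = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        recvcounts[i] = count;

    /* Each rank contributes a full vector; each rank receives one chunk. */
    float *sendbuf = malloc((size_t)size * count * sizeof(float));
    float *recvbuf = malloc((size_t)count * sizeof(float));
    memset(sendbuf, 0, (size_t)size * count * sizeof(float));
    memset(recvbuf, 0, (size_t)count * sizeof(float));

    for (int iter = 0; iter < 100; iter++)
        MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts,
                           MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Reduce_scatter completed\n");

    free(recvcounts);
    free(recvbuf);
    free(sendbuf);
    MPI_Finalize();
    return 0;
}

Something like "mpicc reproducer.c -o reproducer" and a launch via mpirun should do it, but note we only see the failure at scale (the run below uses 64 active ranks with 1088 more waiting, across 24 nodes), so a small -np run may well pass.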
Configuration:

./configure --prefix=/home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144 \
    --with-slurm --with-psm2 \
    CC=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/icc \
    CXX=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/icpc \
    FC=/home/projects/x86-64/intel/compilers/2019/compilers_and_libraries_2019.1.144/linux/bin/intel64/ifort \
    --with-zlib=/home/projects/x86-64/zlib/1.2.11 \
    --with-valgrind=/home/projects/x86-64/valgrind/3.13.0

The operating system is Red Hat 7.4, and we use a local build of GCC 7.2.0 to provide the (C++) header files for the Intel compilers. Everything builds cleanly and passes make check without any issues. We then compile IMB and run IMB-MPI1 on 24 nodes and get the following:

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 64
# ( 1088 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.18         0.19         0.18
            4         1000         7.39        10.37         8.68
            8         1000         7.84        11.14         9.23
           16         1000         8.50        12.37        10.14
           32         1000        10.37        14.66        12.15
           64         1000        13.76        18.82        16.17
          128         1000        21.63        27.61        24.87
          256         1000        39.98        47.27        43.96
          512         1000        72.93        78.59        75.15
         1024         1000       147.21       152.98       149.94
         2048         1000       413.41       426.90       420.15
         4096         1000       421.28       442.58       434.52
         8192         1000       418.31       450.20       438.51
        16384         1000      1082.85      1221.44      1140.92
        32768         1000      2434.11      2529.90      2476.72
        65536          640      5469.57      6048.60      5687.08
       131072          320     11702.94     12435.06     12075.07
       262144          160     19214.42     20433.83     19883.80
       524288           80     49462.22     53896.43     52101.56
      1048576           40    119422.53    131922.20    126920.99
      2097152           20    256345.97    288185.72    275767.05
[node06:351648] *** Process received signal ***
[node06:351648] Signal: Segmentation fault (11)
[node06:351648] Signal code: Invalid permissions (2)
[node06:351648] Failing at address: 0x7fdb6efc4000
[node06:351648] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7fdb8646c5e0]
[node06:351648] [ 1] ./IMB-MPI1(__intel_avx_rep_memcpy+0x140)[0x415380]
[node06:351648] [ 2] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libopen-pal.so.40(opal_datatype_copy_content_same_ddt+0xca)[0x7fdb858d847a]
[node06:351648] [ 3] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x3f9)[0x7fdb86c43b29]
[node06:351648] [ 4] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1d7)[0x7fdb86c1de67]
[node06:351648] [ 5] ./IMB-MPI1[0x40d624]
[node06:351648] [ 6] ./IMB-MPI1[0x407d16]
[node06:351648] [ 7] ./IMB-MPI1[0x403356]
[node06:351648] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdb860bbc05]
[node06:351648] [ 9] ./IMB-MPI1[0x402da9]
[node06:351648] *** End of error message ***
[node06:351649] *** Process received signal ***
[node06:351649] Signal: Segmentation fault (11)
[node06:351649] Signal code: Invalid permissions (2)
[node06:351649] Failing at address: 0x7f9b19c6f000
[node06:351649] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7f9b311295e0]
[node06:351649] [ 1] ./IMB-MPI1(__intel_avx_rep_memcpy+0x140)[0x415380]
[node06:351649] [ 2] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libopen-pal.so.40(opal_datatype_copy_content_same_ddt+0xca)[0x7f9b3059547a]
[node06:351649] [ 3] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(ompi_coll_base_reduce_scatter_intra_ring+0x3f9)[0x7f9b31900b29]
[node06:351649] [ 4] /home/projects/x86-64-skylake/openmpi/3.1.3/intel/19.1.144/lib/libmpi.so.40(PMPI_Reduce_scatter+0x1d7)[0x7f9b318dae67]
[node06:351649] [ 5] ./IMB-MPI1[0x40d624]
[node06:351649] [ 6] ./IMB-MPI1[0x407d16]
[node06:351649]
[node06:351657] *** Process received signal ***

--
Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA