Hi,

We are migrating to Open MPI 1.6, but since 1.6 dropped support for the Myricom GM driver, we have to switch to the MX driver. We have the Myricom MX2G 1.2.16 driver installed. However, when testing the new build of Open MPI on a node without an actual Myrinet device, we get the following segmentation fault.
<---->
[yqin@n0007.scs00 ~]$ mpirun -np 2 -np 2 osu_bw
[n0007.scs00:03075] Error in mx_open_endpoint (error No MX device entry in /dev.)
[n0007.scs00:03074] Error in mx_open_endpoint (error No MX device entry in /dev.)
--------------------------------------------------------------------------
[[32626,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: Myrinet/MX
Host: n0007.scs00
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
[n0007:03074] *** Process received signal ***
[n0007:03074] Signal: Segmentation fault (11)
[n0007:03074] Signal code: Invalid permissions (2)
[n0007:03074] Failing at address: 0x2b9112128130
[n0007:03075] *** Process received signal ***
[n0007:03075] Signal: Segmentation fault (11)
[n0007:03075] Signal code: Invalid permissions (2)
[n0007:03075] Failing at address: 0x2b041c9f1130
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 3075 on node n0007.scs00 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[n0007.scs00:03073] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[n0007.scs00:03073] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
<---->

Excluding the MX BTL does not get us any further.

<---->
[yqin@n0007.scs00 ~]$ mpirun -np 2 -mca btl ^mx -np 2 osu_bw
[n0007.scs00:03453] Error in mx_open_endpoint (error No MX device entry in /dev.)
[n0007.scs00:03454] Error in mx_open_endpoint (error No MX device entry in /dev.)
[n0007:03453] *** Process received signal ***
[n0007:03453] Signal: Segmentation fault (11)
[n0007:03453] Signal code: Address not mapped (1)
[n0007:03453] Failing at address: 0x2b3c1fe73130
[n0007:03454] *** Process received signal ***
[n0007:03454] Signal: Segmentation fault (11)
[n0007:03454] Signal code: Address not mapped (1)
[n0007:03454] Failing at address: 0x2b2431bf0130
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 3454 on node n0007.scs00 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
<---->

If we restrict the run to specific BTLs such as SM and SELF, the benchmark runs to completion, but both processes still segfault at the end.

<---->
[yqin@n0007.scs00 ~]$ mpirun -np 2 -mca btl sm,self -np 2 osu_bw
[n0007.scs00:03460] Error in mx_open_endpoint (error No MX device entry in /dev.)
[n0007.scs00:03461] Error in mx_open_endpoint (error No MX device entry in /dev.)
# OSU MPI Bandwidth Test v3.3
# Size        Bandwidth (MB/s)
1                         2.54
2                         5.22
4                        10.92
8                        21.61
16                       43.89
32                       62.19
64                      121.95
128                     212.28
256                     337.52
512                     516.67
1024                    701.29
2048                    845.69
4096                    836.45
8192                    934.31
16384                  1035.53
32768                  1186.90
65536                  1390.41
131072                 1519.14
262144                 1562.96
524288                 1596.78
1048576                1611.48
2097152                1616.09
4194304                1620.47
[n0007:03461] *** Process received signal ***
[n0007:03460] *** Process received signal ***
[n0007:03460] Signal: Segmentation fault (11)
[n0007:03460] Signal code: Address not mapped (1)
[n0007:03460] Failing at address: 0x2acac044d130
[n0007:03461] Signal: Segmentation fault (11)
[n0007:03461] Signal code: Address not mapped (1)
[n0007:03461] Failing at address: 0x2b8bc4121130
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3460 on node n0007.scs00 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
<---->

Can anybody shed some light on this? It looks like Open MPI tries to open the MX device no matter which BTLs are selected: the mx_open_endpoint error appears in all three runs, including the one with "-mca btl ^mx". This is a fresh build of Open MPI 1.6 configured with the "--with-mx --with-openib" options. We did not have this issue with the old GM BTL.

Thanks,

Yong Qin