
I am upgrading from 1.4.1 to 1.4.2 on two clusters, one with IB and one without.
I have no problem on the GE cluster without IB, which requires no special 
options for the IB.  1.4.2 works perfectly there with both the latest Intel and 

On the IB system 1.4.1 has worked fine with the following configure line:

./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm 
--with-openib --prefix=/share/apps/openmpi-intel/1.4.1 

I have now built 1.4.2 with the almost identical line:

./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm 
--with-openib --prefix=/share/apps/openmpi-intel/1.4.2 

When I run a basic MPI test program with:

/share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 -machinefile $PBS_NODEFILE 

which defaults to using the IB switch, or with:

/share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 
-machinefile $PBS_NODEFILE ./hello_mpi.exe

which forces the use of GE, I get the same error:

[compute-0-3:22515] *** Process received signal ***
[compute-0-3:22515] Signal: Segmentation fault (11)
[compute-0-3:22515] Signal code: Address not mapped (1)
[compute-0-3:22515] Failing at address: 0x3f
[compute-0-3:22515] [ 0] /lib64/ [0x3639e0e7c0]
[compute-0-3:22515] [ 1] 
[compute-0-3:22515] [ 2] 
[compute-0-3:22515] [ 3] 
/share/apps/openmpi-intel/1.4.2/lib/openmpi/ [0x2b7b546d868c]
[compute-0-3:22515] [ 4] 
[compute-0-3:22515] [ 5] 
/share/apps/openmpi-intel/1.4.2/lib/openmpi/ [0x2b7b546d791c]
[compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27]
[compute-0-3:22515] [ 7] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38]
[compute-0-3:22515] [ 8] /lib64/ [0x363961d994]
[compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69]
[compute-0-3:22515] *** End of error message ***
/var/spool/PBS/mom_priv/jobs/ line 42: 22515 
Segmentation fault      /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl 
tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe

When compiling with the PGI compiler suite I get the same result
although the traceback gives less detail.  I notice postings that suggest
that if I disable the memory-manager I might be able to get around
this problem, but that will result in a performance hit on this IB system.
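
For reference, the rebuild those postings seem to point to would look
roughly like the sketch below. It assumes the relevant switch is Open MPI's
--without-memory-manager configure option, and the 1.4.2-nomm prefix is a
hypothetical name chosen only to keep the test build apart from the existing
install. The snippet prints the command instead of running it.

```shell
# Sketch only: assemble the configure line with the memory manager disabled.
# Assumption: --without-memory-manager is the option the postings refer to.
# Assumption: the "1.4.2-nomm" prefix is a made-up name for a side-by-side install.
PREFIX=/share/apps/openmpi-intel/1.4.2-nomm

# Same compiler and IB options as the existing 1.4.2 build, plus the extra flag.
CONFIGURE_ARGS="CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm --with-openib --without-memory-manager --prefix=$PREFIX"

# Print the command so it can be reviewed before running it in the source tree.
echo "./configure $CONFIGURE_ARGS"
```

Running the printed command from the 1.4.2 source directory, followed by the
usual make and make install, would give a second install to compare against.
Whether it avoids the segfault is untested here.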

Have others seen this?  Suggestions?


Richard Walsh

