All,

I am upgrading Open MPI from 1.4.1 to 1.4.2 on two clusters, one with
InfiniBand (IB) and one without. I have no problem on the GE cluster without
IB, which needs no special IB configure options. 1.4.2 works perfectly there
with both the latest Intel and PGI compilers.

On the IB system 1.4.1 has worked fine with the following configure line:

./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    --enable-openib-ibcm --with-openib \
    --prefix=/share/apps/openmpi-intel/1.4.1 \
    --with-tm=/share/apps/pbs/10.1.0.91350

I have now built 1.4.2 with an almost identical line (only the --prefix and
--with-tm paths differ):

./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    --enable-openib-ibcm --with-openib \
    --prefix=/share/apps/openmpi-intel/1.4.2 \
    --with-tm=/share/apps/pbs/default
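
Since the --with-tm path is one of the two differences between the builds, it
may be worth confirming which PBS tree each path points at. The following is a
minimal sketch only: compare_tm_dirs is a hypothetical helper (not part of Open
MPI or PBS), and it assumes /share/apps/pbs/default is a symlink or directory
that GNU readlink can resolve, which has not been confirmed here.

```shell
# Hypothetical helper: report whether two --with-tm directories resolve
# to the same PBS installation. Returns 0 if same, 1 if different,
# 2 if either path cannot be resolved.
compare_tm_dirs() {
    old_tm=$(readlink -f "$1") || { echo "cannot resolve $1"; return 2; }
    new_tm=$(readlink -f "$2") || { echo "cannot resolve $2"; return 2; }
    if [ "$old_tm" = "$new_tm" ]; then
        echo "same PBS tree: $new_tm"
    else
        echo "different PBS trees: $old_tm vs $new_tm"
        return 1
    fi
}
```

On the IB system it would be called as:

    compare_tm_dirs /share/apps/pbs/10.1.0.91350 /share/apps/pbs/default

If the two paths resolve to different trees, the 1.4.2 build was linked against
a different TM library than the 1.4.1 build.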

When I run a basic MPI test program with:

/share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 \
    -machinefile $PBS_NODEFILE ./hello_mpi.exe

which defaults to using the IB switch, or with:

/share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 \
    -machinefile $PBS_NODEFILE ./hello_mpi.exe

which forces the use of GE, I get the same error:

[compute-0-3:22515] *** Process received signal ***
[compute-0-3:22515] Signal: Segmentation fault (11)
[compute-0-3:22515] Signal code: Address not mapped (1)
[compute-0-3:22515] Failing at address: 0x3f
[compute-0-3:22515] [ 0] /lib64/libpthread.so.0 [0x3639e0e7c0]
[compute-0-3:22515] [ 1] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(discui_+0x84) [0x2b7b546dd3d0]
[compute-0-3:22515] [ 2] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(diswsi+0xc3) [0x2b7b546da9e3]
[compute-0-3:22515] [ 3] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d868c]
[compute-0-3:22515] [ 4] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(tm_init+0x1fe) [0x2b7b546d8978]
[compute-0-3:22515] [ 5] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d791c]
[compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27]
[compute-0-3:22515] [ 7] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38]
[compute-0-3:22515] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x363961d994]
[compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69]
[compute-0-3:22515] *** End of error message ***
/var/spool/PBS/mom_priv/jobs/9909.bob.csi.cuny.edu.SC: line 42: 22515 Segmentation fault      /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
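
Every Open MPI frame in the trace is inside mca_plm_tm.so (the PBS/TM
launcher), and the crash occurs under tm_init, before any BTL is involved. One
way to test whether the problem is confined to that component is to ask mpirun
to launch through the rsh/ssh launcher instead. This is a sketch under
assumptions: the rsh launcher normally needs passwordless ssh or rsh between
compute nodes, which may not be set up on this cluster, and run_without_tm is
a hypothetical wrapper name.

```shell
# Path taken from the failing job above; can be overridden in the environment.
MPIRUN=${MPIRUN:-/share/apps/openmpi-intel/1.4.2/bin/mpirun}

# Hypothetical wrapper: same test run as above, but selecting the rsh
# launcher (-mca plm rsh) so that mca_plm_tm.so is not used for launch.
run_without_tm() {
    "$MPIRUN" -mca plm rsh -mca btl tcp,self -np 16 \
        -machinefile "$PBS_NODEFILE" ./hello_mpi.exe
}
```

If run_without_tm completes inside the same PBS job, the segfault is most
likely specific to the TM launcher build rather than to the IB or TCP
transports.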

When compiling with the PGI compiler suite I get the same result,
although the traceback gives less detail. I have seen postings suggesting
that if I disable the memory manager I might be able to get around
this problem, but that would cost performance on this IB system.

Have others seen this?  Suggestions?

Thanks,

Richard Walsh
CUNY HPC Center

   Richard Walsh
   Parallel Applications and Systems Manager
   CUNY HPC Center, Staten Island, NY
   718-982-3319
   612-382-4620

   Mighty the Wizard
   Who found me at sunrise
   Sleeping, and woke me
   And learn'd me Magic!
