OpenMPI Users,

After encountering difficulty with the Intel compilers (see the "intermittent 
segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and 
recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib BTL 
in a typical Bash environment. Everything appeared to work fine, so I went on 
my merry way compiling the rest of my dependencies.

After getting my dependencies and applications compiled, I began observing 
segfaults when submitting the applications through Torque. I recompiled OpenMPI 
with debug options, ran "ring_c" over the openib BTL in an interactive Torque 
session ("qsub -I"), and got the backtrace below. All other system settings 
described in the previous thread are the same. Any thoughts on how to resolve 
this issue?

Core was generated by `ring_c'.
Program terminated with signal 6, Aborted.
#0  0x00007f7f5920ab55 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f7f5920ab55 in raise () from /lib64/libc.so.6
#1  0x00007f7f5920c0c5 in abort () from /lib64/libc.so.6
#2  0x00007f7f59203a10 in __assert_fail () from /lib64/libc.so.6
#3  0x00007f7f548a484b in udcm_module_finalize (btl=0x716680, cpc=0x718c40) at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734
#4  0x00007f7f548a3474 in udcm_component_query (btl=0x716680, cpc=0x717be8) at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:476
#5  0x00007f7f5489c316 in ompi_btl_openib_connect_base_select_for_local_port (btl=0x716680) at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_base.c:273
#6  0x00007f7f54885817 in btl_openib_component_init (num_btl_modules=0x7fff906aa420, enable_progress_threads=false, enable_mpi_threads=false)
    at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:2703
#7  0x00007f7f5982da5e in mca_btl_base_select (enable_progress_threads=false, enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/btl/base/btl_base_select.c:108
#8  0x00007f7f54ac7d42 in mca_bml_r2_component_init (priority=0x7fff906aa4f4, enable_progress_threads=false, enable_mpi_threads=false) at ../../../../../openmpi-1.8.1/ompi/mca/bml/r2/bml_r2_component.c:88
#9  0x00007f7f5982cd1b in mca_bml_base_init (enable_progress_threads=false, enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/bml/base/bml_base_init.c:69
#10 0x00007f7f539ed739 in mca_pml_ob1_component_init (priority=0x7fff906aa630, enable_progress_threads=false, enable_mpi_threads=false)
    at ../../../../../openmpi-1.8.1/ompi/mca/pml/ob1/pml_ob1_component.c:271
#11 0x00007f7f598539b2 in mca_pml_base_select (enable_progress_threads=false, enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/pml/base/pml_base_select.c:128
#12 0x00007f7f597c033c in ompi_mpi_init (argc=1, argv=0x7fff906aa928, requested=0, provided=0x7fff906aa7d8) at ../../openmpi-1.8.1/ompi/runtime/ompi_mpi_init.c:604
#13 0x00007f7f597f5386 in PMPI_Init (argc=0x7fff906aa82c, argv=0x7fff906aa820) at pinit.c:84
#14 0x000000000040096f in main (argc=1, argv=0x7fff906aa928) at ring_c.c:19

Greg
