I seem to recall that you have an IB-based cluster, right?

From a *very quick* glance at the code, it looks like this might be a simple incorrect-finalization issue. That is:
- you run the job on a single server
- openib disqualifies itself because you're running on a single server
- openib then goes to finalize/close itself
- but openib didn't fully initialize itself (because it disqualified itself early in the initialization process), and something in the finalization process didn't take that into account

Nathan -- is that anywhere close to correct?

On Jun 5, 2014, at 5:10 PM, "Fischer, Greg A." <fisch...@westinghouse.com> wrote:

> OpenMPI Users,
>
> After encountering difficulty with the Intel compilers (see the "intermittent
> segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and
> recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib
> BTL in a typical BASH environment. Everything appeared to work fine, so I
> went on my merry way compiling the rest of my dependencies.
>
> After getting my dependencies and applications compiled, I began observing
> segfaults when submitting the applications through Torque. I recompiled
> OpenMPI with debug options, ran "ring_c" over the openib BTL in an
> interactive Torque session ("qsub -I"), and got the backtrace below. All
> other system settings described in the previous thread are the same. Any
> thoughts on how to resolve this issue?
>
> Core was generated by `ring_c'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f7f5920ab55 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007f7f5920ab55 in raise () from /lib64/libc.so.6
> #1  0x00007f7f5920c0c5 in abort () from /lib64/libc.so.6
> #2  0x00007f7f59203a10 in __assert_fail () from /lib64/libc.so.6
> #3  0x00007f7f548a484b in udcm_module_finalize (btl=0x716680, cpc=0x718c40)
>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734
> #4  0x00007f7f548a3474 in udcm_component_query (btl=0x716680, cpc=0x717be8)
>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:476
> #5  0x00007f7f5489c316 in ompi_btl_openib_connect_base_select_for_local_port (btl=0x716680)
>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_base.c:273
> #6  0x00007f7f54885817 in btl_openib_component_init (num_btl_modules=0x7fff906aa420,
>     enable_progress_threads=false, enable_mpi_threads=false)
>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:2703
> #7  0x00007f7f5982da5e in mca_btl_base_select (enable_progress_threads=false,
>     enable_mpi_threads=false)
>     at ../../../../openmpi-1.8.1/ompi/mca/btl/base/btl_base_select.c:108
> #8  0x00007f7f54ac7d42 in mca_bml_r2_component_init (priority=0x7fff906aa4f4,
>     enable_progress_threads=false, enable_mpi_threads=false)
>     at ../../../../../openmpi-1.8.1/ompi/mca/bml/r2/bml_r2_component.c:88
> #9  0x00007f7f5982cd1b in mca_bml_base_init (enable_progress_threads=false,
>     enable_mpi_threads=false)
>     at ../../../../openmpi-1.8.1/ompi/mca/bml/base/bml_base_init.c:69
> #10 0x00007f7f539ed739 in mca_pml_ob1_component_init (priority=0x7fff906aa630,
>     enable_progress_threads=false, enable_mpi_threads=false)
>     at ../../../../../openmpi-1.8.1/ompi/mca/pml/ob1/pml_ob1_component.c:271
> #11 0x00007f7f598539b2 in mca_pml_base_select (enable_progress_threads=false,
>     enable_mpi_threads=false)
>     at ../../../../openmpi-1.8.1/ompi/mca/pml/base/pml_base_select.c:128
> #12 0x00007f7f597c033c in ompi_mpi_init (argc=1, argv=0x7fff906aa928,
>     requested=0, provided=0x7fff906aa7d8)
>     at ../../openmpi-1.8.1/ompi/runtime/ompi_mpi_init.c:604
> #13 0x00007f7f597f5386 in PMPI_Init (argc=0x7fff906aa82c, argv=0x7fff906aa820)
>     at pinit.c:84
> #14 0x000000000040096f in main (argc=1, argv=0x7fff906aa928) at ring_c.c:19
>
> Greg
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
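
For readers following the thread: the incorrect-finalization scenario guessed at in the reply (a component disqualifies itself early, then its finalize routine assumes full initialization) can be illustrated with a minimal C sketch. This is not the Open MPI source. The names `demo_module_t`, `demo_module_init`, and `demo_module_finalize` are hypothetical, and the "no remote peers" condition is only an assumed stand-in for whatever makes openib disqualify itself on a single server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical module state; NOT the real udcm/openib structures. */
typedef struct {
    bool listening;   /* set only once initialization has fully completed */
    int *resources;   /* allocated late in initialization */
} demo_module_t;

/* Finalize that tolerates a partially initialized module.
 * A fragile version would do assert(m->resources != NULL) here and
 * abort (signal 6) whenever init bailed out before allocating. */
static int demo_module_finalize(demo_module_t *m)
{
    if (m == NULL) {
        return 0;
    }
    if (m->resources != NULL) {   /* release only what was actually set up */
        free(m->resources);
        m->resources = NULL;
    }
    m->listening = false;
    return 0;
}

/* Init that may disqualify itself early (assumed trigger: no remote peers). */
static int demo_module_init(demo_module_t *m, int num_remote_peers)
{
    m->listening = false;
    m->resources = NULL;

    if (num_remote_peers <= 0) {
        /* Disqualify: clean up whatever exists, then report "not usable". */
        demo_module_finalize(m);
        return -1;
    }

    m->resources = calloc((size_t)num_remote_peers, sizeof(int));
    if (m->resources == NULL) {
        demo_module_finalize(m);
        return -1;
    }
    m->listening = true;
    return 0;
}
```

Calling `demo_module_init(&m, 0)` returns -1 without tripping any assertion, because the finalize path checks each field before touching it. That is the general shape of fix the hypothesis would imply, if the hypothesis is right.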