Hi Doug,
Wow, it looks like some messages are getting lost (or even delivered to the wrong peer on the same node...). Could you also try with:

-mca coll_base_verbose 1 -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm <1,2,3,4,5,6>

The values 1-6 control which topology/algorithm is used internally.

Once we figure out which topology/sequence causes this we can look to see if it's a collective issue or a btl, bml, or pml issue.
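
For example, a single run forcing one particular algorithm (reusing the mpirun command from your mail below; repeat the run with each value 1 through 6) would look something like:

        mpirun -np 9 -mca coll_base_verbose 1 \
               -mca coll_tuned_use_dynamic_rules 1 \
               -mca coll_tuned_bcast_algorithm 2 ./a.out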

thanks
G
On Thu, 29 Jun 2006, Doug Gregor wrote:

I am running into a problem with a simple program (which performs several MPI_Bcast operations) hanging. Most processes hang in MPI_Finalize, the others hang in MPI_Bcast. Interestingly enough, this only happens when I oversubscribe the nodes. For instance, using IU's Odin cluster, I take 4 nodes (each has two Opteron processors) and run 9 processes:

        mpirun -np 9 ./a.out
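
In outline the program is just a short loop of broadcasts; the attached source has the real buffer sizes and iteration counts, so the sketch below is only a simplified stand-in:

        // bcast_hang.cpp -- simplified stand-in for the attached reproducer:
        // a handful of MPI_Bcast calls followed by MPI_Finalize.  The buffer
        // size and iteration count here are placeholders, not the real values.
        #include <mpi.h>
        #include <vector>

        int main(int argc, char* argv[])
        {
            MPI_Init(&argc, &argv);
            int rank = 0;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            std::vector<int> buf(1024, rank);
            for (int i = 0; i < 10; ++i) {
                // rank 0 broadcasts to all ranks; the hang shows up inside
                // one of these calls on some ranks and inside MPI_Finalize on
                // the rest, but only when the nodes are oversubscribed
                MPI_Bcast(&buf[0], static_cast<int>(buf.size()), MPI_INT, 0,
                          MPI_COMM_WORLD);
            }

            MPI_Finalize();
            return 0;
        }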

The backtraces from 7 of the 9 processes show that they're in MPI_Finalize:

#0  0x0000003d1b92e813 in sigprocmask () from /lib64/tls/libc.so.6
#1  0x0000002a9598f55f in poll_dispatch ()
  from /san/mpi/openmpi-1.1-gcc/lib/libopal.so.0
#2  0x0000002a9598e3f3 in opal_event_loop ()
  from /san/mpi/openmpi-1.1-gcc/lib/libopal.so.0
#3  0x0000002a960487c4 in mca_oob_tcp_msg_wait ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_oob_tcp.so
#4  0x0000002a9604ca13 in mca_oob_tcp_recv ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_oob_tcp.so
#5  0x0000002a9585d833 in mca_oob_recv_packed ()
  from /san/mpi/openmpi-1.1-gcc/lib/liborte.so.0
#6  0x0000002a9585dd37 in mca_oob_xcast ()
  from /san/mpi/openmpi-1.1-gcc/lib/liborte.so.0
#7  0x0000002a956cbfb0 in ompi_mpi_finalize ()
  from /san/mpi/openmpi-1.1-gcc/lib/libmpi.so.0
#8  0x000000000040bd3e in main ()

The other two processes are in MPI_Bcast:

#0  0x0000002a97c2cbe3 in mca_btl_mvapi_component_progress ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_btl_mvapi.so
#1  0x0000002a97b21072 in mca_bml_r2_progress ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_bml_r2.so
#2  0x0000002a95988a4a in opal_progress ()
  from /san/mpi/openmpi-1.1-gcc/lib/libopal.so.0
#3  0x0000002a97a13fe7 in mca_pml_ob1_recv ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_pml_ob1.so
#4  0x0000002a9846d0aa in ompi_coll_tuned_bcast_intra_chain ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_coll_tuned.so
#5  0x0000002a9846d100 in ompi_coll_tuned_bcast_intra_pipeline ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_coll_tuned.so
#6  0x0000002a9846a3d7 in ompi_coll_tuned_bcast_intra_dec_fixed ()
  from /san/mpi/openmpi-1.1-gcc/lib/openmpi/mca_coll_tuned.so
#7  0x0000002a956deae3 in PMPI_Bcast ()
  from /san/mpi/openmpi-1.1-gcc/lib/libmpi.so.0
#8  0x000000000040bcc7 in main ()

Other random information:
        - The two processes stuck in MPI_Bcast are not on the same node. This has been the case both times I've gone through the backtraces, but I can't conclude that it's a necessary condition.
        - If I force the use of the "basic" MCA component for collectives, this problem does not occur (one way to do that is shown after this list).
        - If I don't oversubscribe the nodes, things seem to work properly.
        - The C++ program source and the output of ompi_info are attached.
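
One way to force the basic collectives (not necessarily the exact flags I used) is to restrict the coll framework selection when launching:

        mpirun -np 9 -mca coll basic,self ./a.out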

This should be easy to reproduce for anyone with access to Odin. I'm using Open MPI 1.1 configured with no special options. It is available as the module "mpi/openmpi-1.1-gcc" on the cluster. I'm using SLURM interactively to allocate the nodes before executing mpirun:

        srun -A -N 4

        Cheers,
        Doug Gregor




Thanks,
        Graham.
----------------------------------------------------------------------
Dr Graham E. Fagg       | Distributed, Parallel and Meta-Computing
Innovative Computing Lab. PVM3.4, HARNESS, FT-MPI, SNIPE & Open MPI
Computer Science Dept   | Suite 203, 1122 Volunteer Blvd,
University of Tennessee | Knoxville, Tennessee, USA. TN 37996-3450
Email: f...@cs.utk.edu  | Phone:+1(865)974-5790 | Fax:+1(865)974-8296
Broken complex systems are always derived from working simple systems
----------------------------------------------------------------------
