The hostfile contains a single line:

    <machinename> slots=8
If I run it with 4 cores or fewer, the code runs fine. If I run it with 5 or more cores, it sometimes hangs after successfully completing several hundred broadcasts; the number varies from run to run. With 5 cores the code usually finishes, and the probability of a hang seems to increase with the number of processes (cores). The syntax I use is simple:
    mpiexec -machinefile hostfile -np 5 bcast_example

There was some discussion of a similar problem on the user list, but I could not find a resolution. I have tried setting processor affinity (--mca mpi_paffinity_alone 1), varying the broadcast algorithm (--mca coll_tuned_bcast_algorithm 1-6), and excluding my eth1 interface (-mca oob_tcp_if_exclude), which is not connected to anything (see the attached ifconfig.txt). None of these changed the outcome.
Any thoughts or suggestions would be appreciated.

-- 
"Through nonaction, no action is left undone." --Lao Tzu
Louis F. Rossi                          ro...@math.udel.edu
Department of Mathematical Sciences     http://www.math.udel.edu/~rossi
University of Delaware                  (302) 831-1880 (voice)
Newark, DE 19716                        (302) 831-4511 (fax)
bcast_example.c.gz
ompi_info.txt.gz
ifconfig.txt.gz