If you're willing to try some stuff:

1) What about "-mca coll_sync_barrier_before 100"? (The default may be 1000. So, you can try various values less than 1000. I'm suggesting 100.) Note that broadcast has somewhat one-way traffic flow, which can have some undesirable flow control issues.

2) What about "-mca btl_sm_num_fifos 16"? Default is 1. If the problem is trac ticket 2043, then this suggestion can help.

P.S. There's a memory leak, right? The receive buffer is being allocated over and over again. Might not be that closely related to the problem you see here, but at a minimum it's bad style.

Louis Rossi wrote:

I am having a problem with BCast hanging on a dual quad core Opteron (2382, 2.6GHz, Quad Core, 4 x 512KB L2, 6MB L3 Cache) system running FC11 with openmpi-1.4. The LD_LIBRARY_PATH and PATH variables are correctly set. I have used the FC11 rpm distribution of openmpi and built openmpi-1.4 locally with the same results. The problem was first observed in a larger reliable CFD code, but I can create the problem with a simple demo code (attached). The code attempts to execute 2000 pairs of broadcasts.
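
For reference, the two MCA settings suggested at the top of this reply could be passed on the mpirun command line roughly as follows. This is only a sketch: the process count (8, one per core on a dual quad-core box) and the executable name ./bcast_demo are assumptions for illustration, since the attached demo code is not reproduced here. Only the two -mca options come from the message itself.

```shell
# Suggestion 1: lower coll_sync_barrier_before to 100 (the default may be 1000).
# ./bcast_demo and -np 8 are placeholders, not taken from the original post.
mpirun -np 8 -mca coll_sync_barrier_before 100 ./bcast_demo

# Suggestion 2: raise the number of shared-memory FIFOs from the default of 1.
mpirun -np 8 -mca btl_sm_num_fifos 16 ./bcast_demo
```

Trying each option separately makes it easier to tell which one, if either, makes the hang go away.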
- [OMPI users] Dual quad core Opteron hangs on Bcast. Louis Rossi
- Re: [OMPI users] Dual quad core Opteron hangs on Bca... Eugene Loh
- Re: [OMPI users] Dual quad core Opteron hangs on... Lenny Verkhovsky
- Re: [OMPI users] Dual quad core Opteron hangs on... Eugene Loh
- Re: [OMPI users] Dual quad core Opteron hang... Eugene Loh
- Re: [OMPI users] Dual quad core Opteron ... Louis Rossi
- Re: [OMPI users] Dual quad core Opteron hangs on Bca... Matthew MacManes