On Dec 16, 2010, at 3:29 AM, Gilbert Grosdidier wrote:

>> Does this problem *always* happen, or does it only happen once in a great
>> while?
>>
> gg= No, this problem happens rather often, almost every other time.
> Seems to happen more often as the number of cores increases.

Well that's a bummer -- it seems to indicate that this may be a problem in OMPI.

Are you running multiple OMPI jobs concurrently? More specifically, are you starting multiple jobs on the same machine at more or less the same time? I'm wondering if our TCP startup mechanism is somehow accidentally getting the TCP ports from a different job. I can't imagine how that would be happening, but...

> gg= Is there a way with the current code, to direct OpenMPI to use a
> restricted range of TCP ports,
> that I can choose at launch time ?

Yes. In OMPI v1.4, there are two MCA params:

    oob_tcp_port_min_v4 (default: 0)
    oob_tcp_port_range_v4 (default: 65536)

Try setting these values so that each of your jobs gets its own mutually exclusive range, and see if that fixes the problem. Keep in mind that user-level ports start at 1024, so your lowest range might as well start at 1024.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
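
To illustrate the port-range suggestion, here is a minimal sketch that hands each concurrent job its own block of ports and builds the matching mpirun command line. It is only a sketch under stated assumptions: the helper names, the block size of 1000 ports, the process count, and the program name ./my_app are all hypothetical, and it assumes oob_tcp_port_range_v4 is a count of ports starting at oob_tcp_port_min_v4. The only pieces taken from the message are the two MCA param names and the 1024 lower bound.

```python
def disjoint_port_ranges(num_jobs, first_port=1024, ports_per_job=1000):
    """Return one (port_min, port_range) pair per job, with no overlap.

    Assumption: port_range is a count of ports starting at port_min.
    """
    if first_port < 1024:
        raise ValueError("user-level ports start at 1024")
    last_needed = first_port + num_jobs * ports_per_job - 1
    if last_needed > 65535:
        raise ValueError("ranges would run past port 65535")
    return [(first_port + i * ports_per_job, ports_per_job)
            for i in range(num_jobs)]


def mpirun_args(port_min, port_range, num_procs, program):
    """Build an mpirun argument list that sets the two MCA params."""
    return [
        "mpirun",
        "--mca", "oob_tcp_port_min_v4", str(port_min),
        "--mca", "oob_tcp_port_range_v4", str(port_range),
        "-np", str(num_procs),
        program,  # hypothetical application name
    ]


if __name__ == "__main__":
    # Print one command line per job; nothing is actually launched here.
    for port_min, port_range in disjoint_port_ranges(num_jobs=3):
        print(" ".join(mpirun_args(port_min, port_range, 16, "./my_app")))
```

Each job is then launched with its own printed command line, so no two jobs share an OOB TCP port range.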