One other parameter that I neglected to mention (and which, as Scott pointed out to me, is *not* documented in the FAQ) is the mpi_preconnect_oob MCA param.

This parameter will cause all the OOB connections to be created during MPI_INIT, and *may* help with this kind of issue. You *do* need to have enough fd's available per process to allow this to happen at scale, of course. I'll try to add this information to the FAQ by the end of this week.
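
For example, an invocation would look something like the following
(the ulimit value, process count, and application name are just
placeholders; I'm assuming "1" to enable it, as with other boolean
MCA params):

  # make sure the per-process fd limit is high enough (shell / job script)
  ulimit -n 65536

  # force all OOB connections to be created during MPI_INIT
  mpirun --mca mpi_preconnect_oob 1 -np 512 ./your_app

  # or, equivalently, via the environment
  export OMPI_MCA_mpi_preconnect_oob=1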

This kind of thing is much better in the v1.3 series -- the linear TCP wireup is no longer necessary (each MPI process only opens one TCP socket, to the daemon on its host, etc.).


On Jun 4, 2008, at 4:14 PM, Åke Sandgren wrote:

On Wed, 2008-06-04 at 11:43 -0700, Scott Shaw wrote:
Hi, I was wondering if anyone had any comments regarding my posting
of questions.  Am I off base with my questions, or is this the wrong
forum for these types of questions?


Hi, I hope this is the right forum for my questions.  I am running
into a problem when scaling >512 cores on an InfiniBand cluster which
has 14,336 cores.  I am new to Open MPI and trying to figure out the
right -mca options

I don't have any real answer to your question, except that I have had
no problems running HPL on our 672-node dual quad-core (5376-core)
cluster with InfiniBand.
We use verbs.
I wouldn't touch the OOB parameters, since OOB uses TCP over Ethernet
to set up the environment.

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



--
Jeff Squyres
Cisco Systems

