On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> > Are you binding the procs? We don't bind by default (this will change in
> > 1.7.4), and binding can play a significant role when comparing across
> > kernels.
> >
> > add "--bind-to-core" to your cmd line
>
> I've previously always used mpi_paffinity_alone=1, and the new behavior
> seems to be independent of whether or not I use it. I'll try bind-to-core.
That would be the problem. That variable no longer exists in 1.7.4 and has
been replaced by hwloc_base_binding_policy. --bind-to core is an alias of
-mca hwloc_base_binding_policy core.

> One more possible clue. I haven't done a full test, but for one
> particular setup (newer nodes, single node so presumably using
> sm), there are apparently two ways to fix the problem:
> 1. go back to the previous kernel, but stick with openmpi 1.7.3
> 2. stick with the new kernel, but go back to openmpi 1.6.4
>
> So it appears to be some interaction between the new kernel and 1.7.3 that
> isn't present with 1.6.4.
>
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
>
> Noam
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
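
For anyone who used to set mpi_paffinity_alone in a parameter file rather than
on the command line, here is a minimal sketch of what the replacement setting
named in the reply might look like. It assumes Open MPI 1.7.4 or later and the
usual per-user MCA parameter file location; neither the file nor its contents
appear in the original thread.

```
# $HOME/.openmpi/mca-params.conf  (per-user MCA parameter file; assumed location)
# Bind each process to a core; per the reply above, this is the parameter
# behind "--bind-to core" on the mpirun command line.
hwloc_base_binding_policy = core
```

Adding --report-bindings to the mpirun command line should print the bindings
actually applied, which is a quick way to check whether the setting took effect.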