Yes...it would indeed.
On 7/23/07 9:03 AM, "Kelley, Sean" <sean.kel...@solers.com> wrote:

> Would this logic be in the bproc pls component?
> Sean
>
>
> From: users-boun...@open-mpi.org on behalf of Ralph H Castain
> Sent: Mon 7/23/2007 9:18 AM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] orterun --bynode/--byslot problem
>
> No, byslot appears to be working just fine on our bproc clusters (it is the
> default mode). As you probably know, bproc is a little strange in how we
> launch - we have to launch the procs in "waves" that correspond to the
> number of procs on a node.
>
> In other words, the first "wave" launches a proc on all nodes that have at
> least one proc on them. The second "wave" then launches another proc on all
> nodes that have at least two procs on them, but doesn't launch anything on
> any node that only has one proc on it.
>
> My guess here is that the system for some reason is insisting that your head
> node be involved in every wave. I confess that we have never tested (to my
> knowledge) a mapping that involves "skipping" a node somewhere in the
> allocation - we always just map from the beginning of the node list, with
> the maximum number of procs being placed on the first nodes in the list
> (since in our machines, the nodes are all the same, so who cares?). So it is
> possible that something in the code objects to skipping around nodes in the
> allocation.
>
> I will have to look and see where that dependency might lie - will try to
> get to it this week.
>
> BTW: that patch I sent you for head node operations will be in 1.2.4.
>
> Ralph
>
>
>
> On 7/23/07 7:04 AM, "Kelley, Sean" <sean.kel...@solers.com> wrote:
>
>> > Hi,
>> >
>> > We are experiencing a problem with the process allocation on our Open MPI
>> > cluster. We are using Scyld 4.1 (BPROC), the OFED 1.2 Topspin Infiniband
>> > drivers, Open MPI 1.2.3 + patch (to run processes on the head node). The
>> > hardware consists of a head node and N blades on private ethernet and
>> > infiniband networks.
>> >
>> > The command run for these tests is a simple MPI program (called 'hn') which
>> > prints out the rank and the hostname. The hostname for the head node is
>> > 'head' and the compute nodes are '.0' ... '.9'.
>> >
>> > We are using the following hostfiles for this example:
>> >
>> > hostfile7
>> > -1 max_slots=1
>> > 0 max_slots=3
>> > 1 max_slots=3
>> >
>> > hostfile8
>> > -1 max_slots=2
>> > 0 max_slots=3
>> > 1 max_slots=3
>> >
>> > hostfile9
>> > -1 max_slots=3
>> > 0 max_slots=3
>> > 1 max_slots=3
>> >
>> > running the following commands:
>> >
>> > orterun --hostfile hostfile7 -np 7 ./hn
>> > orterun --hostfile hostfile8 -np 8 ./hn
>> > orterun --byslot --hostfile hostfile7 -np 7 ./hn
>> > orterun --byslot --hostfile hostfile8 -np 8 ./hn
>> >
>> > causes orterun to crash. However,
>> >
>> > orterun --hostfile hostfile9 -np 9 ./hn
>> > orterun --byslot --hostfile hostfile9 -np 9 ./hn
>> >
>> > works, outputting the following:
>> >
>> > 0 head
>> > 1 head
>> > 2 head
>> > 3 .0
>> > 4 .0
>> > 5 .0
>> > 6 .0
>> > 7 .0
>> > 8 .0
>> >
>> > However, running the following:
>> >
>> > orterun --bynode --hostfile hostfile7 -np 7 ./hn
>> >
>> > works, outputting the following:
>> >
>> > 0 head
>> > 1 .0
>> > 2 .1
>> > 3 .0
>> > 4 .1
>> > 5 .0
>> > 6 .1
>> >
>> > Is the '--byslot' crash a known problem? Does it have something to do with
>> > BPROC? Thanks in advance for any assistance!
>> >
>> > Sean
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
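
For anyone following the thread, here is a minimal Python sketch of the "wave" launching described in the quoted reply, applied to the hostfiles from the original report. It is a toy model only, not Open MPI or bproc code: the function names (map_byslot, map_bynode, launch_waves) are made up for illustration, and max_slots is treated simply as the number of procs a node may hold.

```python
from collections import Counter

# (hostname, max_slots) pairs in hostfile order. These mirror hostfile7 and
# hostfile9 from the quoted message, using the hostnames that 'hn' prints.
HOSTFILE7 = [("head", 1), (".0", 3), (".1", 3)]
HOSTFILE9 = [("head", 3), (".0", 3), (".1", 3)]


def map_byslot(hosts, nprocs):
    """Toy byslot policy: fill every slot on a node before moving on."""
    placement = [name for name, slots in hosts for _ in range(slots)]
    if len(placement) < nprocs:
        raise ValueError("not enough slots for the requested process count")
    return placement[:nprocs]


def map_bynode(hosts, nprocs):
    """Toy bynode policy: one proc per node per pass, skipping full nodes."""
    remaining = dict(hosts)
    placement = []
    while len(placement) < nprocs:
        placed_this_pass = False
        for name, _ in hosts:
            if len(placement) == nprocs:
                break
            if remaining[name] > 0:
                placement.append(name)
                remaining[name] -= 1
                placed_this_pass = True
        if not placed_this_pass:
            raise ValueError("not enough slots for the requested process count")
    return placement


def launch_waves(placement):
    """Wave k (1-based) contains every node holding at least k procs."""
    counts = Counter(placement)
    nodes = list(dict.fromkeys(placement))  # unique nodes, first-seen order
    deepest = max(counts.values(), default=0)
    return [[n for n in nodes if counts[n] >= k] for k in range(1, deepest + 1)]


if __name__ == "__main__":
    for label, hosts, nprocs in [("hostfile7", HOSTFILE7, 7),
                                 ("hostfile9", HOSTFILE9, 9)]:
        placement = map_byslot(hosts, nprocs)
        print(label, "byslot placement:", placement)
        for k, wave in enumerate(launch_waves(placement), start=1):
            print("  wave", k, "->", wave)
```

In this model, hostfile7 with -np 7 puts 'head' only in the first wave, while hostfile9 with -np 9 puts all three nodes in every wave. That is the "skipping a node" situation the reply guesses at; the sketch only illustrates the idea and says nothing about where the real dependency lies.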