Yes...it would indeed.

On 7/23/07 9:03 AM, "Kelley, Sean" <sean.kel...@solers.com> wrote:

> Would this logic be in the bproc pls component?
> Sean
> 
> 
> From: users-boun...@open-mpi.org on behalf of Ralph H Castain
> Sent: Mon 7/23/2007 9:18 AM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] orterun --bynode/--byslot problem
> 
> No, byslot appears to be working just fine on our bproc clusters (it is the
> default mode). As you probably know, bproc is a little strange in how we
> launch - we have to launch the procs in "waves", one wave for each proc
> slot in use on a node.
> 
> In other words, the first "wave" launches a proc on all nodes that have at
> least one proc on them. The second "wave" then launches another proc on all
> nodes that have at least two procs on them, but doesn't launch anything on
> any node that only has one proc on it.
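The wave scheme described above can be sketched roughly as follows. This is a minimal illustration only, not the actual Open MPI bproc pls code, and the node names and proc counts are hypothetical:

```python
# Rough sketch of the "wave" launch scheme described above -- illustrative
# only, not the actual Open MPI bproc pls code. Each wave starts one proc
# on every node that still has procs left to launch.

def wave_launch(procs_per_node):
    """procs_per_node: node name -> number of procs mapped to that node."""
    waves = []
    wave = 1
    while True:
        # Wave N covers every node with at least N procs mapped to it.
        nodes = [n for n, count in procs_per_node.items() if count >= wave]
        if not nodes:
            break
        waves.append(nodes)
        wave += 1
    return waves

# Hypothetical map: 3 procs on the head node, 2 on .0, 1 on .1.
print(wave_launch({"head": 3, ".0": 2, ".1": 1}))
# -> [['head', '.0', '.1'], ['head', '.0'], ['head']]
```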
> 
> My guess here is that the system for some reason is insisting that your head
> node be involved in every wave. I confess that we have never tested (to my
> knowledge) a mapping that involves "skipping" a node somewhere in the
> allocation - we always just map from the beginning of the node list, with
> the maximum number of procs being placed on the first nodes in the list
> (in our machines the nodes are all the same, so who cares?). So it is
> possible that something in the code objects to skipping around nodes in the
> allocation.
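For reference, byslot mapping nominally fills each node up to its slot limit, in hostfile order, before moving on. A sketch of that idealized behavior (not the real mapper code, which may differ in detail) looks like this:

```python
# Idealized byslot semantics -- fill each node to its slot limit, in
# hostfile order, before moving to the next node. Illustrative only; the
# real mapper in orterun may differ in detail.

def byslot_map(hostfile, np):
    """hostfile: list of (node, max_slots) pairs; returns node per rank."""
    ranks = []
    for node, slots in hostfile:
        for _ in range(slots):
            if len(ranks) == np:
                return ranks
            ranks.append(node)
    return ranks

# hostfile9 from the example below, np=9:
print(byslot_map([("head", 3), (".0", 3), (".1", 3)], 9))
# -> ['head', 'head', 'head', '.0', '.0', '.0', '.1', '.1', '.1']
```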
> 
> I will have to look and see where that dependency might lie - will try to
> get to it this week.
> 
> BTW: that patch I sent you for head node operations will be in 1.2.4.
> 
> Ralph
> 
> 
> 
> On 7/23/07 7:04 AM, "Kelley, Sean" <sean.kel...@solers.com> wrote:
> 
>> > Hi,
>> > 
>> > We are experiencing a problem with the process allocation on our Open MPI
>> > cluster. We are using Scyld 4.1 (BPROC), the OFED 1.2 Topspin Infiniband
>> > drivers, and Open MPI 1.2.3 + patch (to run processes on the head node). The
>> > hardware consists of a head node and N blades on private ethernet and
>> > infiniband networks.
>> > 
>> > The command run for these tests is a simple MPI program (called 'hn') which
>> > prints out the rank and the hostname. The hostname for the head node is 'head'
>> > and the compute nodes are '.0' ... '.9'.
>> > 
>> > We are using the following hostfiles for this example:
>> > 
>> > hostfile7
>> > -1 max_slots=1
>> > 0 max_slots=3
>> > 1 max_slots=3
>> > 
>> > hostfile8
>> > -1 max_slots=2
>> > 0 max_slots=3
>> > 1 max_slots=3
>> > 
>> > hostfile9
>> > -1 max_slots=3
>> > 0 max_slots=3
>> > 1 max_slots=3
>> > 
>> > Running the following commands:
>> > 
>> > orterun --hostfile hostfile7 -np 7 ./hn
>> > orterun --hostfile hostfile8 -np 8 ./hn
>> > orterun --byslot --hostfile hostfile7 -np 7 ./hn
>> > orterun --byslot --hostfile hostfile8 -np 8 ./hn
>> > 
>> > causes orterun to crash. However,
>> > 
>> > orterun --hostfile hostfile9 -np 9 ./hn
>> > orterun --byslot --hostfile hostfile9 -np 9 ./hn
>> > 
>> > work, outputting the following:
>> > 
>> > 0 head
>> > 1 head
>> > 2 head
>> > 3 .0
>> > 4 .0
>> > 5 .0
>> > 6 .0
>> > 7 .0
>> > 8 .0
>> > 
>> > However, running the following:
>> > 
>> > orterun --bynode --hostfile hostfile7 -np 7 ./hn
>> > 
>> > works, outputting the following:
>> > 
>> > 0 head
>> > 1 .0
>> > 2 .1
>> > 3 .0
>> > 4 .1
>> > 5 .0
>> > 6 .1
>> > 
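For what it's worth, a plain round-robin sketch of bynode placement (illustrative only, not orterun's actual code) reproduces this hostfile7 output, cycling over the nodes and skipping any node whose max_slots is exhausted:

```python
# Plain round-robin sketch of bynode mapping (illustrative only, not
# orterun's code): cycle over the nodes, skipping any node whose
# max_slots is already used up.

def bynode_map(hostfile, np):
    """hostfile: list of (node, max_slots) pairs; returns node per rank."""
    used = {node: 0 for node, _ in hostfile}
    ranks = []
    while len(ranks) < np:
        progress = False
        for node, slots in hostfile:
            if len(ranks) == np:
                break
            if used[node] < slots:
                ranks.append(node)
                used[node] += 1
                progress = True
        if not progress:
            break  # not enough total slots for np procs
    return ranks

# hostfile7 from the example above, np=7:
print(bynode_map([("head", 1), (".0", 3), (".1", 3)], 7))
# -> ['head', '.0', '.1', '.0', '.1', '.0', '.1']
```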
>> > Is the '--byslot' crash a known problem? Does it have something to do with
>> > BPROC? Thanks in advance for any assistance!
>> > 
>> > Sean
>> > 
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 