I’m a little under the weather and so will only be able to help a bit at a time. However, a couple of things to check:
* add "-mca ras_base_verbose 5" to the cmd line to see what mpirun thought the allocation was (an example invocation with these settings added is sketched below the quoted thread)
* is the hostfile available on every node?

Ralph

> On Nov 1, 2018, at 10:55 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>
> Hello Ralph,
>
> Attached below is the verbose output for a failing machine and a passing machine.
>
> Thanks,
> Adam LeBlanc
>
> On Thu, Nov 1, 2018 at 1:41 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>
> ---------- Forwarded message ---------
> From: Ralph H Castain <r...@open-mpi.org>
> Date: Thu, Nov 1, 2018 at 1:07 PM
> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
> To: Open MPI Users <users@lists.open-mpi.org>
>
> Set rmaps_base_verbose=10 for debugging output
>
> Sent from my iPhone
>
> On Nov 1, 2018, at 9:31 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>
>> By the way, the Open MPI version is 3.1.2.
>>
>> -Adam LeBlanc
>>
>> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>> Hello,
>>
>> I am an employee of the UNH InterOperability Lab, and we are in the process of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We have purchased some new hardware that has one processor, and we noticed an issue when running MPI jobs on nodes that do not have similar processor counts. If we launch the MPI job from a node that has 2 processors, it fails, stating that there are not enough resources, and does not start the run:
>>
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 14 slots
>> that were requested by the application:
>>   IMB-MPI1
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>>
>> If we launch the MPI job from the node with one processor, without changing the mpirun command at all, it runs as expected.
>>
>> Here is the command being run:
>>
>> mpirun --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
>>
>> Here is the hostfile being used:
>>
>> farbauti-ce.ofa.iol.unh.edu slots=1
>> hyperion-ce.ofa.iol.unh.edu slots=1
>> io-ce.ofa.iol.unh.edu slots=1
>> jarnsaxa-ce.ofa.iol.unh.edu slots=1
>> rhea-ce.ofa.iol.unh.edu slots=1
>> tarqeq-ce.ofa.iol.unh.edu slots=1
>> tarvos-ce.ofa.iol.unh.edu slots=1
>>
>> This seems like a bug, and we would like some help to explain and fix what is happening. The IBTA plugfest saw similar behaviours, so this should be reproducible.
>>
>> Thanks,
>> Adam LeBlanc
>
> <passing_verbose_output.txt> <failing_verbose_output.txt>
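For reference, a minimal sketch of the debug invocation suggested above, assuming Adam's original command line and hostfile path are reused unchanged and only the two verbosity settings are added:

    mpirun --mca ras_base_verbose 5 --mca rmaps_base_verbose 10 \
           --mca btl_openib_warn_no_device_params_found 0 \
           --mca orte_base_help_aggregate 0 \
           --mca btl openib,vader,self --mca pml ob1 \
           --mca btl_openib_receive_queues P,65536,120,64,32 \
           -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

The ras_base_verbose output reports the allocation mpirun built from the hostfile (which nodes it found and how many slots each contributes), and the rmaps_base_verbose output shows how the requested processes were mapped onto that allocation, which is where the 14-versus-7 discrepancy should become visible.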
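Separately, as a hedged workaround rather than an explanation of the different behaviour between launch nodes: the hostfile above defines 7 slots in total (seven hosts with slots=1), so explicitly requesting no more than 7 ranks should satisfy the slot check regardless of which node the job is launched from, for example:

    mpirun -np 7 --mca btl openib,vader,self --mca pml ob1 \
           -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

The -np value here is only an illustration tied to this particular hostfile; the other MCA options from the original command can be kept as-is.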