I’m a little under the weather and so will only be able to help a bit at a 
time. However, a couple of things to check:

* add -mca ras_base_verbose 5 to the command line to see what mpirun thought 
the allocation was (see the example below)

* is the hostfile available on every node? (quick check below)
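
For the first item, something like this, prepended to your existing command 
(your other MCA options omitted here for brevity):

  mpirun --mca ras_base_verbose 5 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1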

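For the second item, a quick sanity check, assuming password-less ssh to each 
host (this just looks for the file at the same path everywhere):

  for h in $(awk '{print $1}' /home/soesterreich/ce-mpi-hosts); do
    ssh $h ls /home/soesterreich/ce-mpi-hosts || echo "missing on $h"
  done
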
Ralph

> On Nov 1, 2018, at 10:55 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
> 
> Hello Ralph,
> 
> Attached below is the verbose output for a failing machine and a passing 
> machine.
> 
> Thanks,
> Adam LeBlanc
> 
> On Thu, Nov 1, 2018 at 1:41 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
> 
> 
> ---------- Forwarded message ---------
> From: Ralph H Castain <r...@open-mpi.org>
> Date: Thu, Nov 1, 2018 at 1:07 PM
> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
> To: Open MPI Users <users@lists.open-mpi.org>
> 
> 
> Set rmaps_base_verbose=10 for debugging output 
> 
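> For example (your other MCA options go where they normally would):
> 
>   mpirun --mca rmaps_base_verbose 10 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
> 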
> Sent from my iPhone
> 
> On Nov 1, 2018, at 9:31 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
> 
>> The version by the way for Open-MPI is 3.1.2.
>> 
>> -Adam LeBlanc
>> 
>> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>> 
>> Hello,
>> 
>> I am an employee of the UNH InterOperability Lab, and we are in the process 
>> of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We have purchased some 
>> new hardware that has one processor, and we noticed an issue when running MPI 
>> jobs across nodes that do not have the same processor count. If we launch the 
>> MPI job from a node that has 2 processors, it fails, stating that there are 
>> not enough resources, and does not start the run:
>> 
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 14 slots
>> that were requested by the application:
>>   IMB-MPI1
>> 
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>> 
>> If we launch the MPI job from the node with one processor, without changing 
>> the mpirun command at all, it runs as expected.
>> 
>> Here is the command being run:
>> 
>> mpirun --mca btl_openib_warn_no_device_params_found 0 \
>>     --mca orte_base_help_aggregate 0 \
>>     --mca btl openib,vader,self \
>>     --mca pml ob1 \
>>     --mca btl_openib_receive_queues P,65536,120,64,32 \
>>     -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
>> 
>> Here is the hostfile being used:
>> 
>> farbauti-ce.ofa.iol.unh.edu slots=1
>> hyperion-ce.ofa.iol.unh.edu slots=1
>> io-ce.ofa.iol.unh.edu slots=1
>> jarnsaxa-ce.ofa.iol.unh.edu slots=1
>> rhea-ce.ofa.iol.unh.edu slots=1
>> tarqeq-ce.ofa.iol.unh.edu slots=1
>> tarvos-ce.ofa.iol.unh.edu slots=1
>> 
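>> For what it's worth, the hostfile advertises slots=1 on each of 7 hosts, so 
>> 7 slots total, and we pass no -np. The requested count of 14 happens to equal 
>> 7 hosts x the 2 processors on the launch node, which makes it look like the 
>> launch node's processor count is overriding the slots=1 entries. As a sketch 
>> of a workaround (not a fix), forcing the rank count explicitly, with the 
>> other options as above:
>> 
>>   mpirun -np 7 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
>> 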
>> This looks like a bug, and we would like help understanding and fixing what 
>> is happening. The IBTA plugfest saw similar behaviour, so it should be 
>> reproducible.
>> 
>> Thanks,
>> Adam LeBlanc
> 
> <passing_verbose_output.txt><failing_verbose_output.txt>

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
