Okay, debug-daemons isn't going to help as we aren't launching any daemons. 
This is all one node. So try adding "--mca odls_base_verbose 10 --mca 
state_base_verbose 10" to the cmd line and let's see what is going on.
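For concreteness, Collin's failing invocation from later in the thread with the suggested MCA verbosity parameters added would look roughly like this (a sketch only; it assumes Open MPI is installed and the same ppr mapping is wanted):

```shell
# Sketch: the failing command from this thread, with the odls and
# state framework verbosity raised to 10 as Ralph suggests.
# Requires an Open MPI installation; output goes to stderr.
mpirun -np 128 \
  --mca odls_base_verbose 10 \
  --mca state_base_verbose 10 \
  --map-by ppr:64:socket \
  hostname
```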

I agree with Josh - neither mpirun nor hostname are invoking the Mellanox 
drivers, so it is hard to see why removing those drivers is allowing this to 
run.

On Jan 28, 2020, at 11:35 AM, Ralph H Castain <r...@open-mpi.org> wrote:



On Jan 28, 2020, at 11:28 AM, Collin Strassburger <cstrassbur...@bihrle.com> wrote:

Same result. (It works through 102 processes, but not more than that.)

Input: mpirun -np 128 --debug-daemons --map-by ppr:64:socket hostname

Output:
[Gen2Node3:54348] [[18008,0],0] orted_cmd: received add_local_procs
[Gen2Node3:54348] [[18008,0],0] orted_cmd: received exit cmd
[Gen2Node3:54348] [[18008,0],0] orted_cmd: all routes and children gone - 
exiting
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an
error:
Error code: 63
Error name: (null)
Node: Gen2Node3
when attempting to start process rank 0.
--------------------------------------------------------------------------
128 total processes failed to start

Collin
