Okay, debug-daemons isn't going to help as we aren't launching any daemons. This is all one node. So try adding "--mca odls_base_verbose 10 --mca state_base_verbose 10" to the cmd line and let's see what is going on.
I agree with Josh - neither mpirun nor hostname are invoking the Mellanox drivers, so it is hard to see why removing those drivers is allowing this to run.

On Jan 28, 2020, at 11:28 AM, Collin Strassburger <cstrassbur...@bihrle.com> wrote:

Same result. (It works through 102 but not greater than that.)

Input:
mpirun -np 128 --debug-daemons --map-by ppr:64:socket hostname

Output:
[Gen2Node3:54348] [[18008,0],0] orted_cmd: received add_local_procs
[Gen2Node3:54348] [[18008,0],0] orted_cmd: received exit cmd
[Gen2Node3:54348] [[18008,0],0] orted_cmd: all routes and children gone - exiting
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error:

Error code: 63
Error name: (null)
Node: Gen2Node3

when attempting to start process rank 0.
--------------------------------------------------------------------------
128 total processes failed to start

Collin
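For reference, combining Collin's invocation with the two verbosity MCA parameters Ralph suggests might look like the sketch below. This is only an assembled command string for inspection, not a run; it assumes an Open MPI install, and dropping --debug-daemons follows from Ralph's note that no daemons are launched on a single node.

```shell
# Hypothetical diagnostic command: Collin's original mpirun line plus the
# odls/state framework verbosity flags Ralph suggested in the thread.
# Built as a string here so it can be inspected without an MPI install.
cmd="mpirun -np 128 --map-by ppr:64:socket --mca odls_base_verbose 10 --mca state_base_verbose 10 hostname"
echo "$cmd"
```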