Sorry for the incredibly late reply. Hopefully, you have already managed to find the answer.
I'm not sure what your MPI_Comm_spawn call looks like, but it appears you specified the host in it using the "dash_host" info key, yes? The problem is that this is interpreted the same way as the "-host n001.cluster.com" option on an mpiexec command line, which means it allocates only _one_ slot to the request. If you are asking to spawn two procs, then you don't have adequate resources. One way to check is to spawn only one proc with your MPI_Comm_spawn request and see if that works.

If you want to specify the host, then you need to append the number of slots to allocate on that host, e.g. "n001.cluster.com:2". Of course, you cannot allocate more than the system provided minus the number currently in use. There are additional modifiers you can pass to handle variable numbers of slots.

HTH
Ralph

On Oct 25, 2019, at 5:30 AM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:

I am trying to launch a number of manager processes, one per node, and then have each of those managers spawn, on its own node, a number of workers. For this example, I have 2 managers and 2 workers per manager. I'm following the instructions at https://stackoverflow.com/questions/47743425/controlling-node-mapping-of-mpi-comm-spawn to force one manager process per node.

Here is my PBS/Torque qsub command:

    $ qsub -V -j oe -e ./stdio -o ./stdio -f -X -N MyManagerJob -l nodes=2:ppn=3 MyManager.bash

I expect "-l nodes=2:ppn=3" to reserve 2 nodes with 3 slots on each (one slot for the manager and the other two for the separately spawned workers). The option letter is a lower-case L, not a one.

Here is my mpiexec command within the MyManager.bash script:

    mpiexec --enable-recovery --display-map --display-allocation --mca mpi_param_check 1 --v --x DISPLAY --np 2 --map-by ppr:1:node MyManager.exe

I expect "--map-by ppr:1:node" to cause Open MPI to launch exactly one manager on each node. When the first worker is spawned via MPI_Comm_spawn(), Open MPI reports:

    ====================== ALLOCATED NODES ======================
        n002: flags=0x11 slots=3 max_slots=0 slots_inuse=3 state=UP
        n001: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP
    =================================================================
    --------------------------------------------------------------------------
    There are no allocated resources for the application:
        ./MyWorker
    that match the requested mapping:
        -host: n001.cluster.com
    Verify that you have mapped the allocated resources properly for the
    indicated specification.
    --------------------------------------------------------------------------
    [n001:14883] *** An error occurred in MPI_Comm_spawn
    [n001:14883] *** reported by process [1897594881,1]
    [n001:14883] *** on communicator MPI_COMM_SELF
    [n001:14883] *** MPI_ERR_SPAWN: could not spawn processes

In the banner above, it clearly states that node n001 has 3 slots reserved and only one slot in use at the time of the spawn, so I am not sure why it reports that there are no resources for the request.

I've tried compiling Open MPI 4.0 both with and without Torque support, and I've tried using an explicit host file (or not), but the error is unchanged. Any ideas? My cluster is running CentOS 7.4 and I am using the Portland Group C++ compiler.
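
For reference, here is a minimal sketch (my own, not code from this thread) of what Ralph's suggestion looks like in the spawn call itself, using the standard "host" info key with the slot count appended. The worker binary name ./MyWorker is taken from the error output above; the error handling is omitted, and it is an assumption that MPI_Get_processor_name() returns the same hostname the allocation uses.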
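
    /* Minimal sketch: a manager spawning two workers on its own node,
     * appending ":2" to the "host" info value so two slots are allocated
     * to the request instead of the default single slot. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm workers;
        MPI_Info info;
        char host[MPI_MAX_PROCESSOR_NAME];
        char hostval[MPI_MAX_PROCESSOR_NAME + 8];
        int len, errcodes[2];

        MPI_Init(&argc, &argv);

        /* Assumes the name returned here matches the hostname in the
         * allocation; if not, substitute the resource manager's name. */
        MPI_Get_processor_name(host, &len);
        snprintf(hostval, sizeof(hostval), "%s:2", host);

        MPI_Info_create(&info);
        MPI_Info_set(info, "host", hostval);

        /* Each manager spawns 2 workers on its own (MPI_COMM_SELF). */
        MPI_Comm_spawn("./MyWorker", MPI_ARGV_NULL, 2, info, 0,
                       MPI_COMM_SELF, &workers, errcodes);

        MPI_Info_free(&info);
        MPI_Comm_disconnect(&workers);
        MPI_Finalize();
        return 0;
    }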
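
With the slot count in the info value, the request asks for two of the three slots reserved per node, alongside the one already occupied by the manager. Without the ":2" suffix, the request is treated like "-host n001.cluster.com" on a command line and maps to a single slot, which would explain the "no allocated resources" error above even though the allocation banner shows free slots.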