I'm using Open MPI 4.1.1, built with NVIDIA's nvc++ 20.9 and with Torque support.
I want to reserve multiple slots on each node and launch a single manager process per node; the remaining slots would then be filled as each manager spawns new processes on its local node with MPI_Comm_spawn. The hostfile was created by Torque and contains many repeated node names, one for each slot that it reserved. Here is the abbreviated mpiexec command, which I assume is the source of the problem described below:

    $ mpiexec --hostfile MyHostFile -np 21 -npernode 1 (etc.)

When MPI_Comm_spawn is called, MPI reports that "All nodes which are allocated for this job are already filled." The nodes do not appear to be filled, since the same output shows only one slot in use on each node:

    ====================== ALLOCATED NODES ======================
    n022: flags=0x11 slots=9 max_slots=0 slots_inuse=1 state=UP
    n021: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n020: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n018: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n017: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n016: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n015: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n014: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n013: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n012: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n011: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n010: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n009: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n008: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n007: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n006: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n005: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n004: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n003: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n002: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP
    n001: flags=0x13 slots=9 max_slots=0 slots_inuse=1 state=UP

Do you have any idea what I am doing wrong? My Torque qsub arguments are unchanged from when I successfully launched this kind of job structure under MPICH. The relevant argument to qsub is the resource list:

    -l nodes=21:ppn=9
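For reference, here is a minimal sketch of the kind of manager-side spawn I'm describing (not my actual code; the worker executable name and child count are placeholders). It uses the standard "host" info key to ask MPI_Comm_spawn to place the children on the manager's own node:

    /* Hypothetical manager: spawn workers on the local node. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        char hostname[MPI_MAX_PROCESSOR_NAME];
        int len;
        MPI_Get_processor_name(hostname, &len);

        /* Hint that the spawned processes should run on this node. */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", hostname);

        MPI_Comm intercomm;
        int errcodes[8];
        /* "./worker" and the count of 8 are illustrative only. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 8, info,
                       0, MPI_COMM_SELF, &intercomm, errcodes);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }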