Hi All,

I am seeing some odd behavior and am hoping someone has ideas on where to start looking. I installed Open MPI 4.1.4 via spack on this cluster, built with Slurm support. I then built Orca against it, also via spack (for context). Orca calls MPI under the hood with a simple `mpirun -np X ....`. On some nodes, any run with more than `-np 2` fails with `While computing bindings, we found no available cpus on the following node:`. Adding `--oversubscribe` alone does not help, but adding both `--oversubscribe` and `--host [hostname]` lets the run succeed.

The other odd part is that this does not happen on all of my compute nodes, even though all of them are installed identically with Rocky 8.

Here are examples from one of the affected nodes:

```
[user@node2428 sbatch_scripts]$ mpirun --display-allocation -np 4 hostname

======================   ALLOCATED NODES   ======================
    node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:

  Node:  node2428

Please check your allocation.
--------------------------------------------------------------------------
[user@node2428 sbatch_scripts]$ mpirun --display-allocation --oversubscribe -np 4 hostname

======================   ALLOCATED NODES   ======================
    node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:

  Node:  node2428

Please check your allocation.
--------------------------------------------------------------------------
[user@node2428 sbatch_scripts]$ mpirun --display-allocation --oversubscribe --host node2428 -np 4 hostname

======================   ALLOCATED NODES   ======================
    node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
node2428
node2428
node2428
node2428
```
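
To compare an affected node with a working one, a minimal sketch like the following could be run on each. It assumes the error is related to the CPUs a process is actually allowed to use on the node, which I have not confirmed. The function name `node_cpu_view` is made up for illustration; the Slurm variables are only set inside a job and are printed as `unset` otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI or Slurm): print the CPU view
# that a process on this node actually gets, so that the output from an
# affected node can be diffed against the output from a working node.

node_cpu_view() {
    echo "host: $(uname -n)"
    # CPUs the kernel allows this process to run on (affinity/cgroup limits)
    echo "allowed: $(grep '^Cpus_allowed_list:' /proc/self/status | cut -f2)"
    # Number of CPUs usable under the current affinity mask
    echo "nproc: $(nproc)"
    # Slurm's view of the allocation; "unset" when run outside a job
    echo "SLURM_CPUS_ON_NODE: ${SLURM_CPUS_ON_NODE:-unset}"
    echo "SLURM_JOB_CPUS_PER_NODE: ${SLURM_JOB_CPUS_PER_NODE:-unset}"
}

node_cpu_view
```

If the `allowed` list on node2428 turns out to be narrower than the 4 slots that `mpirun --display-allocation` reports, that mismatch would be a reasonable place to start looking.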

Thanks in advance!

--
Morgan Ludwig
Techsquare Inc.
http://www.techsquare.com/
mlud...@techsquare.com
