Certain combinations of topology configuration and the srun -N option cause select/cons_res to spuriously reject jobs with "Requested node configuration is not available". The following example illustrates the problem.
[sulu] (slurm) etc> cat slurm.conf
...
TopologyPlugin=topology/tree
SelectType=select/cons_res
SelectTypeParameters=CR_Core
...
[sulu] (slurm) etc> cat topology.conf
SwitchName=s1 Nodes=xna[13-26]
SwitchName=s2 Nodes=xna[41-45]
SwitchName=s3 Switches=s[1-2]
[sulu] (slurm) etc> sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
...
jkob         up   infinite     4  idle xna[14,19-20,41]
...
[sulu] (slurm) etc> srun -N 2-4 -n 4 -p jkob hostname
srun: Force Terminated job 79
srun: error: Unable to allocate resources: Requested node configuration is not available

The problem does not occur with select/linear, with topology/none, if -N is omitted, or for certain other values of -N (for example, -N 4-4 and -N 2-3 work OK).

The problem seems to be in function _eval_nodes_topo in src/plugins/select/cons_res/job_test.c. The srun man page states that when -N is used, "the job will be allocated as many nodes as possible within the range specified and without delaying the initiation of the job." Consistent with this description, the requested number of nodes in the example above is 4 (req_nodes=4). However, the code that selects the best-fit topology switch appears to make the selection based on the minimum required number of nodes (min_nodes=2). It therefore selects switch s1, which has only 3 nodes in partition jkob. Since this is fewer than req_nodes, the job is rejected with the "node configuration" error. A toy reproduction of this selection is sketched below.

I'm not sure exactly where the code is going wrong. It could be in the calculation of the number of needed nodes in function _enough_nodes (my reading of which is also sketched below), or in the code that initializes/updates req_nodes or rem_nodes. I don't feel confident that I understand the logic well enough to propose a fix without introducing a regression.
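To make the suspected selection error concrete, here is a toy program that mimics what I believe the best-fit loop is doing. This is entirely my own simplification for illustration, not the code from job_test.c; the names (leaf_cnt, best_fit, etc.) are mine. The usable node counts per switch come from the example above: s1 has 3 jkob nodes (xna[14,19-20]), s2 has 1 (xna41), and s3 has all 4.

    /* Toy model of the suspected best-fit switch selection.
     * My own construction, not the actual Slurm source. */
    #include <stdio.h>

    int main(void)
    {
            const char *name[] = { "s1", "s2", "s3" };
            int leaf_cnt[]     = {    3,    1,    4 };  /* usable jkob nodes */
            int min_nodes = 2;   /* from srun -N 2-4 */
            int req_nodes = 4;   /* from srun -N 2-4 */
            int best_fit = -1;

            for (int i = 0; i < 3; i++) {
                    /* Suspected bug: the threshold is min_nodes; using
                     * req_nodes here would skip s1 and land on s3. */
                    if (leaf_cnt[i] < min_nodes)
                            continue;
                    if ((best_fit < 0) || (leaf_cnt[i] < leaf_cnt[best_fit]))
                            best_fit = i;
            }

            printf("best fit: %s with %d nodes; job wants req_nodes=%d\n",
                   name[best_fit], leaf_cnt[best_fit], req_nodes);
            return 0;
    }

This prints "best fit: s1 with 3 nodes; job wants req_nodes=4": s1 passes the min_nodes test and beats s3 as the smallest adequate switch, and the subsequent attempt to allocate 4 nodes from it can only fail.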
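For reference, my reading of _enough_nodes is roughly the following. This is reconstructed from memory of the source, so treat it as an approximation rather than a verbatim quote:

    static bool _enough_nodes(int avail_nodes, int rem_nodes,
                              uint32_t min_nodes, uint32_t req_nodes)
    {
            int needed_nodes;

            if (req_nodes > min_nodes)
                    needed_nodes = rem_nodes + min_nodes - req_nodes;
            else
                    needed_nodes = rem_nodes;

            return (avail_nodes >= needed_nodes);
    }

If rem_nodes starts out at req_nodes=4 (an assumption on my part), then needed_nodes = 4 + 2 - 4 = 2, and s1 with avail_nodes=3 satisfies the test even though it can never supply the 4 nodes the allocation later demands. That would be consistent with the behavior above, but I may be misreading how rem_nodes is initialized.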
Regards,
Martin