Hello,

I am new to Slurm and am setting up a cluster. I am testing a job array 
submitted with sbatch and am seeing only one array task run per node. Even if 
I specify in the sbatch command that only one node should be used, it runs a 
single task on each of the available nodes in the partition. I was under the 
impression that it would keep starting tasks until the resources on the node, 
or the limits for the user, were exhausted. Am I missing something, or have I 
misinterpreted how sbatch and/or the job scheduling should work?

Here is one of the commands I have run:

sbatch --array=0-15 --partition=htc-amd --wrap 'python3 -c "import time; print(\"working\"); time.sleep(5)"'

The htc-amd partition has 8 nodes, and the result of this command is a single 
task running on each node while the remaining tasks wait in the queue for them 
to finish. As mentioned above, if I specify --nodes=1 it still runs a single 
task on every node in the partition. The only way I have gotten it to use a 
single node was with --nodelist, which worked, but it executed only a single 
task and queued the rest. I have also tried specifying --ntasks and 
--ntasks-per-node. These appear to reserve resources, as I can make the job hit 
the QOS core/CPU limit, but they do not affect the number of tasks run on each 
node.
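
For reference, the --wrap command above should be roughly equivalent to a 
batch script along the following lines. This is only a sketch: the explicit 
--ntasks=1 and --cpus-per-task=1 requests are an assumption about what each 
array task needs, not something taken from my configuration, and the echo line 
is added purely to show which array task ran on which host.

```shell
#!/bin/bash
#SBATCH --array=0-15
#SBATCH --partition=htc-amd
# Assumption: each array task needs one task slot and one CPU.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task; the fallback
# value only applies when the script is run outside of Slurm.
echo "array task ${SLURM_ARRAY_TASK_ID:-none} on ${HOSTNAME:-unknown}"

# Same payload as the --wrap command.
python3 -c 'import time; print("working"); time.sleep(5)'
```

Saved under a hypothetical name such as array_test.sh, it would be submitted 
with "sbatch array_test.sh".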

Thank you for any help you can offer!

Jason
