Rémi Piatek <[email protected]> writes:
> Hello,
>
> I am getting started with SLURM and I am having a hard time understanding how it
> allocates CPUs to users depending on the resources they request. The problem I
> am facing can be summarized as follows. Consider a bash script test.sh that
> requests 8 CPUs but actually starts a job that uses 10 CPUs:
>
>     #!/bin/sh
>     #SBATCH --ntasks=8
>     stress -c 10
>
> On a server with 32 CPUs, if I start this script 5 times with sbatch test.sh, 4
> of them start running right away and the last one appears as pending, as shown
> by the squeue command:
>
>     JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
>         5      main  test.sh  jack PD  0:00     1 (Resources)
>         1      main  test.sh  jack  R  0:08     1 server
>         2      main  test.sh  jack  R  0:08     1 server
>         3      main  test.sh  jack  R  0:05     1 server
>         4      main  test.sh  jack  R  0:05     1 server
>
> The problem is that these 4 jobs are actually using 40 CPUs and overload the
> server. I would on the contrary expect SLURM to either not start the jobs that
> are actually using more resources than requested by the user, or to put them on
> hold until there are enough resources to start them. How can I make sure that
> the users of my server do not start jobs that use too many CPUs?
>
> Some useful details about my slurm.conf file:
>
>     # SCHEDULING
>     #DefMemPerCPU=0
>     FastSchedule=1
>     #MaxMemPerCPU=0
>     SchedulerType=sched/backfill
>     SchedulerPort=7321
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_CPU
>     # COMPUTE NODES
>     NodeName=server CPUs=32 RealMemory=10000 State=UNKNOWN
>     # PARTITIONS
>     PartitionName=main Nodes=server Default=YES Shared=YES MaxTime=INFINITE State=UP
>
> I am probably making a trivial mistake in the configuration file, or just
> misunderstanding a basic concept of SLURM. Any help or advice would be much
> appreciated.
>
> Many thanks in advance!

Slurm only keeps track of how many cores have been assigned to running jobs; it
does not check how many processes are actually started within a given job. With
SelectType=select/cons_res and CR_CPU, each of your jobs is counted as 8 CPUs,
so 4 jobs fill the 32 CPUs on the node and the fifth one waits with the reason
(Resources), even though the 4 running jobs start 40 processes between them.

So it is up to the user to make sure she starts the correct number of processes.

Cheers,

Loris

-- 
This signature is currently under construction.
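For example, a job script can stay within its allocation by taking the number of
stress workers from the SLURM_NTASKS environment variable, which sbatch sets
from --ntasks, instead of hard-coding 10. This is only a sketch: the fallback
value of 1 and the 60-second --timeout are illustrative assumptions and were not
part of the original test.sh.

```sh
#!/bin/sh
#SBATCH --ntasks=8

# sbatch exports SLURM_NTASKS from --ntasks; fall back to 1 (an assumption)
# so the script still starts a single worker when run outside Slurm.
NCPUS="${SLURM_NTASKS:-1}"
echo "Starting $NCPUS stress workers"

# Start exactly as many CPU workers as CPUs were requested, and stop after
# 60 s (illustrative value) so the test job does not run forever.
stress -c "$NCPUS" --timeout 60
```

Submitted five times with sbatch, this version still has four jobs running and
one pending, but the running jobs now use 32 CPUs in total instead of 40.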
