Rémi Piatek <[email protected]> writes:

> Hello,
>
> I am getting started with SLURM and I am having a hard time understanding how
> it allocates CPUs to users depending on the resources they request. The
> problem I am facing can be summarized as follows. Consider a bash script
> test.sh that requests 8 CPUs but actually starts a job that uses 10 CPUs:
>
>     #!/bin/sh
>     #SBATCH --ntasks=8
>     stress -c 10
>
> On a server with 32 CPUs, if I submit this script 5 times with sbatch test.sh,
> 4 of them start running right away and the last one appears as pending, as
> shown by the squeue command:
>
>     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>         5      main  test.sh     jack PD       0:00      1 (Resources)
>         1      main  test.sh     jack  R       0:08      1 server
>         2      main  test.sh     jack  R       0:08      1 server
>         3      main  test.sh     jack  R       0:05      1 server
>         4      main  test.sh     jack  R       0:05      1 server
>
> The problem is that these 4 jobs are actually using 40 CPUs and overload the
> server. I would instead expect SLURM either not to start jobs that use more
> resources than the user requested, or to put them on hold until there are
> enough resources to start them. How can I make sure that the users of my
> server do not start jobs that use too many CPUs?
>
> Some useful details about my slurm.conf file:
>
>     # SCHEDULING
>     #DefMemPerCPU=0
>     FastSchedule=1
>     #MaxMemPerCPU=0
>     SchedulerType=sched/backfill
>     SchedulerPort=7321
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_CPU
>     # COMPUTE NODES
>     NodeName=server CPUs=32 RealMemory=10000 State=UNKNOWN
>     # PARTITIONS
>     PartitionName=main Nodes=server Default=YES Shared=YES MaxTime=INFINITE State=UP
>
> I am probably making a trivial mistake in the configuration file, or just
> misunderstanding a basic concept of SLURM. Any help or advice would be much
> appreciated.
>
> Many thanks in advance!

Slurm just keeps track of how many cores have been assigned to running
jobs - it doesn't check how many processes are actually started within a
given job.  So, it is up to the user to make sure she starts the correct
number of processes.
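
On the user side, one way to do that is to derive the process count from
what Slurm actually allocated instead of hard-coding it. The following is
only a sketch: the helper name worker_count, the fallback to 1 outside of
Slurm, and the guard on SLURM_JOB_ID are my own additions for illustration;
SLURM_NTASKS and SLURM_JOB_ID are the environment variables Slurm sets
inside a job submitted with --ntasks.

```shell
#!/bin/sh
#SBATCH --ntasks=8

# worker_count (hypothetical helper): print the number of tasks Slurm
# allocated to this job. SLURM_NTASKS is set by Slurm when --ntasks is
# given; the fallback to 1 is an assumption for runs outside Slurm.
worker_count() {
    echo "${SLURM_NTASKS:-1}"
}

# Only start the load when running inside a Slurm job, and start exactly
# as many workers as were requested (8 here), not a hard-coded 10.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    stress -c "$(worker_count)"
fi
```

With this, changing --ntasks in the header is enough to keep the number of
stress workers in line with the allocation.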

Cheers,

Loris

-- 
This signature is currently under construction.
