See the task/cgroup plugin for constraining jobs to the specific CPUs and memory they were allocated. Also see the --exclusive option in srun.
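To make that concrete, here is a minimal sketch of what enabling cgroup confinement looks like. This is one possible configuration, not a drop-in for the poster's setup; exact parameters and defaults vary by Slurm version, so check the slurm.conf and cgroup.conf man pages for yours.

```shell
# In slurm.conf: have Slurm place each job's tasks into a cgroup
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup

# In cgroup.conf (same directory as slurm.conf):
# pin job processes to the allocated cores and cap their memory,
# so 'stress -c 10' in an 8-CPU allocation shares only those 8 cores
ConstrainCores=yes
ConstrainRAMSpace=yes
```

With ConstrainCores=yes a job that over-subscribes its allocation only oversubscribes itself; the other 24 cores on the node stay available to the jobs they were allocated to.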
On June 23, 2015 4:44:27 AM PDT, "Rémi Piatek" <[email protected]> wrote:
>
> I had considered this simple explanation, but it seemed unlikely to me,
> as it would imply that we have to rely entirely on users to specify
> correctly the number of CPUs they need. People use my server for
> CPU-intensive jobs, so it is important for me to make sure that
> resources are shared fairly. I was hoping Slurm would allow me to do
> this and prevent people from free-riding (so far, it would be easy to
> request a small number of CPUs and use a much larger number, thus
> slowing down the other users).
>
> I read that jobs which exceed the memory requested and allocated by
> Slurm are automatically interrupted. Is there nothing similar for
> the use of CPUs?
>
> Thanks for the help! Much appreciated.
>
>
> On 06/23/2015 12:05 PM, Loris Bennett wrote:
>>
>> Rémi Piatek <[email protected]> writes:
>>
>>> Hello,
>>>
>>> I am getting started with SLURM and I am having a hard time
>>> understanding how it allocates CPUs to users depending on the
>>> resources they request. The problem I am facing can be summarized
>>> as follows. Consider a bash script test.sh that requests 8 CPUs
>>> but actually starts a job that uses 10 CPUs:
>>>
>>> #!/bin/sh
>>> #SBATCH --ntasks=8
>>> stress -c 10
>>>
>>> On a server with 32 CPUs, if I start this script 5 times with
>>> sbatch test.sh, 4 of them start running right away and the last
>>> one appears as pending, as shown by the squeue command:
>>>
>>> JOBID PARTITION     NAME USER ST  TIME NODES NODELIST(REASON)
>>>     5      main  test.sh jack PD  0:00     1 (Resources)
>>>     1      main  test.sh jack  R  0:08     1 server
>>>     2      main  test.sh jack  R  0:08     1 server
>>>     3      main  test.sh jack  R  0:05     1 server
>>>     4      main  test.sh jack  R  0:05     1 server
>>>
>>> The problem is that these 4 jobs are actually using 40 CPUs and
>>> overload the server.
>>> I would, on the contrary, expect SLURM either not to start jobs
>>> that actually use more resources than the user requested, or to
>>> put them on hold until there are enough resources to start them.
>>> How can I make sure that the users of my server do not start jobs
>>> that use too many CPUs?
>>>
>>> Some useful details about my slurm.conf file:
>>>
>>> # SCHEDULING
>>> #DefMemPerCPU=0
>>> FastSchedule=1
>>> #MaxMemPerCPU=0
>>> SchedulerType=sched/backfill
>>> SchedulerPort=7321
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_CPU
>>> # COMPUTE NODES
>>> NodeName=server CPUs=32 RealMemory=10000 State=UNKNOWN
>>> # PARTITIONS
>>> PartitionName=main Nodes=server Default=YES Shared=YES MaxTime=INFINITE State=UP
>>>
>>> I am probably making a trivial mistake in the configuration file,
>>> or just misunderstanding a basic concept of SLURM. Any help or
>>> advice would be much appreciated.
>>>
>>> Many thanks in advance!
>>
>> Slurm just keeps track of how many cores have been assigned to
>> running jobs - it doesn't check how many processes are actually
>> started within a given job. So, it is up to the user to make sure
>> she starts the correct number of processes.
>>
>> Cheers,
>>
>> Loris

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
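As Loris says, the select/cons_res plugin with CR_CPU only does bookkeeping on *requested* CPUs; it never inspects how many processes a job actually runs. The following toy shell sketch (not Slurm code, just an illustration of that bookkeeping) shows why exactly four 8-CPU jobs start on a 32-CPU node and the fifth pends with (Resources), regardless of what the jobs really do:

```shell
#!/bin/sh
# Toy model of cons_res CPU accounting: subtract each job's
# *requested* CPU count from the node total; actual usage
# (e.g. 'stress -c 10' inside an 8-CPU allocation) is never checked.
total_cpus=32   # NodeName=server CPUs=32
request=8       # #SBATCH --ntasks=8
free=$total_cpus

for job in 1 2 3 4 5; do
    if [ "$free" -ge "$request" ]; then
        free=$((free - request))
        echo "job $job: R  (free CPUs left: $free)"
    else
        echo "job $job: PD (Resources)"
    fi
done
```

Jobs 1-4 consume all 32 bookkept CPUs, so job 5 pends; meanwhile the four running jobs may really be burning 40 CPUs, which is exactly the oversubscription the cgroup confinement above prevents.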
